Surveillance is high on the agenda for most financial services’ compliance teams. Despite extensive work done in this space, many firms still find their surveillance processes insufficient compared to the expectations of regulators. Earlier this year, the UK’s Financial Conduct Authority (FCA) issued a statement that it remains concerned that requirements for market abuse surveillance are not being fully met, despite the Market Abuse Regulation (MAR) having been introduced as far back as 2016.
By GreySpark’s Jennie Brotherston, Senior Specialist, and Rachel Lindstrom, Senior Manager


The reason that market participants are not fully meeting all requirements may not be for want of trying, however. Surveillance – and communications surveillance, in particular – is a complex and data-heavy activity. Consequently, many compliance teams are now looking at how artificial intelligence (AI) – or one or more of its subcategories of advanced analytics – could be applied to the puzzle, either alongside or in place of more traditional solutions.
Indeed, UK, US and Canadian regulators, amongst others, are themselves focused on investing in advanced technologies, such as AI, to detect – and prosecute – firms engaging in market manipulation. For instance, the Monetary Authority of Singapore (MAS) is using predictive modelling to identify individual financial advisors who have a higher likelihood of being involved in market misconduct. Mirroring regulators, compliance professionals across the globe are reviewing their legacy surveillance systems and looking to implement ‘next generation’ surveillance solutions. A 2021 survey conducted by GreySpark Partners of key compliance personnel in 18 capital markets firms operating across EMEA, the US and APAC indicated that as many as 70% were looking to future-proof their surveillance solutions with advanced analytics such as AI.
The majority of ‘next generation’ surveillance solutions available today use AI techniques to process, manage and analyse the large, complex and diverse volumes of data that compliance teams need to monitor. As firms begin to embrace AI in pockets across their organisations, many people may still have questions about what the application of AI technology means in practice. This report by GreySpark Partners, in association with Relativity, presents an explanation of artificial intelligence and describes why it is a particularly useful set of technologies to apply to the communications surveillance use case.
The Regulatory Perspective
In 2020, the Bank of England (BoE) and the FCA collaborated to form the AI Public-Private Forum (AIPPF) to better understand the use and impact of AI in UK financial markets, and to explore how technological change can support innovation that benefits both consumers and markets. After looking at the risks associated with AI, the forum found that the majority of risks arising from the use of AI models in financial services are, in fact, not specific to AI but are also associated with use of non-AI models. However, the scale, speed and complexity of the issues that could arise when AI technologies are employed are generally higher than for more traditional methods. Users of AI in financial services typically face a number of challenges – particularly, how they can enhance the explainability of the models and clearly communicate how those models work to users, compliance staff, change management teams and regulators. While this is not an AI-specific challenge, it can be more complex than it is for traditional systems.
Regulation of AI technologies and their use is developing across the world. In the US, the proposed Algorithmic Accountability Act of 2022 would impose reporting and disclosure requirements on companies using automated decisioning systems and would require firms that build or use such systems to make careful impact assessments. The full ramifications of the Act have yet to be determined, but the financial sector would certainly be affected alongside other industries. In addition, the Federal Trade Commission (FTC) is considering enacting new regulations to ban certain AI practices that it deems carry too much risk to consumers. For example, guidance has recently been issued stipulating that lenders cannot use ‘biased or unexplainable’ algorithms for consumer credit. Across the pond, the European Parliament is currently considering the AI Act, which aims to enforce a risk-based approach to AI. A strict regime of mandatory requirements, it will require monitoring and human oversight of activities classified as ‘high-risk’. Meanwhile, the utilisation of AI in a variety of use cases – and particularly in surveillance – continues to accelerate across the industry.

The Why and How of Machine Learning Use
Machine learning (ML) – a subset of AI – is the application of algorithms that imitate the way that humans learn. The underlying reasons for the appropriateness of ML as a tool can be divided into three main categories:
- Multiple / Interdependent Factors – When rules that depend on many factors overlap or need to be tuned finely, it can be difficult to code them effectively. ML can help solve this problem, being more efficient than a human could at dealing with all possible scenarios.
- High Volumes of Data – There is a limit to the number of documents that a human can read and decide whether they are relevant for a closer review. As the number of documents increases, a comprehensive review of all documents by human resources can become unfeasible. ML solutions, however, are effective at processing and reviewing large numbers of documents.
- Patterns and Trends – When there are trends and patterns in data, ML can often identify them, even where a human might struggle.
As shown in Figure 1, ML can be applied to data in the following ways:
- Supervised learning takes labelled data and maps it to pre-defined outputs, using classification (for discrete data), regression (for continuous data) or neural network techniques.
- Unsupervised learning takes unlabelled data and maps it using techniques such as clustering and neural networks to outputs derived during the process.
- Reinforcement learning, often treated as a category in its own right rather than as a subset of unsupervised learning, is an approach in which a system is able to learn using reward-based feedback from its own experience.
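The difference between the first two approaches can be illustrated with a minimal sketch. It assumes a hypothetical one-dimensional feature (message length in words) and hypothetical labels; the nearest-centroid classifier and the two-cluster k-means used here are simple stand-ins chosen for brevity, not techniques named in the report.

```python
from statistics import mean

# Hypothetical labelled data: (message length in words, label).
labelled = [(3, "routine"), (4, "routine"), (5, "routine"),
            (40, "escalate"), (45, "escalate"), (50, "escalate")]

def fit_centroids(examples):
    """Supervised: learn one centroid per pre-defined label."""
    by_label = {}
    for value, label in examples:
        by_label.setdefault(label, []).append(value)
    return {label: mean(values) for label, values in by_label.items()}

def classify(centroids, value):
    """Assign the label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(centroids[label] - value))

def two_means(values, iterations=10):
    """Unsupervised: split unlabelled values into two clusters (1-D k-means, k=2).

    Assumes the input contains at least two distinct values.
    """
    low, high = float(min(values)), float(max(values))
    for _ in range(iterations):
        near_low = [v for v in values if abs(v - low) <= abs(v - high)]
        near_high = [v for v in values if abs(v - low) > abs(v - high)]
        low, high = mean(near_low), mean(near_high)
    return sorted(near_low), sorted(near_high)

centroids = fit_centroids(labelled)
unlabelled = [value for value, _ in labelled]
clusters = two_means(unlabelled)
```

In the supervised case the output categories ("routine", "escalate") are fixed in advance by the labels; in the unsupervised case the two groups emerge from the data and still need a human to interpret what they mean.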
The techniques described in Figure 1 can be applied across a variety of use cases, including, for example:
- Fraud Detection – identifying suspicious patterns, behaviours and documents across identification data.
- Diagnostics – modelling a set of ‘symptoms’ to enable a swift and accurate diagnosis.
- Customer Retention – where ‘churn prediction’ models can predict whether or not a given customer is likely to cease to continue as a customer.
- Forecasting – a powerful ML use case where future supply and demand can be predicted using current and historic data analysis.
- Image Classification – neural networks can be trained to recognise images and classify them accordingly. For example, sorting images containing cats from images not containing cats.
- Meaningful Compression of Data – can be beneficial in many cases, and ML can be used to remove and replace repetitive data effectively.
- Structure Discovery – rapidly understanding and parsing unstructured data using unsupervised ML.
- Big Data Visualisation – translating information into a visual context to help humans identify patterns, trends and outliers in enormous volumes of data.
- Feature Elicitation – takes an initial dataset and generates useful new derived values or ‘features’. This is quite an important component of optical character recognition (OCR).
- Recommender Systems – a type of ML model used to apply ratings or ranking to sets of data.
- Targeted Marketing – an extremely common use of ML, in which customer behaviours are analysed and marketing targeted accordingly.
- Customer Segmentation – the process of dividing customers into groups based on common characteristics.
Figure 1: The Machine Learning Sub-Categorisations and Example Techniques

Source: GreySpark analysis
Utilising ML in Various Scenarios
ML is not always the most appropriate way to analyse data and it is important to recognise that this may be the case in some scenarios. While ML – specifically deep learning algorithms – can be useful for finding complex relationships and hidden patterns in data with many interdependent variables, it may be possible for a simpler rule-based system to perform equally well for less complicated problems; in which case it is advisable to stick with that more straightforward solution, as per Occam’s razor.
Figure 2 shows some scenarios where ML can be an appropriate technique to use to analyse data, as well as another category of AI – natural language processing (NLP) – which is explored in the next chapter.
Figure 2: Appropriate Techniques for Various Generic Use Cases

Source: GreySpark analysis

When and How Natural Language Processing Is Used
Alongside pixel and numerical data, ML can also be applied to textual data in the form of natural language processing (NLP), which enables its use in a number of scenarios, such as sentiment analysis, in which a computer can ‘understand’ the sentiment of a given text. NLP can be described as the field of AI which allows computers to understand the way humans speak and write and, when enhanced by ML, has the potential to become the perfect tool for communications surveillance.
Sentiment Analysis
Sentiment analysis – a subcategory of NLP – aims to provide an enhanced perspective of a communication to discern the intent of the language used. This can be crucial when trying to detect the attitudes and opinions of subjects, whether that be in customer feedback, market abuse detection or other contexts. The ambiguity of language can make it challenging to discern which of the multiple possible meanings of a word or sentence is the correct one, and this can lead to false positive alerts or missed problems. Some examples of the challenge that natural language poses to machines are shown in Figure 3.
In the market abuse detection case, when analysing communications, it is often far more useful to ascertain the meaning of sentences than it is simply to identify words – or chains of words – within them. Being able to detect unusually emotionally charged conversation between individuals can be a useful way to monitor risk.
To achieve maximally effective sentiment analysis, NLP is often deployed in conjunction with other natural language features such as entity extraction (the identification of specific data within unstructured text), text classification (sorting open-ended text into predefined categories) and keyword detection.
Figure 3: The Key Challenges Associated with Sentiment Analysis

Source: GreySpark analysis
Common NLP Use Cases
NLP has a large number of potential uses:
- Spam Filtering – NLP is used to filter email addresses or content that appear to be suspicious (i.e., those that do not fit ‘normal’ communications or align with pre-tagged spam structure) or may be trying to impersonate legitimate messages, for example, “g00gle.com” to mimic “google.com”.
- Translation – NLP can be used to determine not just words and sentence structure, but the meaning of sentences to coherently translate text into another language.
- Chatbots – NLP is required for chatbots to respond to unstructured text inputs, whereby queries are categorised into predetermined buckets to provide relevant information to the customer.
- Text Summarisation – NLP can be used to automate the reduction of long-form text to concise summaries. NLP can determine the key points in a text to summarise and convey the necessary information in a concise manner.
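The impersonation example given under spam filtering ("g00gle.com" mimicking "google.com") can be sketched as follows. This is a deliberately simple rule-based illustration rather than an NLP or ML model; the substitution table and the list of trusted domains are assumptions made for the example.

```python
# Hypothetical character substitutions often used to imitate legitimate names.
LOOKALIKES = str.maketrans({"0": "o", "1": "l", "3": "e", "5": "s"})

# Hypothetical allow-list of domains the firm trusts.
TRUSTED_DOMAINS = {"google.com", "example.com"}

def is_impersonation(domain: str) -> bool:
    """Flag a domain that is not trusted itself but normalises to a trusted one."""
    domain = domain.lower()
    if domain in TRUSTED_DOMAINS:
        return False
    return domain.translate(LOOKALIKES) in TRUSTED_DOMAINS
```

A production filter would probably combine a check like this with learned features of message content, since a fixed substitution table only catches the patterns someone has already thought of.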

The Case for Using NLP & ML in Communications Surveillance
There are many instances where ML and NLP could be used – and are already being used – in financial services firms to replace or improve on more traditional systems and processes, including fraud detection, customer services and marketing, credit risk management, trade pricing and execution and more. There is notable appetite across the industry for automation of mundane or repetitive tasks, and the use of ML techniques in relatively low-risk areas such as post-trade activities is often a safe starting point for firms in their advanced analytics journey. ML and NLP can also be used in pre-trade processes. For example, a subset of NLP, known as Optical Character Recognition (OCR), can be employed in ‘Know Your Customer’ screening to review the high volumes of documentation. ML-based pattern identification, recognition and automation can also be applied in the ‘riskier’ area of algorithmic trading, where speed and data volumes are important, although incorrect decisions can have deleterious financial consequences, so there is still significant caution exercised in this application of ML in many institutions. Overall, ML and NLP tools and techniques can be utilised to great effect across financial services firms.
Communications Surveillance
Amongst the most suitable use cases for ML and NLP are surveillance use cases – and communications surveillance in particular – where it may be employed to detect a wide spectrum of potential risks. Regulation of many financial services firms necessitates monitoring for market abuse (including collusion, insider trading, market manipulation and bribery and corruption) and financial services firms are required to have robust codes of conduct in place and monitoring to detect any breaches. Over the years that GreySpark has been studying communications surveillance systems in the capital markets, a list of best practices has coalesced (see Figure 4), but many firms are yet to achieve the full complement of these best practices. GreySpark believes that there is potential for these firms to improve their surveillance outcomes and reduce the ongoing monitoring efforts by employing advanced analytics. Figure 4 shows where ML and / or NLP can be applied to achieve each best practice.
The use of NLP is essential for effective automated communications surveillance. Without NLP, surveillance teams can only survey communications in an extremely basic and inefficient way. If NLP is employed, surveillance teams can search and analyse voice recordings with minimal human intervention, significantly reducing costs and increasing efficacy, and accurately analyse and search a far greater volume of data in a much shorter timeframe compared to traditional database systems.
Using NLP technology, surveillance teams are able to transcribe voice recordings automatically with a high degree of accuracy, to filter out irrelevant text, and to perform enhanced lexicographical searching that goes beyond what can be done with simple lexicons by recognising abbreviations, dialect, misspellings, grammatical errors, common slang, jargon and parlance.
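As a rough illustration of searching beyond a simple lexicon, the sketch below tolerates misspellings and expands abbreviations before matching. It uses string similarity from Python's standard `difflib` module as a stand-in for a trained language model; the lexicon terms, the abbreviation table and the 0.8 similarity cutoff are all hypothetical.

```python
import re
from difflib import get_close_matches

# Hypothetical lexicon of terms of interest and known abbreviations.
LEXICON = ["guarantee", "confidential", "offline"]
ABBREVIATIONS = {"conf": "confidential", "gtee": "guarantee"}

def find_lexicon_hits(message, cutoff=0.8):
    """Return lexicon terms matched in a message, tolerating misspellings."""
    hits = set()
    for token in re.findall(r"[a-z]+", message.lower()):
        token = ABBREVIATIONS.get(token, token)  # expand known abbreviations
        match = get_close_matches(token, LEXICON, n=1, cutoff=cutoff)
        if match:
            hits.add(match[0])
    return sorted(hits)
```

An exact-match lexicon would miss "confidental" and "garantee" entirely; the similarity threshold trades a little precision for that extra recall and would need tuning against real data.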
NLP also offers the potential to perform automated sentiment analysis on communications data, further reducing the need for human intervention, and to recognise languages other than the core language of the business automatically.
It can also be used to enhance the experience of users with features such as recognition of the primary language of a communication, allowing it to be directed to the correct reviewer, and it can enhance a user’s understanding of the data by creating visual representations of the communications dataset.
One example is ‘concept clustering’, whereby the user can quickly and easily see whether conversations between certain people or groups of people have changed in tone or content in a way that affects the associated risk of the communication. In effect, NLP can be used to create an automated understanding of communications.
Figure 4: Best Practices in Communications Surveillance & Where NLP and ML Can Be Used Effectively
Source: GreySpark analysis
Analysing Surveillance Data
In addition to the language issues, there are other difficulties inherent in communications surveillance that traditional technologies cannot necessarily address well, and which ML techniques could help to solve. The increasing volume of electronic communications is an ongoing issue for the majority of firms today. For comprehensive, non-sample-based surveillance, even with a very well-defined lexicon-based search, it is likely that a higher volume of alerts will be generated than can be reviewed in a timely manner by a reasonable number of human reviewers. There are several ways that ML techniques can be used to automate and enhance this process.
ML can be used to reduce the volume of raw data through the identification and removal of irrelevant content. For example, spam filtering and email threading (see Section 4.3) can be extended to filter content that is not of concern to compliance teams and to thread conversations across media other than email. Irrelevant files can be eliminated from the raw data set before they are included in subsequent analytical processes using a variety of preliminary filters. For example, files can be removed on the basis of file type, file location, file size, date range or sender domain, or because they are NIST files (see Figure 5). This kind of up-front reduction of data can go a long way toward reducing the volume of content that requires human review or intervention.
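The preliminary filters described above can be sketched as a simple chain of checks applied before any further analysis. The item fields, excluded extensions, sender domain, size limit and date range below are hypothetical values chosen for illustration, and the file location and NIST file filters are omitted for brevity.

```python
import os
from datetime import date

# Hypothetical filter settings; real values would come from the firm's policy.
EXCLUDED_EXTENSIONS = {".exe", ".dll", ".tmp"}
EXCLUDED_SENDER_DOMAINS = {"newsletter.example.com"}
MAX_SIZE_BYTES = 25_000_000
DATE_RANGE = (date(2023, 1, 1), date(2023, 12, 31))

def is_relevant(item):
    """Return True if an item passes every preliminary filter.

    `item` is a dict with hypothetical keys: name, size_bytes, sent, sender_domain.
    """
    extension = os.path.splitext(item["name"])[1].lower()
    start, end = DATE_RANGE
    return (
        extension not in EXCLUDED_EXTENSIONS
        and item["sender_domain"].lower() not in EXCLUDED_SENDER_DOMAINS
        and item["size_bytes"] <= MAX_SIZE_BYTES
        and start <= item["sent"] <= end
    )

def reduce_dataset(items):
    """Keep only the items that should go forward to later analysis."""
    return [item for item in items if is_relevant(item)]

items = [
    {"name": "deal_notes.docx", "size_bytes": 12_000,
     "sent": date(2023, 3, 14), "sender_domain": "bank.example.com"},
    {"name": "installer.exe", "size_bytes": 5_000_000,
     "sent": date(2023, 3, 14), "sender_domain": "bank.example.com"},
    {"name": "weekly_digest.html", "size_bytes": 30_000,
     "sent": date(2023, 5, 2), "sender_domain": "newsletter.example.com"},
    {"name": "old_memo.txt", "size_bytes": 800,
     "sent": date(2019, 7, 1), "sender_domain": "bank.example.com"},
    {"name": "video.mp4", "size_bytes": 90_000_000,
     "sent": date(2023, 6, 1), "sender_domain": "bank.example.com"},
]
kept = reduce_dataset(items)
```

Because these checks are cheap and deterministic, running them first means the more expensive ML and NLP stages only see content that could plausibly matter.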
Figure 5: Removal of Irrelevant Content

Source: GreySpark analysis
Spam filtering has quite a specific use in communications surveillance, where ‘spam’ has a slightly different and perhaps wider meaning than it does in everyday communication, describing low-risk content that does not need to be monitored or reviewed, including non-authored content within items (say email headers or signatures), which ML models can be very effective at detecting.
Email Threading
Email threading is a useful technique to reduce the volume of data needing human review. It reduces duplicative effort while allowing reviewers to see the whole context of a conversation. By collapsing conversations down from many items or emails into one single thread, ML can be used to segment, group and identify inclusive and non-inclusive content. For a document to be recognised as an email and threaded, it must include an ‘Email From’ field and at least one of the following: Sent Date; Email To; Email Subject; Email CC and Email BCC, as shown in Figure 6.
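The recognition rule above (an ‘Email From’ field plus at least one of the other listed fields) can be expressed as a short check. The dictionary representation of a document is an assumption for this sketch; only the field names come from the text.

```python
REQUIRED_FIELD = "Email From"
SUPPORTING_FIELDS = ("Sent Date", "Email To", "Email Subject", "Email CC", "Email BCC")

def is_threadable_email(document: dict) -> bool:
    """Return True if a document carries enough metadata to be threaded as an email."""
    def populated(field):
        # Treat missing, None and whitespace-only values as absent.
        return bool(str(document.get(field) or "").strip())

    return populated(REQUIRED_FIELD) and any(populated(f) for f in SUPPORTING_FIELDS)
```

For instance, a document with only ‘Email From’ populated would not be threaded, while one with ‘Email From’ and ‘Email Subject’ would.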
Figure 6: Fields Needed to Recognise an Email

Source: GreySpark analysis
Most people today have experienced the messy nature of email communication, even when utilising a ‘communication view’ of their inbox. For example, not all recipients ‘Reply to All’, and some will reply to earlier emails in the email chain, which makes keeping track of the conversation challenging. Few and far between are companies that employ rigid protocols for email thread conduct. Analysis of such a conversation may reveal one of a number of ways that a furtive communication may be carried out by manipulating the email thread itself, but such analysis is complicated to achieve using traditional techniques. Utilising a platform that incorporates ML and NLP, analysts can review metrics for email threading, such as those shown in Figure 7, which allow them to understand the email communication and draw insights from the data that would be challenging to garner otherwise.
Figure 7: Example of Key Metrics from an Email Threading Analysis

Source: GreySpark analysis
In an email chain there are two main types of data (see Figure 8):
- Inclusive email – is one that contains unique content not included in any other email, and thus, must be reviewed. An email with no replies or forwards is by definition inclusive. The last email in a thread is also by definition inclusive.
- Non-inclusive email – is one where the text and attachments are fully contained in other (inclusive) emails.
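The two definitions above can be illustrated with a simplified sketch. It assumes each email has already been broken down into a set of content segments (authored text blocks and attachments), which is a hypothetical representation; the segmentation itself is the harder problem and is not shown here.

```python
def classify_thread(emails):
    """Label each email in a thread as inclusive or non-inclusive.

    `emails` maps an email id to the set of content segments it contains
    (a hypothetical representation). An email is non-inclusive when all of
    its content appears in another email; of exact duplicates, only the
    first is kept as inclusive.
    """
    ids = list(emails)
    labels = {}
    for position, email_id in enumerate(ids):
        content = emails[email_id]
        covered = any(
            content < emails[other]
            or (content == emails[other] and other_position < position)
            for other_position, other in enumerate(ids)
            if other != email_id
        )
        labels[email_id] = "non-inclusive" if covered else "inclusive"
    return labels

# Hypothetical thread: e2 and e3 quote everything before them, e4 branches
# off e1 with new text and an attachment.
thread = {
    "e1": {"A"},
    "e2": {"A", "B"},
    "e3": {"A", "B", "C"},
    "e4": {"A", "D", "report.pdf"},
}
labels = classify_thread(thread)
```

In this example a reviewer would only need to read e3 and e4 to see all of the thread's content, which is where the reduction in review effort comes from.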
Analysis of the wider data set, using similar ML techniques to extend threading across multiple media, can provide an extension of what you might normally see in your email inbox, by quickly recognising that a conversation between a given group of people may have moved from one medium to another.
ML can also be used to improve the risk rating of alerts, so that they can be prioritised according to which need swift human intervention, which can be set aside until later, and which can be identified as false positives and removed from the alert population. Active learning models can be trained to rate and prioritise alerts based on human feedback. Pre-trained models can be utilised for specific scenarios initially, with reviewer feedback then used to improve the risk rating quality quickly and effectively, and to enhance the model with new scenarios as they arise.
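A minimal sketch of this feedback loop is shown below. It assumes a small online logistic regression model over hypothetical alert features, updated each time a reviewer confirms or dismisses an alert, with uncertainty sampling used to pick the alert whose review would be most informative. It illustrates the general idea only and does not represent any particular vendor's implementation.

```python
import math

class AlertScorer:
    """Online logistic regression over named alert features (illustrative only)."""

    def __init__(self, feature_names, learning_rate=0.5):
        self.weights = {name: 0.0 for name in feature_names}
        self.bias = 0.0
        self.learning_rate = learning_rate

    def score(self, features):
        """Return a risk score between 0 and 1 for one alert's features."""
        z = self.bias + sum(
            weight * features.get(name, 0.0) for name, weight in self.weights.items()
        )
        return 1.0 / (1.0 + math.exp(-z))

    def feedback(self, features, is_true_positive):
        """Update the model from a reviewer's decision on one alert."""
        error = (1.0 if is_true_positive else 0.0) - self.score(features)
        self.bias += self.learning_rate * error
        for name in self.weights:
            self.weights[name] += self.learning_rate * error * features.get(name, 0.0)

def prioritise(scorer, alerts):
    """Order alerts from highest to lowest risk score."""
    return sorted(alerts, key=lambda alert: scorer.score(alert["features"]), reverse=True)

def next_for_review(scorer, alerts):
    """Uncertainty sampling: pick the alert the model is least sure about."""
    return min(alerts, key=lambda alert: abs(scorer.score(alert["features"]) - 0.5))
```

A typical loop would score the open alerts, send the highest-risk and most uncertain ones to reviewers, and call `feedback` with each decision so that later rankings reflect what reviewers actually escalated or dismissed.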
This is a good example of a space where more traditional technology, such as the lexicographical word-based search, may have value, but is enhanced with the addition of AI techniques which can automate some of the less interesting decisions, leaving the expert humans to apply their ‘real intelligence’ to more complex cases.
Considering these examples, and likely more, it is possible to conclude that the use of artificial intelligence – of NLP and ML – in communications surveillance is not only a good idea, but that it may realistically be the only practical way to carry out effective and comprehensive surveillance and thus be regulatory compliant in today’s highly digitalised financial services industry. NLP is an essential part of the automation of communications analysis, and other ML techniques provide a practical solution to analysing data and managing volume without large-scale manual intervention or resorting to sample testing. Regulators are themselves already implementing similar techniques and it is always desirable for institutions to detect an issue before the regulator does. It is also very likely that, unless they have significant in-house expertise in the field, compliance professionals may want to consider seeking assistance from experienced AI solution providers in the field.
Figure 8: Identifying Inclusive and Non-inclusive Content in a Growing Email Chain

Source: GreySpark analysis
Efficient, Effective & Comprehensive Communications Surveillance
Surveillance is inherently data driven, and the volume, complexity and diversity of data can be extremely high – particularly for unstructured, non-numerical communications – and traditional database and analysis technologies can struggle to manage, query and analyse it efficiently. However, not only do AI processes potentially enhance and improve surveillance processes, but they also represent a comparatively ‘safe space’ for institutions to introduce this kind of new technology. Unlike, say, algorithmic trading or settlements, where decisions made may generate significant financial and reputational risk to the organisation, the kind of decisions made in surveillance, whilst important, are likely to have much less direct and immediate impact.
“AI technology offers firms the ability to capture and surveil large amounts of structured and unstructured data in various forms (e.g., text, speech, voice, image, and video) from both internal and external sources in order to identify patterns and anomalies. This enables firms to holistically surveil and monitor various functions across the enterprise, as well as monitor conduct across various individuals (e.g., traders, registered representatives, employees, and customers), in a more efficient, effective, and risk-based manner.”
FINRA: AI Applications in the Securities Industry
Unlike other uses, where businesses and regulators may be wary of ‘black box’ decision making, the FCA are positively encouraging use of ML for surveillance and fraud detection purposes (where the inherent risk is lower than, say, AI-driven financial decisions) and, indeed, many regulators are themselves using similar techniques and technologies.
Relativity Trace aims to rid the world of corporate misconduct by proactively monitoring employee communications — audio and eComms — to quickly detect risky behaviours such as insider trading, collusion and other forms of market abuse before they escalate. The SaaS solution, securely stored in the cloud, completely automates the monitoring of email, chat and audio communications from more than 50 data sources. Trace scales as the business does and seamlessly handles exponential growth across communication volumes, communication channels, monitored individuals, and the global footprint of the business with increased throughput and availability. The solution comes with more than 40 pre-built surveillance policies in multiple languages, and it monitors some of the most common types of misconduct. This allows organisations to establish a predefined set of rules and manipulate them for specific considerations within their organisation to better identify the types of risky behaviour they are looking for while reducing false positive alerts by more than 90%.