In recent years, there has been a global shift away from the established pattern of business communications. The Covid pandemic, and the continuing uncertainty it has created, have accelerated a change in working practices that was already underway: staff working from non-office locations. Surveillance teams have seen an increase in the volume of recorded audio communications and wider use of platforms traditionally used for personal, rather than professional, communications. As a result, financial services firms are under increasing pressure not only to capture and quickly analyse increasingly large volumes of audio communications, but also to futureproof their surveillance platforms in a quickly evolving and diverse landscape of media channels.

By GreySpark’s Jennie Brotherston, Senior Specialist, and Rachel Lindstrom, Senior Manager

To comply with regulation, trading-related conversations in financial firms must take place over approved channels. When a communication takes place over an unapproved channel, the data is not captured and stored, which can be costly for the firm if the regulator identifies the lapse. WhatsApp, typically a channel for personal communications, is unsuitable for trading-related activities because it includes functionality that allows messages to be deleted. In December 2021, JPMorgan Chase was fined USD 200 million by two US financial regulators for allowing employees at its Wall Street division to use WhatsApp and other platforms, evading capture and recording by compliance teams. At the same time, as traditional face-to-face meetings transferred online in early 2020, communications platforms such as Zoom and Microsoft Teams saw huge increases in the uptake of their products. Zoom captured 3.3 trillion minutes of recorded meetings globally between Q3 2020 and Q3 2021 (up from 97 billion between Q3 2018 and Q3 2019), while Microsoft Teams recorded 250 million daily users in Q3 2021, up from 145 million in the previous quarter.

Despite this paradigm shift in working practices, financial services regulators still expect trading firms to monitor all communications of recordable individuals, no matter the communication channel used. Surveillance teams, regardless of their stance on ‘compliance by policy’, will therefore need to adjust if they are to keep pace with developments in communications technology, and innovative thinkers in financial firms are already making changes. Deutsche Bank, for example, has begun using WhatsApp within the secure messaging service, Symphony. Recorded staff can therefore benefit from the immediacy that WhatsApp permits, and compliance staff can capture and store everything that is spoken or written.

In this quickly evolving technology space, the question that compliance teams must address is whether their communication surveillance platforms can adequately manage the increased diversity of communications channels – both those approved for professional use and those typically used for personal communications. Recent advances in voice recognition and transcription technology, along with data structuring techniques and machine learning, have the potential to provide the context, review capabilities and accuracy needed to analyse these larger volumes of voice data.

Capturing Nuance in Audio Communications

The challenge for surveillance teams in financial firms is not just to keep abreast of the new breed of communication channels that are now part of our everyday communication, but also to deal with the large volumes of audio data being generated. Historically, voice data has been extremely difficult to normalise, contextualise and analyse comprehensively. Even items as seemingly uncomplicated as short textual sentences cannot be easily stored or analysed using a traditional database, and files of audio and video data are far more complex. Challenges specifically affecting the analysis of audio data include:

  • Poorly integrated audio and eComms surveillance platforms since most functionality typically resides in the eComms platform;
  • Lack of a standard for metadata across channels, so a significant amount of normalisation is needed for diverse audio communications data to be ingested into a single surveillance platform; and
  • Large file sizes for audio communication, which can lead to difficulty exporting data from audio systems.

Legacy communications surveillance systems have not evolved to the point where they can accurately monitor this type of data, which can lead to firms failing to capture risky activity. As a consequence, many financial firms do not perform regular comprehensive analysis of audio or audiovisual recordings at all. Instead, they store recordings and only conduct manual investigations in cases where such analysis is specifically required (such as for the purposes of trade reconstruction). Of the financial firms that do regularly analyse audio data, many rely on sampling, where analysts listen to only a small percentage of randomly selected communications, which by definition leaves most communications unmonitored.

Voice recognition technology has improved extensively over recent years, although the accuracy of transcriptions typically still ranges only between 60% and 80%.

Transcription accuracy is not always consistent across a single communication either. For instance, a transcript could be 90% accurate in its entirety, but as little as 60% accurate across partial sections of the transcript.
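The gap between whole-transcript and per-section accuracy can be illustrated with a minimal sketch. It assumes that a per-word correctness flag is already available (for example, from comparison with a human-corrected transcript); the function names, the window size and the sample data are hypothetical and chosen only to reproduce the 90%/60% example above.

```python
from typing import List


def overall_accuracy(correct: List[bool]) -> float:
    """Share of words transcribed correctly across the whole transcript."""
    return sum(correct) / len(correct)


def windowed_accuracy(correct: List[bool], window: int) -> List[float]:
    """Accuracy of each consecutive, non-overlapping window of `window` words."""
    result = []
    for start in range(0, len(correct), window):
        chunk = correct[start:start + window]
        result.append(sum(chunk) / len(chunk))
    return result


# Hypothetical 20-word transcript: True means the word was transcribed correctly.
flags = [True] * 10 + [True, False, True, False, True] + [True] * 5

whole = overall_accuracy(flags)          # 18 of 20 words correct
sections = windowed_accuracy(flags, 5)   # four sections of five words each
worst = min(sections)                    # the weakest section
```

Reporting the weakest section alongside the overall figure is one way to show reviewers which parts of a transcript deserve less trust.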

The accuracy of transcriptions can be critical for financial firms because transcripts are typically analysed in an eComms surveillance platform, where rule-based logic is applied to enable analysts to search for key words and phrases. An inaccurate transcript can therefore lead to flawed analysis. Additionally, relying primarily on rule-based logic can mean that analysts lose the context of a communication and fail to identify misconduct.
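A short sketch shows why key-phrase rules are sensitive to transcription errors. The watch list, the sample sentences and the `find_hits` helper are hypothetical illustrations, not the logic of any particular surveillance product.

```python
import re
from typing import List


def find_hits(transcript: str, phrases: List[str]) -> List[str]:
    """Return the watch-list phrases that appear as whole words in the transcript."""
    hits = []
    for phrase in phrases:
        # Word boundaries avoid matching inside longer words; matching ignores case.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        if re.search(pattern, transcript, flags=re.IGNORECASE):
            hits.append(phrase)
    return hits


# Hypothetical watch list and two transcripts of the same spoken sentence.
WATCH_LIST = ["front run", "keep this quiet"]
accurate = "Let's front run the order and keep this quiet."
garbled = "Let's front one the order and keep his quiet."

hits_accurate = find_hits(accurate, WATCH_LIST)
hits_garbled = find_hits(garbled, WATCH_LIST)
```

In this sketch, two mis-transcribed words are enough for the same conversation to pass the rules without raising any hit.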

Understanding the context of a communication is critical if surveillance teams are to identify any wrongdoing. Audio communications incorporate a wide range of additional nuances that are not present in written communications and, valuable as they are, these subtexts are extremely difficult to capture. A proper understanding of the communication can only be formed by taking these nuances into account. Advanced techniques can be used to allow analysts to view and search across the whole population of data, giving them a more holistic understanding of the content and context of the communications under review.

Next generation techniques used together in one platform can enable firms to analyse the large and growing volumes of audio data more efficiently, effectively and comprehensively (see Figure 1). Natural Language Processing (NLP) and machine learning (ML) used in conjunction with voice recognition techniques can provide transcriptions of voice recordings that deliver far greater accuracy than can be achieved using more traditional techniques.

Figure 1: Three-layered Next Generation Analysis of Audio Communications Data for Surveillance and Monitoring


Source: GreySpark analysis

Advanced Data Dissection Analytics

Alternative data management techniques and technologies need to be employed to unlock the deep complexity of an audio communication and to fully understand its content and context (see Figure 2). As conversations can take place across multiple communication channels, the surveillance platform must normalise the metadata of each audio file ingested (‘A’ in Figure 2). A significant amount of information can be gleaned by isolating and reviewing the normalised metadata. For instance, it is relatively straightforward to isolate timestamps and identify the systems used, and even specific words and numbers and conversational themes can be collected. A good surveillance platform also allows the structured metadata to be stored alongside the unstructured audio file and its transcript to aid the human decision-making required at the end of the evaluation process.
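As a rough illustration of metadata normalisation, the sketch below maps two differently shaped source records onto one common schema with UTC timestamps. The source formats, field names and the `CallMetadata` type are all assumptions made for the example; real channel exports will differ.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass(frozen=True)
class CallMetadata:
    """Common schema (hypothetical) shared by every ingested audio file."""
    channel: str
    started_at: datetime       # always timezone-aware UTC
    duration_seconds: int
    participants: List[str]    # lower-cased and sorted for stable comparison


def from_turret(raw: Dict[str, Any]) -> CallMetadata:
    # Assumed turret export: epoch seconds, duration in seconds, caller/callee.
    return CallMetadata(
        channel="turret",
        started_at=datetime.fromtimestamp(raw["start_epoch"], tz=timezone.utc),
        duration_seconds=int(raw["duration_s"]),
        participants=sorted([raw["caller"].lower(), raw["callee"].lower()]),
    )


def from_video_meeting(raw: Dict[str, Any]) -> CallMetadata:
    # Assumed meeting export: ISO 8601 start with offset, duration in minutes.
    started = datetime.fromisoformat(raw["start"]).astimezone(timezone.utc)
    return CallMetadata(
        channel="video_meeting",
        started_at=started,
        duration_seconds=int(raw["duration_min"] * 60),
        participants=sorted(p.lower() for p in raw["attendees"]),
    )


turret_call = from_turret(
    {"start_epoch": 1630486800, "duration_s": "95", "caller": "A.Trader", "callee": "B.Broker"}
)
meeting = from_video_meeting(
    {"start": "2021-09-01T10:00:00+01:00", "duration_min": 30, "attendees": ["B.Broker", "A.Trader"]}
)
```

Once both records share one schema, timestamps and participants from different channels can be compared directly, which is the precondition for reconstructing a conversation that moved between channels.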

Figure 2: Key Considerations for the Analysis of Audio Communications


Source: GreySpark analysis

Natural Language Processing (NLP) is an extremely powerful tool that, when used in conjunction with voice recognition techniques, can generate far more accurate transcriptions of audio and the audio component of video recordings than can be created using voice recognition techniques alone (‘B’ in Figure 2). NLP facilitates the identification of different languages, dialects, mixed languages and jargon so that voice and textual data can be translated into a single ‘base’ language. Rule-based logic can then be used to facilitate high-level searches and other basic analyses (‘C’ in Figure 2).

NLP is not simply useful for the transcription of audio files, however; it can also parameterise transcripts in order to contextualise the communications (‘D’ in Figure 2). A transcript is divided into subsets, which are then grouped according to pre-defined parameters. From this, communications can be sorted into segments according to specific themes – for instance, by media, person, location, language or intention. The more times the data is sorted and segmented, the better the analysts’ understanding of the dataset as a whole and the more efficient the analysis of the resulting structures.
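The grouping step can be sketched in a few lines, assuming the transcript has already been split into utterances tagged with parameters. The `Utterance` fields and the sample data are hypothetical; in practice the tags would come from upstream NLP components rather than being entered by hand.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class Utterance(NamedTuple):
    """One subset of a transcript with its pre-defined parameters (hypothetical)."""
    speaker: str
    language: str
    channel: str
    text: str


def segment_by(utterances: List[Utterance], parameter: str) -> Dict[str, List[Utterance]]:
    """Group utterances by the value of one parameter, preserving their order."""
    groups: Dict[str, List[Utterance]] = defaultdict(list)
    for utterance in utterances:
        groups[getattr(utterance, parameter)].append(utterance)
    return dict(groups)


utterances = [
    Utterance("trader_a", "en", "turret", "Can you show me a price?"),
    Utterance("broker_b", "en", "turret", "Give me a minute."),
    Utterance("trader_a", "fr", "video_meeting", "On en reparle plus tard."),
]

by_speaker = segment_by(utterances, "speaker")
by_language = segment_by(utterances, "language")
```

Running the same function over different parameters gives the repeated sorting described above: each pass yields another view of the same dataset.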

By attaching a risk score to every item of data or unit of communication, an overall score can be applied to datasets, and communications that require further analysis can be automatically flagged for intervention by the compliance team. The utilisation of machine learning (ML) techniques in the segmentation and scoring processes can reduce noise and increase accuracy, so that analysts can spend more time reviewing items that are truly of interest.
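A minimal sketch of scoring and flagging follows. It swaps in a hand-written term-weight table for the machine learning model the article describes, purely to keep the example self-contained. The weights, the threshold and the choice of the maximum as the dataset-level score are all assumptions.

```python
from typing import Dict, List, Tuple

# Hypothetical term weights; a production system would more likely learn these.
TERM_WEIGHTS: Dict[str, float] = {
    "off the record": 0.6,
    "delete this": 0.8,
    "fix": 0.3,
}


def score_unit(text: str, weights: Dict[str, float] = TERM_WEIGHTS) -> float:
    """Sum the weights of terms found in the text (naive substring match), capped at 1.0."""
    lowered = text.lower()
    total = sum(weight for term, weight in weights.items() if term in lowered)
    return min(total, 1.0)


def review_queue(units: List[str], threshold: float = 0.5) -> List[Tuple[int, float]]:
    """Return (index, score) for units at or above the threshold, highest score first."""
    scored = [(index, score_unit(text)) for index, text in enumerate(units)]
    flagged = [(index, score) for index, score in scored if score >= threshold]
    return sorted(flagged, key=lambda pair: pair[1], reverse=True)


def dataset_score(units: List[str]) -> float:
    """Overall score for a dataset, taken here as the highest unit score."""
    return max((score_unit(text) for text in units), default=0.0)


units = [
    "Morning, can you confirm the settlement date?",
    "Keep this off the record and delete this chat after.",
    "We need to fix the booking error.",
]

queue = review_queue(units)
overall = dataset_score(units)
```

Only the second unit reaches the threshold in this sample, so it alone is queued for the compliance team; the third unit scores on "fix" but stays below the cut-off, which is the kind of noise reduction the scoring step is meant to provide.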

From Data to Information to Insight

Despite continuing uncertainty about how long the Covid pandemic will last and what the residual effects will be on the global business environment, regulators continue to emphasise the importance of market abuse surveillance. In the UK, the FCA has reminded firms of their mandated responsibility to record and monitor all new communication channels through which trading-related conversations occur.

The LIBOR and FX scandals that made unwelcome headline news for firms prior to the pandemic are still fresh enough to serve as a warning to firms that they must proactively monitor employee conduct, where appropriate, to ensure that individuals comply with regulation and company policy.

The effects of the pandemic on organisations have accelerated changes that were already in progress. More than ever, financial firms need to put in place scalable surveillance solutions that can capture, record and analyse a wide variety of audio and textual media used from any location. In 2022, it is important that surveillance teams learn the lessons of the pandemic: legacy technology is not futureproof, and implementing a scalable, flexible, next generation platform that takes into account the intricacies of audio data is critical.

Relativity Trace aims to rid the world of corporate misconduct by proactively monitoring employee communications — audio and eComms — to quickly detect risky behaviours such as insider trading, collusion and other forms of market abuse before they escalate. The SaaS solution, securely stored in the cloud, completely automates the monitoring of email, chat and audio communications from more than 50 data sources. Trace scales as the business does and seamlessly handles exponential growth across communication volumes, communication channels, monitored individuals, and the global footprint of the business with increased throughput and availability. The solution comes with more than 40 pre-built surveillance policies in multiple languages, and it monitors some of the most common types of misconduct. This allows organisations to establish a predefined set of rules and manipulate them for specific considerations within their organisation to better identify the types of risky behaviour they are looking for while reducing false positive alerts by more than 90%.

Jennie Brotherston is a Senior Specialist at GreySpark Partners and an experienced CFA and PRINCE2 qualified professional. At GreySpark, she has engaged with clients as a Business Analyst, Project Manager and SME to help deliver projects across a variety of disciplines. She thrives on the technical challenges that are part and parcel of a life in financial services and enjoys delving into the detail to help clients get their change management projects over the line.