As financial institutions move to cloud, they need more intelligent monitoring to manage a complex IT topology.
- The use of cloud technology by financial institutions is well established, but roll-outs have typically been careful and piecemeal rather than enterprise-wide.
- In most cases the enterprise IT topology is increasingly complex, incorporating on-premises monolithic and container-based applications, virtual machines and cloud-based computing in various forms. The lack of a single comprehensive view of all these deployments raises the possibility of security breaches, compliance violations, productivity loss, inefficiencies, slowed enterprise enhancement and client churn.
- As cloud computing will inevitably be more widely and comprehensively used in the future, financial institutions need to think about using more intelligent monitoring for visibility across the complexity of their entire enterprise IT topology.
By GreySpark’s Rachel Lindstrom, Senior Manager
Cloud strategies in banks are often owned by Information Security (InfoSec) teams and therefore address mainly InfoSec issues rather than business strategy. The question hovering over this is: why haven’t business strategists seized the opportunity to implement business-driven cloud strategies? The answer is, well, cloudy. Sitting between the business and InfoSec teams are the platform and application teams, which have a good understanding of the complexities of a strategic large-scale deployment to the cloud, but no uniform voice on whether large-scale cloud deployments would, in general, be a good thing.
Putting future strategies aside for a moment and focusing on the reality of enterprises today, most developers agree that large monolithic applications should, at some point, be replaced. A significant majority of banks have addressed some of the low-hanging fruit, moving mostly less latency-sensitive, non-critical services into the cloud, and have already implemented a host of IT ‘as-a-Service’ offerings there. Yet instead of becoming cleaner and more streamlined, the enterprise IT estate has typically become more complex. This complexity means that many financial firms are failing to obtain a clear and comprehensive view of their entire IT enterprise.
In this article, GreySpark Partners, in association with ITRS, explores the manifestations of cloud and the challenges of managing and monitoring today’s complex IT environments in financial institutions.
Complex IT Enterprises
Cloud computing’s reputation has suffered from some quite damning service outages over the last few years. When outsourcing any service, financial firms still hold the ultimate responsibility for it, so greater visibility into, and confidence in, cloud providers is essential. This can only be assured if firms can predict, manage and communicate disruptions to their cloud services in close to real time. A cloud survey by GreySpark showed that the top four reasons banks gave for not moving significant parts of their enterprise into the cloud were:
• Lack of internal expertise
• Regulatory concerns
• Push back from the InfoSec team
• Incompatible organisational structure
The effective management of an enterprise as complex as those of today’s financial institutions requires the ability to monitor, notify, react and resolve issues quickly and with as little disruption to the running of the institution as possible. The reputational damage created by an interruption to a financial institution’s services means that the availability and resilience of those services must be a priority. Figure 1 shows the architectures for on-premises technology. Every component in the technology stack is monitored, and log data and metrics are garnered on premises; the behaviour of the network is fully visible to the enterprise IT management.
It is important to remember that the analysis of log data and/or metrics is critical for every application run on-premises, since it provides valuable information for troubleshooting, evaluating performance issues and drawing an overall picture of the behaviour of the financial institution’s architecture. Monitoring of applications running on on-premises infrastructure is likely to be based on a pull methodology, in which agents integrate with applications and pull useful metrics into monitoring dashboards.
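The pull model described above can be sketched in a few lines. This is a minimal illustration, not any vendor’s implementation: the service names, hosts and `/metrics` endpoint are invented for the example, and a real agent would run this on a fixed scrape interval and feed the results into a dashboard.

```python
import json
from urllib.request import urlopen

# Hypothetical on-premises services exposing a metrics endpoint over
# HTTP; the service names and URLs are illustrative only.
TARGETS = {
    "order-gateway": "http://opm-host-1:9100/metrics",
    "risk-engine": "http://opm-host-2:9100/metrics",
}

def poll_once(targets, timeout=2.0):
    """Pull a snapshot of metrics from each application.

    In the pull model the agent, not the application, owns the
    schedule: this function would be called on a fixed scrape
    interval and its output fed into the monitoring dashboard.
    """
    snapshot = {}
    for name, url in targets.items():
        try:
            with urlopen(url, timeout=timeout) as resp:
                snapshot[name] = json.loads(resp.read())
        except OSError:
            # An unreachable target is itself a signal worth surfacing.
            snapshot[name] = {"status": "unreachable"}
    return snapshot
```

Note that in this model a silent application is still visible: the agent knows which targets it expected to reach and can flag those that do not respond.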
As already mentioned, cloud deployments in financial institutions take a variety of forms. The form taken depends on how much of the stack is managed and monitored by the financial institution and how much is done by the cloud provider. The three common cloud deployment types are shown in Figure 2 alongside an on-premises stack. Cloud native applications tend to be more granular, distributed and dynamic in nature, which means that a push methodology is often used. Metrics and trace data are emitted by the applications when they become active.
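By contrast, the push methodology used by cloud-native applications inverts the relationship: the application emits its own metrics and trace data when it becomes active. A minimal sketch, with an in-memory queue standing in for a real collector endpoint and all names invented for illustration:

```python
import json
import queue
import time

# The queue stands in for a real transport (e.g. an HTTP collector
# endpoint that a monitoring platform exposes to receive pushed data).
collector = queue.Queue()

def emit(service, metric, value):
    """Push a timestamped data point to the collector.

    The application decides when to send; the monitoring system
    simply receives. This suits short-lived, dynamic workloads
    that may not exist long enough to be scraped.
    """
    collector.put(json.dumps({
        "ts": time.time(),
        "service": service,
        "metric": metric,
        "value": value,
    }))

# A short-lived container task pushes its own metrics as it runs.
emit("fx-pricer", "quotes_generated", 128)
emit("fx-pricer", "latency_ms", 3.2)
```

The trade-off is the mirror image of the pull model: a container that dies before emitting anything leaves no trace, so push-based estates typically also track expected heartbeats.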
Most cloud service providers, large and small, produce metrics that evidence the performance of the cloud in terms of network connectivity and availability, as well as service availability. System, security and application event logs collected by the cloud provider may include a sequential record of all user activities. Financial institutions will need access to those records for incident management, forensic analysis and the monitoring of regulatory compliance. However, this is often easier said than done: cloud-provided metrics are plentiful, so identifying which of them enhance observability for each financial firm’s unique enterprise can be quite a challenge.
Another layer of complexity for the enterprise IT manager is that each cloud services provider used in the financial firm – and there are likely to be several – will send data via its own API, as shown in the bottom left of Figure 2. Logging into multiple, different, potentially uncorrelated systems fed by these APIs makes it difficult to understand the performance of the enterprise and introduces inefficiencies.
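One common remedy for this API sprawl is an adapter layer that maps each provider’s payload into a single internal schema before anything reaches a dashboard. The payload shapes below are invented for illustration (they are not real AWS, Azure or GCP responses); the point is that once normalised, data from different providers can be compared like with like.

```python
from datetime import datetime, timezone

# Two hypothetical provider payloads for the same logical metric,
# each in its own shape. These structures are illustrative only.
provider_a = {"MetricName": "CPUUtilization", "Value": 72.5,
              "Timestamp": "2024-05-01T09:00:00Z"}
provider_b = {"name": "cpu_percent", "data": {"avg": 68.1},
              "time": 1714554000}

def normalise_a(payload):
    """Map provider A's ISO-timestamped payload to the common schema."""
    return {
        "metric": "cpu_utilisation",
        "value": payload["Value"],
        "ts": datetime.fromisoformat(payload["Timestamp"].replace("Z", "+00:00")),
    }

def normalise_b(payload):
    """Map provider B's epoch-timestamped payload to the common schema."""
    return {
        "metric": "cpu_utilisation",
        "value": payload["data"]["avg"],
        "ts": datetime.fromtimestamp(payload["time"], tz=timezone.utc),
    }

# One common schema means one dashboard, not one login per provider.
unified = [normalise_a(provider_a), normalise_b(provider_b)]
```

Each new provider then costs one adapter rather than one more uncorrelated dashboard.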
Workflows in financial services are increasingly digitised and can involve multiple systems. As a system becomes legacy, decisions are made on its replacement, and, depending on the requirements and budget, this could mean choosing a solution with a deployment model that differs from upstream or downstream workflow systems. Indeed, workflows can involve both on-premises and cloud-based systems at various points. An example of systems utilised to facilitate a trade and some considerations for their deployment approach are shown in Figure 3.
As Figure 3 indicates, financial firms are wary of sending sensitive data – either defined as such in regulation such as MiFID II or simply business-sensitive data – into cloud-based IT systems. This has led to the phenomenon of edge technology. In essence, databases reside on the edge of on-premises enterprises, so that data can be quickly retrieved for in-cloud calculations. Edge technology, too, adds a degree of additional complexity to the IT topology and therefore its observability.
The Importance of Monitoring
As cloud computing comes to be more widely used, more intelligent monitoring will be needed to combat the following challenges:
1. Security Breaches
Perhaps the most significant implication of insufficient monitoring is that it is easier for hackers to breach and disrupt IT systems and steal data. One of the recommendations in a report by the International Monetary Fund (IMF) in March 2023 was for financial firms to strengthen cyber ‘hygiene’, secure their bespoke systems and enhance response and recovery strategies.
The IMF states that most successful attacks are the result of routine lapses, such as failing to deploy patch updates or to make the correct security configurations. In this context, a crystal-clear view of minute-by-minute operations is required to ensure that there is no dark corner from which hackers can disrupt operations.
2. Compliance Violations
Failing to monitor IT systems can result in violations that can lead to costly fines and legal consequences. There are rules and guidance issued by supervisory bodies for financial firms relating to their operational resilience. In March 2022, the UK’s Financial Conduct Authority (FCA) brought in rules that financial institutions must, amongst a list of other operational resilience boosting measures, “carry out mapping and testing to a level of sophistication necessary to identify important business services, set impact tolerances and identify any vulnerabilities in your operational resilience”.
This means that the IT that supplies those important business services must be monitored on an ongoing basis to ensure that tolerances are not breached. The lack of a clear view of the IT enterprise may, therefore, incur unpleasant sanctions for financial institutions.
3. Productivity Loss
Without adequate monitoring in place, firms are unable to pinpoint precisely where in production an issue lies. This means valuable time – sometimes days – can be wasted trying to identify the source of the error, which increases the mean time to resolution (MTTR) – a key productivity metric. In addition, the interconnectedness of IT systems can mean that small failures in one system can be propagated to other systems until there is a critical failure. So, spotting an issue that may escalate in good time requires not only monitoring, but also the staff to comprehend the consequence of a failure before it becomes critical.
4. Poor Performance
IT systems that are not monitored can suffer from poor performance and inefficiencies, leading to slower response times. This can be a problem in financial services, where the responsiveness of a financial firm’s computer systems must match that of its customers and the markets. While slow response times are serious for low latency trading systems, users of high-touch trading systems will also be dissatisfied if system response times slow noticeably. This can lead to traders missing the market and can culminate in their poorer performance.
5. Slow Enterprise Enhancement
Without proper monitoring, organisations may miss out on opportunities to improve their IT infrastructure, optimise operations and increase efficiency. Systems administrators cannot report on the overall health of the enterprise if they do not have an enterprise view of it. This means that inefficiencies are not so readily identified. Opportunities to enhance the performance of the enterprise, and hence the business, can therefore be missed.
6. Client Churn
All of the above will negatively affect the business and its performance, and this will not go unnoticed by a financial firm’s clients. Clients will begin to shop around and utilise firms which can offer better adherence to their SLAs.
The consequences of having an incomplete view of an enterprise cannot be overstated, and businesses without the resources to keep eyes on every monitoring dashboard at all times are in the same boat. From cyber attack to plain old server failure, issues with IT systems can have expensive ramifications for the digitised, multi-system workflows in today’s financial firms. Figure 4 shows the issues a firm may suffer if it does not maintain a single comprehensive view of its IT enterprise.
Overall, the consequences of not monitoring an IT enterprise can be severe and can negatively impact an organisation’s reputation, financial stability and long-term success. However, even firms that are successfully monitoring their entire enterprise can fall short due to the timeliness of the monitoring information they garner.
When You Need to Know Now
Scouring through reams of log data to identify and resolve issues in the enterprise is hardly ideal in today’s lightning-fast financial services industry. Failures can emanate from anywhere to disrupt services. To limit damage to the bottom line, as well as to protect the firm’s reputation, it is critical to remediate and resolve issues as quickly as possible. To do that, real-time monitoring is needed for all the technology stacks across the enterprise – whether on-premises or cloud-based. The only way to take this final step is for banks to:
- Eliminate the delivery of log data in a batch, and replace this with real-time delivery;
- Ensure that any logging or monitoring applications are not in the critical path of a workflow, and do not introduce latency into the process;
- Ensure that analysis of logging data is done automatically in real time; and
- Channel all logging and metrics from trading infrastructure, networks, applications and third-party services into one dashboard.
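The four principles above can be sketched together: events are handed off to a background analyser so logging stays off the workflow’s critical path, each event is analysed the moment it arrives rather than in a batch, and simple automatic rules feed one alerts channel. The field names and the latency threshold are illustrative assumptions, not a real rule set.

```python
import queue
import threading

events = queue.Queue()   # stream of log events, delivered one by one
alerts = queue.Queue()   # single channel a dashboard would consume

LATENCY_LIMIT_MS = 50.0  # illustrative tolerance threshold

def analyse(stop):
    """Background worker: analyse each event as it arrives."""
    while not stop.is_set() or not events.empty():
        try:
            event = events.get(timeout=0.1)
        except queue.Empty:
            continue
        # Automatic real-time rule: flag tolerance breaches immediately.
        if event.get("latency_ms", 0.0) > LATENCY_LIMIT_MS:
            alerts.put({"rule": "latency_breach", "event": event})

def log_event(event):
    """Called from the workflow: enqueue and return immediately,
    so monitoring adds no latency to the critical path."""
    events.put(event)

stop = threading.Event()
worker = threading.Thread(target=analyse, args=(stop,), daemon=True)
worker.start()

log_event({"system": "oms", "latency_ms": 12.0})
log_event({"system": "fix-gateway", "latency_ms": 87.0})

stop.set()
worker.join()
```

The hand-off queue is the design choice that keeps the workflow and the analysis decoupled: the producing system never waits on the rule engine.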
However, data points garnered from multiple sources, even if presented together on a real-time dashboard, do not immediately present the user with a vision of the current state of the enterprise. The data must be intelligently integrated and analysed such that data points are contextualised against each other.
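A small sketch of what such contextualisation means in practice: a latency rise on its own is ambiguous, but aligned in time with a CPU series it tells a clearer story. The values, thresholds and verdict strings below are invented for illustration.

```python
# Two metric series from different sources, keyed by minute.
# Values and thresholds are illustrative only.
cpu = {1: 35.0, 2: 38.0, 3: 91.0, 4: 93.0}      # host CPU, percent
latency = {1: 8.0, 2: 9.0, 3: 45.0, 4: 52.0}    # order latency, ms

def assess(minute):
    """Contextualise one data point against another on the same timeline."""
    busy = cpu.get(minute, 0.0) > 85.0
    slow = latency.get(minute, 0.0) > 40.0
    if busy and slow:
        return "capacity problem: latency rise explained by CPU saturation"
    if slow:
        return "investigate: latency rise with no matching CPU pressure"
    return "normal"

states = {minute: assess(minute) for minute in cpu}
```

Neither series alone yields the "capacity problem" verdict; it only emerges when the points are analysed against each other.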
One View of Everything
Almost all financial institutions are moving to the cloud to some extent – of that there is little doubt. Globally, many financial institutions are looking at their cost base with some concern as the current macroeconomic environment brings the consequences of war, disease and political turmoil to bear. In this context, applications and systems are carefully assessed to determine what value a migration to the cloud would deliver, and the level of risk it would pose, before they are selected for migration.
Monitoring capabilities that give IT teams a view of both cloud-based resources and on-premises networks can highlight bottlenecks and problems across the enterprise, on the ground and in the cloud. In effect, this end-to-end visibility would significantly de-risk any migration to cloud technology, as well as improve the ongoing management of the enterprise. Optimally, this visibility would be delivered via a single dashboard – a single pane of glass, if you will – from which insights into both the cloud and the on-premises infrastructure could be drawn.
And that would be really powerful.
ITRS Geneos supports the most complex and interconnected IT estates, providing real-time monitoring for transactions, processes, applications and infrastructure across on-premises, cloud and highly dynamic containerised environments.