From Data to Decisions: The Journey of Data in Analytics


"Data is the new gold" is a quote often heard, especially in the financial services industry. This sector deals with enormous amounts of extremely valuable data about its customers and handles purely virtual products and services. Thus, data is an essential part of the financial services business.

While the value of data is undeniable, data is not a scarce commodity, unlike gold. Enormous amounts of data are produced daily: the world created around 118 million petabytes in 2023 and is expected to create around 463 exabytes per day by 2025 (for reference, a single petabyte is one million gigabytes, and an exabyte is a thousand petabytes). Although storing and processing these vast amounts of data is complex, the true challenge lies in converting this raw (almost limitless) data into actionable insights. This is where "Big Data Analytics" comes into the picture.

Data analytics is no longer a nice-to-have for businesses. With financial services companies becoming primarily digital, data-driven decision-making is becoming the norm, helping executives make evidence-based decisions instead of relying on guesswork. However, this requires complex data pipelines and analytics tools to bridge the gap between raw data and valuable insights. Such pipelines form complex journeys for data:

  • Capturing and Collecting Raw Data: Data first needs to be captured. In many cases it already is, but sometimes additional capture and storage are needed: for example, storing not only the current version of the data but also all historical changes, and/or capturing meta-information such as who made which change, when, and how (via which channel, screen, IP address, OS, etc.).
    Usually, data is captured by different systems (silos) in various formats and technologies. Centralizing all data in a data lake can be extremely useful for the subsequent steps.

  • Inventorying the Data: If you do not know data exists, you cannot use it to generate insights. Therefore, a proper inventory of the data is required.
    Creating data catalogs is a crucial aspect of data governance and management. Think of a data catalog as a well-organized inventory of data assets. However, manual cataloging is error-prone and time-consuming. GenAI helps overcome this shortcoming with AI-driven data curation and cataloging. It recognizes correlations and relationships between data sets and automatically categorizes and tags them.
    GenAI-powered data catalogs can offer self-service capabilities with chatbot-style interfaces, facilitating seamless data discovery. Automated cataloging also helps maintain data consistency and integrity, which are crucial for data management.

  • Cleaning and Structuring the Data:

    • Converting Unstructured Data into Structured Data: A lot of valuable data is unstructured (e.g. text documents, pictures, movies, voice recordings), containing a wealth of interesting information. Structuring this unstructured data into some kind of model is critical for data analytics. Techniques such as OCR, automatic tagging, pattern recognition, voice transcription…​ can help achieve this.

    • Filtering Out Irrelevant Data: Ensuring only pertinent data is processed. This can include techniques like selecting the right time frame or the right customers, filtering out duplicate data, removing noise…​

    • Cleaning Data: Identifying errors and anomalies in the data and correcting them. This can be done via techniques like automatic correction algorithms, manual actions, comparing different sources and automatically taking the majority vote…​

    • Data Modelling: Even structured data is not always consistent. For example, different payment messages at a bank may be structured differently (e.g. some in SWIFT MT format, others in ISO 20022, or proprietary formats). Therefore, mapping towards a common model is required to ensure consistent semantics and syntax for all instances of the same object type (e.g. same date/time format, including the expression of time zone, same number format, consistent set of enumerated values, same terminology…​). This allows a uniform view of similar data from different sources, addressing consistency and quality issues.

    • Data Augmentation: Once data is structured and modeled, new data (or specific views) can be derived from existing data sets, allowing for easier and more efficient data use.

  • Analyzing the Data: Once we have a uniform, structured view of all the data, we need to analyze it by slicing and dicing it over multiple dimensions (like time, value, customer segment…​) and visually presenting aggregated information in dashboards. This process can be time-consuming, as generating these dashboards often requires setting up complex database queries.
    GenAI can extract meaningful insights from data using the right text-based prompts, identifying correlations and hidden patterns within the available information.

  • Interpreting Results: From the dashboards, the correct conclusions need to be drawn. This requires solid business insight and a good knowledge of statistics, as some conclusions might seem obvious yet not be statistically significant. This is a common error within companies, as statistical measures like the variance of results are rarely shown in management dashboards. Insights derived from the results finally need to lead to concrete actions.

  • Implementing Insights: The defined actions should be implemented and followed up. Ideally, analyses should be regularly re-executed to see the effects of the actions. In business, it is not possible to isolate a specific action from everything else happening (like market evolutions, competitors' changes, other internal changes, employee transfers). Using techniques like A/B testing, we should try to isolate the applied action’s effects as much as possible, allowing us to identify the positive or negative impact and make quick adaptations if needed.
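The data-modelling step above can be sketched in a few lines of Python. The field names and formats below are purely illustrative (this is not real SWIFT MT or ISO 20022 parsing); the point is mapping heterogeneous records onto one canonical shape with consistent semantics:

```python
from datetime import datetime

def normalize_legacy(record: dict) -> dict:
    """Map a legacy-style payment record (YYMMDD dates, lowercase
    currency codes) onto the common model."""
    return {
        "amount": float(record["amt"]),
        "currency": record["ccy"].upper(),
        "value_date": datetime.strptime(record["val_dt"], "%y%m%d").date().isoformat(),
    }

def normalize_modern(record: dict) -> dict:
    """Map an ISO-8601-style record onto the same common model."""
    return {
        "amount": float(record["InstdAmt"]),
        "currency": record["Ccy"].upper(),
        "value_date": record["ValDt"],  # already ISO 8601
    }

legacy = {"amt": "150.00", "ccy": "eur", "val_dt": "240315"}
modern = {"InstdAmt": "150.00", "Ccy": "EUR", "ValDt": "2024-03-15"}

# Two differently shaped source records now compare as equal objects.
assert normalize_legacy(legacy) == normalize_modern(modern)
```

Once every source system has such a mapping, the downstream analysis steps only ever see one consistent representation.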

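The analysis and interpretation steps can likewise be illustrated with plain Python: aggregating a metric per customer segment, together with the spread that, as noted above, rarely makes it into management dashboards. The transaction amounts are invented for illustration:

```python
from collections import defaultdict
from statistics import mean, stdev

# Toy transaction amounts per customer segment (illustrative numbers).
transactions = [
    ("retail", 120.0), ("retail", 80.0), ("retail", 400.0),
    ("private", 210.0), ("private", 190.0), ("private", 205.0),
]

# Slice the data along one dimension: customer segment.
by_segment = defaultdict(list)
for segment, amount in transactions:
    by_segment[segment].append(amount)

for segment, amounts in sorted(by_segment.items()):
    print(f"{segment}: mean={mean(amounts):.1f}, "
          f"stdev={stdev(amounts):.1f}, n={len(amounts)}")
```

Here the two segments have almost identical means, but the retail segment is far noisier; a dashboard showing only the mean would hide exactly the information needed to interpret it correctly.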
AI plays a crucial role in analyzing data by identifying patterns that are not straightforward for humans due to the size and complexity of the data sets. However, AI requires enormous amounts of high-quality data to train its models. Therefore, good AI use cases require that the above-defined data pipelines already exist.

Luckily, as indicated above, AI (and specifically GenAI) can also help set up those data pipelines. GenAI can help clean and structure data, understand natural language queries, and turn those questions into reports and answers.

This makes the work currently requiring highly skilled data analysts, data engineers, and data scientists accessible to anyone, completing the "Big Data revolution". However, caution is needed, because interpreting data can be tricky for both humans and AI. The human brain (and, as AI is trained on our data, often AI models as well) tends to draw statistically unsound conclusions. Additionally, good knowledge of the business is still essential to ensure the data is correctly structured and interpreted.
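One of the statistical traps mentioned above can be made concrete with a standard two-proportion z-test, sketched here with invented numbers: an A/B test where one variant converts visibly better can still fail to be significant.

```python
from math import sqrt

def two_proportion_z(success_a: int, n_a: int, success_b: int, n_b: int) -> float:
    """z statistic for comparing two conversion rates (pooled standard error)."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Variant A converts at 6.0%, variant B at 5.0% -- looks like a clear winner...
z = two_proportion_z(60, 1000, 50, 1000)
print(f"z = {z:.2f}")  # |z| below 1.96 means not significant at the 5% level
```

With these sample sizes the z statistic stays well below the 1.96 threshold, so the "obvious" 20% relative improvement could easily be noise, which is exactly the kind of conclusion a management dashboard without statistical context invites.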

For example, in the financial services sector, a simple payment comes with different amounts (e.g. in the transaction currency or the base currency, with and without costs), different dates (e.g. initiation date, processing date, settlement date), and different involved parties (e.g. payer, payee, payer institution, payee institution, agent of payer, intermediaries…​). Using the right field in the right context is crucial to making the right decisions. Handing this over to any employee in combination with AI is likely to result in incorrect insights.
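This point can be shown in code. The field names below are illustrative, not taken from any payment standard, and the "naive" total at the end is exactly the kind of mistake the paragraph warns about:

```python
from dataclasses import dataclass

@dataclass
class Payment:
    amount_tx_ccy: float      # amount in the transaction currency
    amount_base_ccy: float    # same amount converted to the bank's base currency
    tx_currency: str
    initiation_date: str      # ISO 8601, date the payer initiated the payment
    settlement_date: str      # ISO 8601, date the funds actually moved

payments = [
    Payment(100.0, 92.0, "USD", "2024-03-01", "2024-03-03"),
    Payment(100.0, 100.0, "EUR", "2024-03-01", "2024-03-02"),
]

# Summing transaction-currency amounts mixes USD and EUR -- a classic error.
naive_total = sum(p.amount_tx_ccy for p in payments)      # meaningless figure
correct_total = sum(p.amount_base_ccy for p in payments)  # total in base currency
print(naive_total, correct_total)
```

Both fields are perfectly valid data; only business knowledge tells you which one is meaningful for a given question.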

Therefore, specialized tooling that combines advanced analytics with deep business insights is likely the future.
