
The boundary between "Data" and "Code" is blurring


In the corporate world, the usage of Excel files is still enormous. Many business tools still run in Excel, often extended with VBA macros and applications.

As an Excel fan myself, I can relate to that: nothing else lets you build a powerful and flexible tool, adapted to your specific business needs, in such a short time.

However, as a software engineer, I also see the issues with these decentrally managed business tools built in Excel. Typically there are four main issues with these tools:

  • The lack of central storage

  • The lack of collaboration possibilities offered by Excel (we have all tried to work on a shared workbook, but I don’t think anyone was really satisfied with the result)

  • The lack of versioning

  • The (non-)functional limitations of a tool like Excel. Numerous examples exist of an Excel tool growing continuously in complexity and size until, one day, it becomes impossible to maintain, it becomes unacceptably slow, and/or it hits functional limitations (e.g. certain logic that cannot be implemented) or technical ones (e.g. the row or column size limit).

Microsoft has already made considerable efforts to resolve these limitations (e.g. with Office 365), and other firms are trying to offer alternatives that keep the flexibility of a spreadsheet (e.g. Airtable or Trello), but some limitations are inherently linked to the way a spreadsheet works. This is caused by the fact that in a spreadsheet, data and software are not isolated from each other, which means the software part of a spreadsheet cannot be versioned, maintained, optimized and/or upgraded separately from the data.

In general, data and code scale very differently. When code scales, it can be modularized (via classes, methods, functions…​) at different abstraction levels, which keeps the scaling of code under control.
Data, however, needs to stay in one place, or at least a single access point to all data has to be provided (e.g. a database might store data in multiple files, even on different machines in a distributed architecture, but for the user this should be abstracted away).
This difference in scaling behavior means that data and code should be separated as much as possible.

Unfortunately, in many cases the boundary between data and code is not that clear (a whole grey zone exists), which is an issue not only for spreadsheets but for applications in general.

A good metric to determine an object’s position on this "data versus code" scale is its "production update frequency": the higher the frequency of production updates, the closer the object sits to "data".
This typically leads to three large categories, i.e. code, soft-code and data, although these categories still blur into each other.

  • Code: the actual programming language code (although domain-specific languages should also be considered), created by a software engineer and known at the moment of writing the code (known at design time). This is typically delivered to production according to a release schedule and through a controlled deployment pipeline.

  • Data: any data created by the application itself, so only known at the moment the application is running (i.e. not known at design time, so impossible to hard-code). It can stem from a user interaction (i.e. operational data) or be derived automatically by the application itself (e.g. data created by calculations on the operational data, or logging and monitoring data).

  • Soft-code: any mix between code and data, which can be adapted by a software engineer or a business product owner. The adaptation can be done directly in production, via an administration account, or can be delivered via the deployment pipeline. As the risk of regressions is lower, the deployment restrictions are often less severe than for code. The value of such a customization can be known at the moment of coding, only when releasing (e.g. environment-specific settings) or even only when the application is already running.

These boundaries are blurring more and more, making it increasingly difficult to define these terms precisely, due to evolutions like:

  • Self-evolving code and machine learning, where the behavior of the application itself adapts automatically based on the provided (training or production) data (cfr. blog "https://bankloch.blogspot.com/2020/09/ai-in-financial-services-buzzword-that.html")

  • DevOps, resulting in continuous delivery, meaning that code can also get a very high "production update frequency" (cfr. blog "https://bankloch.blogspot.com/2020/10/devops-for-dummies.html")

  • RPA, Low- and No-code platforms, which allow business people to develop complex tooling (cfr. blogs "https://bankloch.blogspot.com/2020/07/low-and-no-code-platforms-will-it.html" and "https://bankloch.blogspot.com/2020/05/rpa-miracle-solution-for-incumbent.html")

  • Business package software (like SAP, Salesforce…​), which allows extensive customization in the business application itself via parameterization, drag & drop options, scripting languages…​

  • …​

Nonetheless, the positioning of each object on the "code vs. data" axis should be carefully investigated, as it has some important consequences, like:

  • How a change is released to production: for "data" this is done by making a direct update in the production environment, while for "code" this is normally done via a deployment pipeline

  • Storage of the latest version: for "data" the latest version will always be in the (master) production database, while for "code" it is typically a development branch in a source code repository.

  • Testing: while code will typically transition through an (automated) testing cycle, "data" normally will not.

  • Collaboration: while code collaboration is usually done in a decentralized way, i.e. by checking out the code, making individual adaptations and then checking in and merging to a central repository (and potentially resolving merge conflicts), "data" collaboration is usually handled more centrally, via a locking mechanism that locks records while a user is updating them.

  • Search capabilities: as code is usually stored in text files, typically coding tools will work with text searches. Data on the other hand can be stored in different structures (like a structured data model, a JSON object…​), allowing more structured querying (like joins, grouping, complex filtering…​).

For the "Soft-code" category, usually a choice between the "data" and "code" extremes should be made for each of those above characteristics. These decisions will determine the overall positioning on the "data vs. code" scale.

Typical examples of elements in this "Soft-code" category are:

  • Parameterization (or configuration): this consists of a set of parameters or configuration values determining the specifics of an application. It can take different forms, from simple variables determining the behavior of the application all the way to the configuration of screens, reports, lists, table structures and meta-dictionary updates (like user-defined fields or tables). The result of a parameterization is stored either as data in a database, as structured files (e.g. in JSON, XML or YAML format) or as code written against an SDK.
    Usually this kind of soft-code is partially handled as code (i.e. the default configuration delivered by the product team) and partially handled as data (i.e. the specific modifications made by business owners to the behavior of the application in production).

  • Feature flags: with DevOps and Agile pushing more and more towards gradual roll-outs and data-driven product management (i.e. trying out new features on a small subset of users in production and monitoring the result), feature flags have become a vital part of a modern application. These flags allow enabling or disabling specific features of an application and often support segmenting users, so that a feature can be enabled only for some users (see the sketch after this list).

  • Referential, static data: these are usually tables storing static data. Typical examples are lists of countries, currencies, postal codes, MCC sector codes, languages and more specific code tables (enumerations). As there is often no application screen to manage this data and no end user owning it, the data will usually be updated by the product team via the standard release procedure. As a result, even though this is clearly data, it takes on a lot of the characteristics of "code".

  • Translations: almost any multilingual application works with hardcoded technical keys for each label and text appearing on the application screens. These technical keys are then replaced by the corresponding value from a translation file in the language of the user. As updating translations often requires complex manipulations and ideally some testing (e.g. an overly long translation might cause layout issues on a screen), the deployment of translations is usually done in the same way as "code". The actual translations are however typically owned by a business owner, like the marketing department, which gives them the characteristics of "data".

  • Static content: this consists of rich text (with layout), pictures and other media files, which appear on the application screens or are generated as output of the application (like a report, email, SMS, notification…​). This type of content is usually managed in a content management system, but often has a strong link with the code, as the content should be retrieved and included in the right place in the application. Furthermore, this content often includes variables, which are dynamic and therefore need to be replaced by their actual values at run-time (see the sketch after this list).
    Another typical issue is that the CMS tool produces static webpages, while business often wants dynamic behavior on these pages. As a result, software engineers need to plug specific code into the static pages to obtain this dynamic behavior.

  • AI models: these are both "code" and "data" as well. The models are "code", as they contain business logic, are usually created by software engineers and are normally delivered to the production environment in a controlled way. However, they are also "data", as these models are essentially coefficients in large matrices: they cannot be debugged like regular code and they can evolve directly in production based on new data (if desired).
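
To make the feature-flag mechanism concrete, below is a minimal sketch in Python. It is an illustration under assumptions, not the API of any specific feature-flag product: the FeatureFlag class, the percentage-based roll-out and the allow-list are all hypothetical.

    import hashlib
    from dataclasses import dataclass, field

    @dataclass
    class FeatureFlag:
        """A hypothetical feature flag with simple user segmentation."""
        name: str
        enabled: bool = False                            # global kill switch
        rollout_percent: int = 0                         # gradual roll-out (0-100)
        allowed_users: set = field(default_factory=set)  # explicit allow-list

        def is_enabled(self, user_id: str) -> bool:
            if not self.enabled:
                return False
            if user_id in self.allowed_users:
                return True
            # Deterministic bucketing: the same user always falls in the same
            # bucket, so a 10% roll-out stays stable across requests.
            bucket = int(hashlib.sha256(user_id.encode()).hexdigest(), 16) % 100
            return bucket < self.rollout_percent

    # Enable a new feature for 10% of users plus a named beta tester.
    flag = FeatureFlag("new_report", enabled=True, rollout_percent=10,
                       allowed_users={"beta_tester_1"})
    print(flag.is_enabled("user_42"))

Stored as a database row or a JSON file, such a flag can be flipped in production without a release, which is exactly what makes it soft-code.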
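
Similarly, the run-time variable replacement in static content can be sketched with Python’s standard string.Template; the placeholder names below are invented for the example.

    from string import Template

    # Content as managed in a CMS: layout plus placeholders, owned by business.
    email_body = Template(
        "Dear $first_name, your card ending in $last_digits "
        "was blocked on $block_date."
    )

    # The application fills in the dynamic values at run-time.
    print(email_body.substitute(first_name="Anna",
                                last_digits="1234",
                                block_date="2021-03-01"))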

The above examples are becoming more and more prominent in modern software applications. As such, the "Soft-code" category seems to be growing considerably, due to:

  • A lack of IT resources combined with an ever-growing demand for digitalization: as a result, business people want to be able to make certain changes themselves, allowing them to bypass the capacity issues within the IT department.

  • The ever-increasing need for flexibility: in the fast-changing world we live in, waiting for the next release is often simply not an option. As a result, more and more possibilities to modify the application in a controlled way at run-time are incorporated into modern applications.

  • A growing need to be as efficient as possible: the communication overhead between a business owner and a developer can be enormous (in many organizations there are three or more intermediaries facilitating this communication). This overhead leads to longer lead times, more work for everyone and errors due to miscommunication. As a result, business owners want to be able to do more themselves, and "Soft-code" can provide a solution to this.

Although moving more "Code" to the "Soft-code" category seems a good idea at first, it raises several concerns:

  • Definition of the master copy: while for "data" and "code" there is a single master copy (for "code" the source code versioning system, for "data" the production database), this is not the case for "customizations". On the one hand the production platform contains the current situation, while the source code management system and/or one of the test environments can contain the future situation, not to mention any environment-specific "customizations".

  • Collaboration: as there is no central, unique storage, collaboration also becomes difficult, as it is not possible to lock or check out a specific customization from a central, unique store.

  • Versioning: versioning also becomes very difficult due to this lack of a single master copy. As a result, it becomes very hard to analyse and reproduce bugs, audit changes (who made which change) and identify what has changed over time.
    Due to this lack of versioning, it regularly happens that IT releases a configuration change (a soft-code change) that overwrites something a business owner has done.

  • Non-functional limitations: "soft-code" is usually the result of an abstraction layer (like a CMS, translation labels…​), which of course also limits flexibility. Any requirement that does not match the prescriptions of the abstraction can result in a serious refactoring of both the "soft-code" and the code implementing the abstraction.
    Additionally, soft-code is usually not designed to support large volumes. When the volume grows, it not only becomes very hard to maintain, but performance issues also appear. "Code" will execute several factors faster than soft-code, as soft-code first needs to be retrieved, converted into application code and only then into machine instructions.

  • Operational risks due to misuse: in many organizations soft-code is also misused. As releasing software in many large organizations is an extremely hard process, which can only be done a few times a year, developers start building in all kinds of configuration and customization possibilities that allow modifying the behavior of the application without a release. This is however a very risky situation, as certain configuration changes can have such a severe impact on the application’s behavior that extensive testing is an absolute necessity. Usually, however, this step is skipped for this type of change.

  • Lack of documentation: due to the flexible nature of soft-code and the specific way it is stored, documenting soft-code is more complex than documenting code (for which all kinds of tools, notations and frameworks exist). As a result, soft-code is usually not documented at all, meaning it often becomes a source of errors and concern after a few years.

  • Flexibility has a price: as indicated above, supporting soft-code requires building an abstraction layer in your code base. Building such an abstraction layer, however, increases the complexity and cost of an application. It should therefore always be discussed whether the flexibility is really required and whether hard-coding is not an option as well. Usually, hard-coding will be several factors cheaper than supporting soft-code.

Clearly there is a danger in this growth of soft-code. As such, it would be better to push certain soft-code back towards "code" or "data".
For this to work, we need to find solutions for the causes of the rise of soft-code, i.e. the lack of IT capacity, the need for more flexibility and the difficulties in communication, but also work on making "data" more controllable.

A few ideas can be:

  • Treat soft-code more as "code" by hardcoding in the code or by enforcing the same governance to soft-code as to "code" (i.e. source control, versioning, testing, releasing…​). This means it would no longer be possible to change the soft-code elements directly in production via an administrative access, but instead everything should be included in the source control system and delivered via the standard deployment pipeline. In order for this to work, we need however

    • Continuous deployment: with DevOps allowing daily or even continuous deployments, "code" can get the same flexibility as soft-code, already resolving one of the reasons to introduce soft-code.

    • Easy ways for business people to commit certain changes into the source control system themselves (and thus trigger the CI/CD pipeline): of course this requires additional tooling, as business people are not acquainted with source control systems and the risk of corrupting the code is high. Some kind of user interface allowing source files to be updated in a controlled, restricted and convenient way is therefore a necessity.

    • Stronger integration of teams, i.e. the creation of cross-functional teams responsible for a business functionality. This allows business people to interact much more directly with IT; e.g. business could also use JIRA to create business tasks and as such easily assign a ticket to IT for an update of the soft-code.

  • Infrastructure as Code (IaC): some soft-code is the result of the need to run the same application in different environments. In that case, a number of parameters need to be adapted to the environment, such as disabling certain external communication on test environments or adapting URLs, tokens and ports of integrated applications and external partners.
    Obviously this should be handled via soft-code, as nobody wants to maintain multiple versions of the source code depending on the environment. It is important however to manage these environment-specific variables fully separately, as part of your IaC code base. Ideally they are handled via environment variables and secrets; a tool like Kubernetes has a complete setup to manage these types of environment-specific configurations correctly (see the environment-variable sketch after this list).

  • Soft-code duplication into a "code" and a "data" part: instead of handling soft-code purely as "code", you can also split soft-code into a "code" part and a "data" part. The "code" part contains the "default" settings and is fully managed by the IT product team. This ensures that the application can always function.
    Additionally there is a file or table, which can be adapted by a business owner, to override these defaults. This file/table only stores the overrides and is fully handled as "data" (see the defaults-and-overrides sketch after this list).
    This allows a good separation of concerns, but good regression and functional testing remains an issue, as the IT product team might not have the same setup of the application as exists in production.

  • Specialized tooling, which allows a business user to manage specific soft-code in an easy and controlled way, while providing easy methods for IT to deliver the changes to different environments in a controlled way. Examples are feature-flag software (such as Split, Optimizely, A/B Tasty, Adobe Target…​), translation management software (e.g. Transifex, Phrase, Lokalise, Loco, Crowdin…​) and CMS software (WordPress, Dreamweaver, Drupal, Joomla!, Contentful, Contentstack…​).

  • Treat soft-code more as "data": soft-code can also be treated more as data. Some of the concerns can then be resolved by:

    • Regular creation of snapshots (after every change or at a fixed frequency), by extracting all soft-code data and automatically storing the snapshot in a source control repository. The source control repository will automatically identify the differences compared to the previous snapshot, which the user can then document. As such, some form of versioning can be obtained (see the snapshot sketch after this list).

    • Easy replication of a (scrambled) production database to the development and test environments

  • Creation of IDE tooling specifically for data: such tooling allows creating data in a user-friendly way (e.g. via syntax highlighting, auto-completion, bulk updates…​), while providing all the tooling available for managing code, like versioning (through a source control system), deployment pipelines, debugging (e.g. for AI, immediately showing on which data your model performs worst, which can be a sign of bad labelling or of the model having insufficient data points) and syntax validation (e.g. easy syntax rules for structured data representing soft-code).

    • Documentation: develop standard mechanisms to easily document data, with the possibility to extract the full documentation or be automatically alerted when specific documentation should be updated (because the linked data element has been modified)

    • Automatic auditing: automatic storage of a full audit trail (who did which update, at which time and via which channel). A tool like Basedash, which allows support teams to interface with a database to make specialist updates while automatically auditing all those updates, can provide a solution here.

  • …​
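
As a sketch of the environment-specific configuration mentioned above: the code base stays identical in every environment and only the injected environment variables differ per deployment (in Kubernetes they would typically come from a ConfigMap or a Secret). The variable names below are assumptions for the example.

    import os

    # Identical code in every environment; only the injected variables differ.
    API_BASE_URL = os.environ.get("PARTNER_API_URL", "http://localhost:8080")
    API_TOKEN = os.environ.get("PARTNER_API_TOKEN", "")   # secret, never hard-coded
    MAIL_ENABLED = os.environ.get("OUTBOUND_MAIL_ENABLED", "false") == "true"

    def send_confirmation_mail(recipient: str) -> None:
        if not MAIL_ENABLED:
            # On test environments, external communication stays disabled.
            print(f"[dry-run] would mail {recipient}")
            return
        ...  # the real mail integration would live here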
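
The duplication of soft-code into a "code" part (defaults) and a "data" part (overrides) can be sketched as follows; the setting names and the override source are hypothetical.

    # Defaults are part of the code base, owned and released by the IT product team.
    DEFAULT_SETTINGS = {
        "max_login_attempts": 5,
        "session_timeout_minutes": 30,
        "welcome_message": "Welcome!",
    }

    def load_overrides() -> dict:
        """The 'data' part: a table or file edited by a business owner in
        production. Faked here with a literal; a real implementation would
        read a database table or a JSON file."""
        return {"welcome_message": "Welcome to our summer campaign!"}

    def effective_settings() -> dict:
        # Defaults guarantee the application always works; overrides win.
        return {**DEFAULT_SETTINGS, **load_overrides()}

    print(effective_settings()["welcome_message"])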
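
Finally, a sketch of the snapshot idea: extract all soft-code from production to a file and commit it to a plain git repository, so that diffs between commits give a basic form of versioning and an audit trail. The repository path and the shape of the extracted data are assumptions.

    import json
    import subprocess
    from pathlib import Path

    SNAPSHOT_REPO = Path("/srv/softcode-snapshots")  # an ordinary git repository

    def snapshot_softcode(settings: dict) -> None:
        """Dump the current soft-code (extracted from the production database
        in a real setup) and commit it, so every change becomes a diff."""
        (SNAPSHOT_REPO / "settings.json").write_text(
            json.dumps(settings, indent=2, sort_keys=True))
        subprocess.run(["git", "add", "-A"], cwd=SNAPSHOT_REPO, check=True)
        # 'git commit' exits non-zero when nothing changed; that is fine here.
        subprocess.run(["git", "commit", "-m", "soft-code snapshot"],
                       cwd=SNAPSHOT_REPO, check=False)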

Clearly there is a need for high flexibility in software, with business owners able to rapidly make changes to existing software. We can either make software engineering departments more Agile and faster, or we can hand over a number of IT tasks to business people in a controlled way. As the need is so high, both trends are happening in parallel. On the IT side, there is the rise of DevOps, Agile, microservices-based architecture and cloud-native architecture, all allowing IT departments to become more flexible and faster. On the business side, all kinds of tools help business people implement business logic and other changes in a controlled way (RPA tools, low- and no-code platforms, CMS tools, packaged software…​). These tools empower business people to be more independent of IT. Both trends work towards the same goal, but it will be interesting to see which trend takes the upper hand, and in which context.
