
The boundary between "Data" and "Code" is blurring


In the corporate world, the use of Excel files is still enormous. Many business tools still run in Excel, often extended with VBA macros and custom applications.

As an Excel fan myself, I can relate to that, as nothing else allows you to build, in such a short time, a powerful and flexible tool adapted to your specific business needs.

However, as a software engineer, I also see the issues with these decentrally managed business tools built in Excel. Typically there are four main issues with these tools, i.e.

  • The lack of central storage

  • The lack of collaboration possibilities offered by Excel (we have all tried to work on a shared workbook, but I don’t think anyone was really satisfied with the result)

  • The lack of versioning

  • The (non-)functional limitations of a tool like Excel. Numerous examples exist of an Excel tool growing continuously in complexity and size, until one day it becomes impossible to maintain, becomes unacceptably slow and/or hits its functional (e.g. certain logic that cannot be implemented) or technical (e.g. the row or column size limit) limitations.

Microsoft has already made considerable efforts to address these limitations (e.g. with Office 365), and other firms are trying to offer alternatives that keep the flexibility of a spreadsheet (e.g. Airtable or Trello), but some limitations are inherently linked to the way a spreadsheet works. They are caused by the fact that in a spreadsheet, data and software are not isolated from each other, which means the software part of a spreadsheet cannot be versioned, maintained, optimized and/or upgraded separately from the data.

In general, data and code scale very differently. When code scales, it can be modularized (via classes, methods, functions…) at different abstraction levels, which keeps the growth of the code base under control.
Data however needs to stay in one place, or at least a single access point to all data has to be provided (e.g. a database might store data in multiple files, even on different machines in a distributed architecture, but for the user this should be abstracted away).
This difference in scaling behavior means that data and code should be separated as much as possible.

Unfortunately, in many cases the boundary between data and code is not that clear (a whole grey zone exists), which is an issue not only for spreadsheets but for every application in general.

A good metric to determine an object’s position on this "data versus code" scale is its "production update frequency": the more frequently an object is updated in production, the closer it sits to the "data" end of the scale.
This typically leads to three large categories, i.e. code, soft-code and data, although these categories still blur into each other.

  • Code: the actual programming language code (although domain-specific languages should also be considered), created by a software engineer and known at the moment of writing the code (known at design time). This is typically delivered to production according to a release schedule and through a controlled deployment pipeline.

  • Data: any data created by the application itself, so only known at the moment the application is running (i.e. not known at design time, so impossible to hard-code). This data can come from a user interaction (i.e. operational data) or be automatically derived by the application itself (e.g. data created by calculations on the operational data, or logging and monitoring data).

  • Soft-code: any mix between code and data, which can be adapted by a software engineer or a business product owner. The adaptation can be done directly in production, via an administration account, or can be delivered via the deployment pipeline. As the risk of regressions is considered lower, the deployment restrictions are often less severe than for code. The value of such a customization can be known at the moment of coding, only when releasing (e.g. environment-specific settings) or even only when the application is already running.

These boundaries are blurring more and more, making it increasingly difficult to define these terms precisely, due to evolutions like:

  • Self-evolving code and machine learning, where the behavior of the application itself adapts automatically based on the provided (training or production) data (cfr. blog "https://bankloch.blogspot.com/2020/09/ai-in-financial-services-buzzword-that.html")

  • DevOps, resulting in continuous delivery, meaning that code can also get a very high "production update frequency" (cfr. blog "https://bankloch.blogspot.com/2020/10/devops-for-dummies.html")

  • RPA, Low- and No-code platforms, which allow business people to develop complex tooling (cfr. blogs "https://bankloch.blogspot.com/2020/07/low-and-no-code-platforms-will-it.html" and "https://bankloch.blogspot.com/2020/05/rpa-miracle-solution-for-incumbent.html")

  • Business package software (like SAP, SalesForce…), which allows extensive customization in the business application itself via parameterization, drag & drop options, scripting languages…

  • …​

Nonetheless the positioning of each object on the "code vs. data" axis should be carefully investigated, as it has some important consequences, like:

  • How a change is released to production: for "data" this is done by making a direct update in the production environment, while for "code" this is normally done via a deployment pipeline

  • Storage of the latest version: for "data" the latest version will always be the (master) production database, while for "code" this is typically a development branch in a source code repository.

  • Testing: while code will typically transition through an (automated) testing cycle, "data" normally will not.

  • Collaboration: while code collaboration is usually done in a decentralized way, i.e. by checking out the code, making individual adaptations and then checking in and merging to a central repository (and potentially resolving merge conflicts), "data" collaboration is usually handled more centrally, via a locking mechanism that locks records while a user is updating them.

  • Search capabilities: as code is usually stored in text files, coding tools typically work with text searches. Data on the other hand can be stored in different structures (like a structured data model, a JSON object…), allowing more structured querying (like joins, grouping, complex filtering…).

For the "Soft-code" category, usually a choice between the "data" and "code" extremes should be made for each of those above characteristics. These decisions will determine the overall positioning on the "data vs. code" scale.

Typical examples of elements in this "Soft-code" category are:

  • Parameterization (or Configuration): this consists of a set of parameters or configuration values determining the specifics of an application. This can take different forms, from variables determining the behavior of the application all the way to the configuration of screens, reports, lists, table structures and meta-dictionary updates (like user-defined fields or tables). The result of a parameterization is either stored as data in a database, as structured files (e.g. in JSON, XML or YAML format) or as code written against an SDK.
    Usually this kind of soft-code is partially handled as code (i.e. the default configuration delivered by the product team) and partially handled as data (i.e. the specific modifications made by business owners to the behavior of the application in production).

  • Feature flags: with DevOps and Agile pushing more and more towards gradual roll-outs and data-driven product management (i.e. trying out new features on a small subset of users in production and monitoring the result), feature flags have become a vital part of a modern application. These flags make it possible to enable or disable specific features of an application and often support user segmentation, so that a feature can be enabled for only some users (see the first sketch after this list).

  • Referential, static data: these are usually tables storing static data. Typical examples are the list of countries, currencies, postal codes, MCC sector codes, languages and more specific code tables (enumerations). As there is often no application screen foreseen to manage this data and no end user owning it, the data will usually be updated by the product team via the standard release procedure. As a result, even though this is clearly data, it takes on a lot of the characteristics of "code".

  • Translations: almost any multi-lingual application works with hardcoded technical keys for each label and text appearing on the application screens. Those technical keys are then replaced by the corresponding value, in the language of the user, taken from a translation file (see the second sketch after this list). As updating translations often requires complex manipulations and ideally also requires some testing (e.g. a translation that is too long might cause lay-out issues on a screen), the deployment of translations is usually done in the same way as "code". The actual translations are however typically owned by a business owner, like the marketing department, which gives them the characteristics of "data".

  • Static content: this consists of rich texts (with lay-out), pictures and other media files, which appear on the application screens or are generated as an output of the application (like a report, email, SMS, notification…). This type of content is usually managed in a content management system, but often has a strong link with the code, as the content has to be retrieved and included in the right place in the application. Furthermore this content often includes variables, which are dynamic and therefore need to be replaced by the actual value at run-time.
    Another typical issue is that the CMS tool produces static webpages, while business often wants dynamic behavior on those pages. As a result, software engineers need to plug specific code into the static pages to obtain that dynamic behavior.

  • AI models: these also have the characteristics of both "code" and "data". The models are "code", as they contain business logic, are usually created by software engineers and are normally delivered to the production environment in a controlled way. However they are also "data", as these models are essentially coefficients in large matrices, which cannot be debugged like regular code and can evolve directly in production based on new data (if desired).
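
To make the feature flag idea above more concrete, here is a minimal sketch in Python of how a flag check with user segmentation could look. The flag names, rollout percentages and the hash-based bucketing are purely illustrative assumptions, not a description of any specific feature-flag product:

    import hashlib

    # Illustrative in-memory flag configuration; in practice this soft-code would
    # be loaded from a database, a configuration file or a feature-flag service.
    FEATURE_FLAGS = {
        "new_dashboard": {"enabled": True, "rollout_percentage": 10},
        "dark_mode": {"enabled": False, "rollout_percentage": 0},
    }

    def is_feature_enabled(flag_name: str, user_id: str) -> bool:
        """Return True if the feature is enabled for this specific user."""
        flag = FEATURE_FLAGS.get(flag_name)
        if flag is None or not flag["enabled"]:
            return False
        # Deterministic segmentation: hash the user id into a bucket from 0 to 99,
        # so the same user consistently falls inside or outside the rollout group.
        bucket = int(hashlib.sha256(f"{flag_name}:{user_id}".encode()).hexdigest(), 16) % 100
        return bucket < flag["rollout_percentage"]

    # Only roughly 10% of the users will see the new dashboard.
    if is_feature_enabled("new_dashboard", user_id="customer-42"):
        print("render new dashboard")
    else:
        print("render old dashboard")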
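
The translation mechanism with technical keys can be sketched in a similar way. The keys, languages and fallback behavior below are assumptions for illustration; real translation management tools (mentioned further down) handle this at a much larger scale:

    # Illustrative translation dictionaries, one per language; in practice these
    # would typically be loaded from translation files (e.g. JSON or properties files).
    TRANSLATIONS = {
        "en": {"login.title": "Welcome back", "login.button": "Sign in"},
        "nl": {"login.title": "Welkom terug", "login.button": "Aanmelden"},
    }

    def translate(key: str, language: str, fallback_language: str = "en") -> str:
        """Replace a technical key by the corresponding text in the user's language."""
        # Fall back to the default language, and ultimately to the key itself,
        # so a missing translation never breaks the screen.
        return TRANSLATIONS.get(language, {}).get(
            key, TRANSLATIONS[fallback_language].get(key, key)
        )

    print(translate("login.title", "nl"))   # Welkom terug
    print(translate("login.button", "fr"))  # Sign in (falls back to English)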

These soft-code examples are becoming more and more prominent in modern software applications. As such the "Soft-code" category seems to be growing considerably, due to:

  • A lack of IT resources combined with an ever-growing demand for digitalization: as a result, business users want to be able to make certain changes themselves, allowing them to bypass the capacity issues within the IT department.

  • The ever-increasing need for flexibility: in the fast-changing world we are in, waiting for the next release is often simply not an option. As a result more and more possibilities are incorporated in modern applications to modify the application in a controlled way at run-time.

  • A growing need to be as efficient as possible: the overhead in communication from a business owner up to a developer can be enormous, as in many organizations there are three or more intermediaries facilitating this communication. This overhead leads to higher lead times, more work for everyone and errors due to miscommunication. As a result business owners want to be able to do more themselves, and "Soft-code" can provide a solution to this.

Although moving more "Code" to the "Soft-code" category seems to be a good idea at first, it does raise several concerns:

  • Definition of the master copy: while for "data" and "code" there is a single master copy (for "code" the source code versioning system and for "data" the production database), this is not the case for soft-code. On the one hand the production platform contains the current situation, while the source code management system and/or one of the test environments can contain the future situation, not to mention any environment-specific customizations.

  • Collaboration: as there is no central, unique storage, collaboration also becomes difficult, as it is not possible to lock or check out a specific customization from a single central location.

  • Versioning: versioning also becomes very difficult due to this lack of a single master copy. As a result, it becomes very hard to analyse and reproduce bugs, audit changes (who made which change) and identify what has changed over time.
    Due to this lack of versioning, it regularly happens that IT releases a configuration change (a soft-code change) that overwrites something a business owner has done.

  • Non-functional limitations: "soft-code" is usually the result of an abstraction layer (like a CMS, translation labels…), which of course also limits the flexibility. As such, any requirement not matching the prescriptions of the abstraction can result in a serious refactoring of both the "soft-code" and the code implementing the abstraction.
    Additionally, soft-code is usually not created to support large volumes. When the volume grows, it not only becomes very hard to maintain, but performance issues also occur. Clearly "code" will execute several times faster than soft-code, as soft-code first needs to be retrieved, converted into application code and only then into machine instructions.

  • Operational risks due to misuse: in many organisations soft-code is also misused. As releasing software in many large organizations is an extremely hard and cumbersome process, which can only be done a few times a year, developers start building in all kinds of configuration and customization possibilities, which allow the behavior of the application to be modified without a release. This is however a very risky situation, as certain configuration changes can have such a severe impact on the application’s behavior that extensive testing is an absolute necessity. This step is however usually skipped for this type of change.

  • Lack of documentation: due to the flexible nature of soft-code and the specific way it is stored, documenting soft-code is more complex than documenting code (for which all kinds of tools, notations and frameworks exist). As a result, soft-code is usually not documented at all, meaning it often becomes a source of errors and concern after a few years.

  • Flexibility comes at a cost: as indicated above, supporting soft-code requires building an abstraction layer in your code base. Such an abstraction layer however increases the complexity and cost of an application. It should therefore always be discussed whether the flexibility is really required and whether hard-coding is not an option as well. Usually hard-coding will be several times cheaper than supporting soft-code.

Clearly there is a danger in this growth of soft-code. As such, it would be better to push certain soft-code back towards the "code" or the "data" side.
For this to work, we need to find solutions for the reasons behind the rise of soft-code, i.e. the lack of IT capacity, the need for more flexibility and the difficulties in communication, but also work on making "data" more controllable.

A few ideas can be:

  • Treat soft-code more as "code" by hardcoding in the code or by enforcing the same governance to soft-code as to "code" (i.e. source control, versioning, testing, releasing…​). This means it would no longer be possible to change the soft-code elements directly in production via an administrative access, but instead everything should be included in the source control system and delivered via the standard deployment pipeline. In order for this to work, we need however

    • Continuous deployment: with DevOps allowing daily or even continuous deployments, "code" can get the same flexibility as soft-code, which already removes one of the reasons to introduce soft-code.

    • Easy ways for business people to also commit certain changes into the source control system (and thus trigger the CI/CD pipeline): of course this requires additional tooling, as business people are not used to working with a source control system and the risk of corrupting the code is high. Some kind of user interface allowing source files to be updated in a controlled, restricted and convenient way is therefore a necessity.

    • Stronger integration of teams, i.e. the creation of cross-functional teams responsible for a business functionality. This allows business people to interact much more directly with IT, e.g. business could also use JIRA to create business tasks and as such easily assign a ticket to IT for an update of the soft-code.

  • Infrastructure as Code (IaC): some soft-code is the result of the need to run the same application in different environments. In that case, a number of parameters need to be adapted to the environment, such as disabling certain external communication on test environments or adapting URLs, tokens and ports of integrated applications and external partners.
    Obviously this should be handled via soft-code, as nobody wants to maintain multiple versions of the source code depending on the environment. It is important however to manage these environment-specific variables fully separately, as part of your IaC code base. Ideally they are handled via environment variables and secrets (see the first sketch after this list of ideas); a tool like Kubernetes has a whole setup to manage these types of environment-specific configurations in a controlled way.

  • Soft-code duplication in a "code" and a "data" part: instead of handling soft-code purely as "code", you can also split the soft-code into a "code" part and a "data" part. The "code" part contains the "default" settings and is fully managed by the IT product team. This ensures that the application can always function on its own.
    Additionally there is a file or table, which can be adapted by a business owner, to override these defaults. This file/table only stores the overrides and is fully handled as "data" (see the second sketch after this list of ideas).
    This allows a good separation of concerns, but good regression and functional testing remains an issue, as the IT product team might not have the same setup of the application as in production.

  • Specialized tooling, which allows a business user to manage specific soft-code in an easy and controlled way, while providing easy methods for IT to deliver the changes to the different environments in a controlled way. Examples are feature-flag software (such as Split, Optimizely, A/B Tasty, Adobe Target…), translation management software (e.g. Transifex, Phrase, Lokalise, Loco, Crowdin…) and CMS software (Wordpress, Dreamweaver, Drupal, Joomla!, Contentful, Contentstack…).

  • Treat soft-code more as "data": soft-code can also be treated more as data. Some of the concerns can then be resolved by:

    • Regular creation of snapshots (after every change or on a frequent basis), by extracting all soft-code data and automatically storing this snapshot in a source control repository (see the third sketch after this list of ideas). The source control repository will automatically identify the differences compared to the previous snapshot, which the user can then document. As such some form of versioning can be obtained.

    • Easy replication of a (scrambled) production database to the development and test environments

  • Creation of IDE tooling specifically for data: such tooling allows data to be created in a user-friendly way (e.g. via syntax highlighting, auto-completion, bulk updates…), while providing all the tooling available for managing code, like versioning (through a source control system), deployment pipelines, debugging (e.g. for AI, immediately showing on which data your model performs worst, which can be a sign of bad labelling or of the model having insufficient data points) and syntax validation (e.g. easy syntax rules for structured data representing soft-code).

    • Documentation: develop standard mechanisms to easily document data, with possibilities to extract the full documentation or to be automatically alerted when specific documentation should be updated (because the linked data element has been modified)

    • Automatic auditing: automatic storage of a full audit trail (who made which update, at which time and via which channel). A tool like Basedash, which lets support teams interface with a database to make specialist updates while automatically auditing all those updates, can provide a solution to this.

  • …​
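
To illustrate the environment-specific settings mentioned in the Infrastructure as Code idea above, here is a minimal sketch of reading such settings from environment variables. The variable names and defaults are purely illustrative; in a Kubernetes setup these values would typically be injected via ConfigMaps and Secrets:

    import os

    # Environment-specific settings are read from environment variables, so the
    # same code base can run unchanged on development, test and production.
    PAYMENT_API_URL = os.environ.get("PAYMENT_API_URL", "https://sandbox.example.com/api")
    PAYMENT_API_TOKEN = os.environ.get("PAYMENT_API_TOKEN", "")  # injected as a secret
    ENABLE_EXTERNAL_CALLS = os.environ.get("ENABLE_EXTERNAL_CALLS", "false").lower() == "true"

    def call_payment_api(payload: dict) -> None:
        if not ENABLE_EXTERNAL_CALLS:
            # On test environments external communication is disabled.
            print(f"[dry-run] would call {PAYMENT_API_URL} with {payload}")
            return
        # ... perform the real HTTP call using PAYMENT_API_URL and PAYMENT_API_TOKEN ...

    call_payment_api({"amount": 100, "currency": "EUR"})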
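
The idea of splitting soft-code into a "code" part and a "data" part can be sketched as follows: the defaults live with the code, the overrides are plain data, and both are merged at run-time. The setting names and the override store are again just assumptions:

    # Default settings: part of the code base, owned by the IT product team,
    # versioned and released through the normal deployment pipeline.
    DEFAULT_SETTINGS = {
        "max_login_attempts": 5,
        "session_timeout_minutes": 30,
        "welcome_message": "Welcome to our portal",
    }

    def load_overrides() -> dict:
        """Overrides maintained by a business owner, handled purely as data.

        In a real application this would read from a database table or a file;
        here it is stubbed out with an in-memory dictionary.
        """
        return {"welcome_message": "Welcome to our summer campaign!"}

    def effective_settings() -> dict:
        """Merge the defaults (code) with the overrides (data); overrides win."""
        settings = dict(DEFAULT_SETTINGS)
        settings.update(load_overrides())
        return settings

    print(effective_settings()["welcome_message"])      # business override
    print(effective_settings()["max_login_attempts"])   # default from code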
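
Finally, the snapshot idea under "treat soft-code more as data" could be automated along these lines. The table names, the file layout and the use of the git command line are assumptions for the sake of illustration:

    import json
    import subprocess
    from datetime import datetime, timezone

    def snapshot_soft_code(fetch_table) -> None:
        """Dump all soft-code tables to JSON files and commit them to a git repository.

        `fetch_table` is a hypothetical callable returning the rows of a table;
        in practice it would query the production database (read-only).
        """
        for table in ["feature_flags", "parameters", "translations"]:
            with open(f"snapshots/{table}.json", "w") as f:
                json.dump(fetch_table(table), f, indent=2, sort_keys=True)

        # Committing the snapshot lets the source control system show the differences
        # compared to the previous snapshot, giving basic versioning and auditability.
        timestamp = datetime.now(timezone.utc).isoformat()
        subprocess.run(["git", "add", "snapshots"], check=True)
        # The commit simply does nothing (non-zero exit code) when nothing has changed.
        subprocess.run(["git", "commit", "-m", f"Soft-code snapshot {timestamp}"])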

Clearly there is a need for high flexibility in software and for business owners to be able to rapidly make changes to existing software. As such we can either make software engineering departments more Agile and faster, or we can hand over a number of IT tasks to business people in a controlled way. As the need is so high, both trends are happening in parallel. On the IT side, there is the rise of DevOps, Agile, microservices-based architecture and cloud-native architecture, all allowing IT departments to become more flexible and faster. On the business side, all kinds of tools help business people to implement business logic and other changes in a controlled way (RPA tools, Low- and No-code platforms, CMS tools, packaged software…). These tools empower business people to be more independent of IT. Clearly both trends work towards the same goal, but it will be interesting to see which trend takes the upper hand and in which context.
