No-code data platform is a lie
The trade-offs between the different solutions, and graduating to a mature data organization.
We often see data teams considering no-code solutions as a potential option in the early days of their journey. The whole thing sounds pretty nice on paper:
a drag-and-drop UI to build workflows
visual representation of the pipelines
no need to install anything
even business users can build data products
These promises are very exciting at their core, and quite understandably, many teams take them at face value. My intention with this article is not to say they are not the right solution for certain problems, but to offer another perspective for growing teams.
To be locked in or not to be locked in
It all starts the same: you have a new platform, it has a beautiful UI, you can do the first steps without writing a single line of code, and you feel like you are making progress. In fact, you are making progress, but it comes with certain costs.
The UI-driven workflow tools will have certain ways of working, primarily driven through their UI, duh, and their own workflows. While clicking around in the UI has its benefits, especially when getting started, there’s a reason these tools ask you to do that.
The workflows you build there will stay with them forever unless you are ready to do the same steps in another tool, which is often not feasible after a certain stage.
The ways of working cannot be extended beyond the abilities of the platform, which means that as you grow, you start hitting more and more rough edges and missing the flexibility the business requires.
Moving away from these platforms requires significant effort and investment, which means that in the best case you are left with a subpar solution, and in the worst case the platform can decide to charge you 10x more.
The inability to move away is the one that has always scared me the most, so let's dive into that a bit more.
The inability to move away
Businesses change. Team structures change. Individuals change. Problems change. The tools, also, need to change.
Let’s think of the following scenario where we used a no-code data pipeline tool:
Acme Corp, where Laura (Data Architect), Michael (Chief Data Officer), and Nina (Marketing Analyst) work, has a rather simple setup:
They ingest data from Facebook Ads, Google Ads, and Shopify.
They do some data cleaning.
They put them in a report on PowerBI.
Simple enough, nice.
Step 1: Initial implementation
Laura, the Data Architect, was tasked with integrating a new data source, Amazon Marketplace, into their existing no-code data pipeline tool. The tool was already ingesting data from Facebook Ads, Google Ads, and Shopify, with basic data cleaning and reporting set up in PowerBI. Laura began by connecting the Amazon Marketplace data source to the no-code tool. The initial setup seemed straightforward, and the data started flowing into the pipeline.
Step 2: First struggles
As the marketing team ramped up their campaigns, the volume of data from all sources increased significantly. The no-code tool began to struggle with the increased load, leading to slower data processing times and delays in report generation. What used to take 4 minutes now took 3.5 hours of processing every morning, and the runtime kept growing every day.
Step 3: Michael’s business questions
Michael, the Chief Data Officer, requested more complex data transformations to better analyze customer behavior across all platforms. He needed to understand cross-channel attribution, customer lifetime value, and segment performance more accurately to optimize marketing spend and improve customer targeting. These transformations required advanced calculations and custom logic that the no-code tool could not handle efficiently.
To address this, Nina decided to run the complex logic in Python. However, the no-code platform did not support running custom scripts natively. Nina had to collaborate with a software engineer to deploy the Python scripts on AWS Lambda. This solution allowed the custom logic to be executed, but it added an extra layer of complexity and dependency outside the no-code tool.
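To make this concrete, here is a rough sketch of what such a Lambda handler might look like. The bucket names, file layout, and last-touch attribution logic are illustrative placeholders rather than Acme's actual code, and the sketch assumes the cleaned data already lands in S3 as Parquet.

```python
import json

import pandas as pd

# Illustrative placeholders: the bucket and file layout are hypothetical.
BUCKET = "acme-marketing-data"


def handler(event, context):
    """Join cleaned ad and order data, compute last-touch attribution and CLV."""
    # Reading Parquet straight from S3 assumes s3fs/pyarrow are bundled with the function.
    ads = pd.read_parquet(f"s3://{BUCKET}/cleaned/ad_clicks.parquet")
    orders = pd.read_parquet(f"s3://{BUCKET}/cleaned/shopify_orders.parquet")

    # Last-touch attribution: credit each order to the customer's most recent ad click.
    merged = orders.merge(ads, on="customer_id", how="left")
    attribution = merged.sort_values("clicked_at").groupby("order_id").tail(1)

    # Customer lifetime value: total revenue per customer to date.
    clv = orders.groupby("customer_id")["order_total"].sum().rename("lifetime_value")

    attribution.to_parquet(f"s3://{BUCKET}/transformed/attribution.parquet")
    clv.to_frame().to_parquet(f"s3://{BUCKET}/transformed/clv.parquet")
    return {"statusCode": 200, "body": json.dumps({"attributed_orders": int(len(attribution))})}
```

Even a small script like this now carries its own packaging, dependencies, and IAM permissions, all of which are invisible to the no-code tool.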
Step 4: Expanding data sources and integrations
As the business continued to grow, Michael saw the need to integrate additional data sources, including third-party customer data platforms (CDPs) and external market intelligence services. These sources were crucial for gaining deeper insights into customer behavior and staying competitive in the market.
However, the no-code tool struggled to handle the growing number of integrations. It lacked the flexibility to connect seamlessly with the wide variety of APIs and data formats required by these new sources. Laura had to write custom scripts to bridge the gaps, often relying on external services like AWS Lambda and AWS Glue for data transformation and ETL processes.
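As a sketch of what those bridging scripts tend to look like, here is a hypothetical ingestion job that pages through a CDP export API and lands newline-delimited JSON in S3 for a Glue job to pick up. The endpoint, token handling, and bucket names are invented for illustration, not a real vendor API.

```python
import json
from datetime import date

import boto3
import requests

# Illustrative placeholders: the CDP endpoint and bucket are hypothetical.
CDP_URL = "https://api.example-cdp.com/v1/profiles"
BUCKET = "acme-raw-landing"


def pull_cdp_profiles(api_token: str) -> list[dict]:
    """Page through the CDP's profile export endpoint until the cursor runs out."""
    profiles, cursor = [], None
    while True:
        resp = requests.get(
            CDP_URL,
            headers={"Authorization": f"Bearer {api_token}"},
            params={"cursor": cursor} if cursor else {},
            timeout=30,
        )
        resp.raise_for_status()
        payload = resp.json()
        profiles.extend(payload["data"])
        cursor = payload.get("next_cursor")
        if not cursor:
            return profiles


def land_in_s3(profiles: list[dict]) -> None:
    """Write newline-delimited JSON to S3 so a downstream Glue job can pick it up."""
    body = "\n".join(json.dumps(p) for p in profiles)
    boto3.client("s3").put_object(
        Bucket=BUCKET,
        Key=f"cdp/profiles/dt={date.today().isoformat()}/profiles.jsonl",
        Body=body.encode("utf-8"),
    )
```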
To complicate matters further, the no-code tool's interface became increasingly cluttered and difficult to manage as more data sources were added. This led to frequent errors and delays in the data pipeline, requiring constant troubleshooting and maintenance.
Step 5: Data governance
As the company expanded into new markets, Michael needed to ensure that their data processing adhered to strict data governance and compliance requirements, such as GDPR and CCPA. The no-code tool offered limited features for managing data lineage, auditing, and encryption, making it difficult to maintain compliance across multiple jurisdictions.
Laura had to manually implement additional security measures and auditing processes outside the no-code platform, using a combination of AWS services and custom scripts. This made the pipeline even more complex and introduced new points of failure.
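As a sketch of the kind of glue Laura ends up maintaining, here is a hypothetical pseudonymization helper and a minimal access-audit record writer. The bucket name and conventions are invented, and a real GDPR/CCPA program needs far more than this, but even this minimal version has to live outside the platform.

```python
import hashlib
import json
from datetime import datetime, timezone

import boto3

AUDIT_BUCKET = "acme-data-audit"  # hypothetical bucket name


def pseudonymize(value: str) -> str:
    """One-way hash so downstream reports can join on a value without seeing it."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def record_access(dataset: str, user: str, purpose: str) -> None:
    """Append a minimal access record so there is at least some audit trail."""
    entry = {
        "dataset": dataset,
        "user": user,
        "purpose": purpose,
        "accessed_at": datetime.now(timezone.utc).isoformat(),
    }
    boto3.client("s3").put_object(
        Bucket=AUDIT_BUCKET,
        Key=f"access-logs/{dataset}/{entry['accessed_at']}.json",
        Body=json.dumps(entry).encode("utf-8"),
    )
```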
Outcome
As the data requirements became more complex, and the need for robust integrations, accurate segmentation, and compliance grew, the limitations of the no-code tool became a significant bottleneck. The team, including Nina, spent more time managing and troubleshooting the pipeline than deriving insights from the data.
The story is far more common than you would think. We have spoken with hundreds of companies, and all of those that adopted a no-code solution for their data infrastructure had also gone through, or were going through, an offboarding process.
Flexibility: yes-code
We build software because we can shape it as we want, hence the “soft” in software. The things we dream of can turn into reality in skillful hands, and building data workloads is not an exception.
Well, except, if you are using a no-code tool.
The no-code platforms are optimized for 80% of the workloads: they take care of the common scenarios in a supposedly easy way, and you can build the remaining 20% yourself.
This would work really well if the effort for that 20% scaled in a way that complements the no-code part. In reality, however, building that custom part requires a significant investment that amounts to rebuilding pretty much the whole thing.
Why do I have to rebuild things if I already have my no-code platform? Glad you asked.
There are quite a few questions that need to be answered for that custom part of your data workloads:
Where do I run these workloads?
How do I deploy them?
How do I schedule them?
How do I get them to work with the existing parts of the no-code pipelines?
How do I get notified of failures?
How do I retry failures?
So on and so forth…
Without a clear strategy for how to build and run custom data workloads, i.e. your bespoke Python scripts, building and deploying the first custom piece outside of the no-code platform means figuring out all of these questions at once: setting up orchestration from scratch, finding a way to ingest data into another location, building a framework around all of it, deploying it somewhere, adding notifications, and so on.
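To get a feel for what "all of these questions" means in practice, here is a minimal sketch of the wrapper you end up writing around a single bespoke script: retries, backoff, and a failure notification. The script path and webhook URL are placeholders, and this still leaves scheduling, deployment, and secrets unanswered.

```python
import subprocess
import time

import requests

# Placeholders: the script path and webhook URL are illustrative, not real endpoints.
SCRIPT = ["python", "custom_transformations.py"]
SLACK_WEBHOOK = "https://hooks.slack.com/services/XXX/YYY/ZZZ"


def notify(message: str) -> None:
    """Send a failure alert, since the no-code tool knows nothing about this script."""
    requests.post(SLACK_WEBHOOK, json={"text": message}, timeout=10)


def run_with_retries(max_attempts: int = 3, backoff_seconds: int = 60) -> None:
    """Run the custom script, retrying with a growing backoff before alerting."""
    for attempt in range(1, max_attempts + 1):
        result = subprocess.run(SCRIPT, capture_output=True, text=True)
        if result.returncode == 0:
            return
        if attempt < max_attempts:
            time.sleep(backoff_seconds * attempt)
    notify(f"custom_transformations.py failed after {max_attempts} attempts:\n{result.stderr}")


if __name__ == "__main__":
    run_with_retries()
```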
Governance? What is that?
Data governance, even though it sounds too-enterprisey these days, is a concern that hits every company eventually. It doesn't have to be a complicated process; data governance concerns start even at very early stages:
What data do I store?
Where do I store that data?
Who can access that data?
Do I comply with the regulations?
How much do I pay for this data?
How often is it being used?
What’s the ROI of a specific piece of data?
The list grows from here on. Some of these questions might be easy to answer, whereas others are more challenging. To answer them accurately and in a timely way, you will often need access to the raw source of the data workloads, a.k.a. "the code", and guess who doesn't give you "the code"? The no-code platform.
This simply means the governance questions can only be answered with whatever is available on the platform, rather than with access to the underlying code. Even if you had the engineering ability to analyze the underlying SQL queries yourself, you wouldn't be able to, since the code isn't available to you.
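To illustrate the difference: if the transformation SQL lived in your own repository, answering a lineage question could be a few lines of analysis, for example with a SQL parser like sqlglot. The query below is a stand-in, since with a no-code platform there is no such file to point the parser at.

```python
import sqlglot
from sqlglot import exp

# A stand-in query: imagine this file lived in your own git repository.
query = """
SELECT c.customer_id, SUM(o.order_total) AS lifetime_value
FROM shopify_orders AS o
JOIN customers AS c ON c.customer_id = o.customer_id
GROUP BY c.customer_id
"""

# Parse the query and list every source table that feeds the result.
parsed = sqlglot.parse_one(query)
source_tables = sorted({t.name for t in parsed.find_all(exp.Table)})
print(source_tables)  # ['customers', 'shopify_orders']
```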
Go for the code, always
In the end, there’s a fundamental disagreement between myself and no-code tools: I think teams should not be deprived of their freedom when it comes to their own data.
The code is the representation of the real world. It gives you the flexibility to represent all the intricate complexities of your business. There's never going to be a piece of off-the-shelf software that can cover every complexity of your business, which means you might as well start the right way: go for the code.
This is even more significant when it comes to your data workloads: data grows faster than the software stack, which means the complexity of processing it grows faster as well, and you need that freedom with your data workloads more than ever.
That's why we are building Bruin: it is the platform I always wished existed, one that just lets me write the code for my business logic and takes care of the rest. In practice, this means you write your data transformation code, you version-control it, you add quality checks, everything is in code, and the platform takes care of the generic problems such as running things at the right time and in the right order, reporting executions, notifying you of failures, and more.
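As a generic illustration of what "everything is in code" can look like, and explicitly not Bruin's actual interface, a transformation asset and its quality check can simply be ordinary, version-controlled functions in your repository:

```python
# pipelines/marketing/customer_lifetime_value.py
# Generic illustration of a code-first asset; the file layout and check
# convention are hypothetical, not any specific platform's interface.
import pandas as pd


def transform(orders: pd.DataFrame) -> pd.DataFrame:
    """Compute lifetime value per customer from cleaned Shopify orders."""
    return (
        orders.groupby("customer_id", as_index=False)["order_total"]
        .sum()
        .rename(columns={"order_total": "lifetime_value"})
    )


def check_no_negative_values(result: pd.DataFrame) -> bool:
    """A quality check that runs after every execution and lives in version control."""
    return bool((result["lifetime_value"] >= 0).all())
```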
There are already a lot of teams that have been bitten by the pains of the no-code tools, and I am very excited for a future where this doesn’t happen anymore.