Are data engineers secretly doing two jobs at once?
When most companies first build a data team, there’s just one role: data engineer. But here’s the truth - that role has always been two jobs rolled into one.
Companies capture more data than ever, yet in my experience somewhere between 60 and 80% goes unused. How can we make use of that data and get it into the hands of the right people at the right time? The solution seems obvious - hire a data engineer. So the job ad goes up, and the new hire can build robust data flows and tune infrastructure like a race car mechanic. But when a stakeholder requests a dashboard, they’re lost. Sound familiar? Or is your situation the other way around?
In the 2000s, we had the humble role of business intelligence (BI) developer. Business Intelligence was a fancy term that really meant ‘analytics and insights’ but sounded mysterious and cool. I couldn’t resist either and used it myself to refer to the work we did.
A BI developer was tasked with crafting data from a source system into something that could be used to understand the business, answer questions and make decisions. This generally got built in applications or in a data warehouse. BI developers did it all - they extracted data from source systems, transformed it into something useful, loaded it into a data warehouse, then built reports and dashboards to share with stakeholders. They were the full-stack data professionals of their time.
This role usually sat in the IT department, quite removed from the business domain - meaning lots of back and forth between the BI developer and the stakeholder to understand the data and to align on the final output.
While it was one role, it required five completely different skill sets:
data detective (what does this actually mean?)
pipeline engineer (how do we load it?)
data modeller (how should we structure it?)
governance analyst (can we trust it?)
storyteller (what does it tell us?)
Fast forward to 2011 and large tech companies, such as Google and Facebook, started advertising roles for a ‘data engineer’. This role was eerily similar to a BI developer but sounded much cooler (yes, again). In a simplified sense, it was tasked with moving data from A to B. While that sounded simple, the need to understand both the content of the data and the computing resources required to move and transform it gave the role a large scope of work. This breadth naturally overlapped with many existing roles. Suddenly everyone wanted this title. Data modellers, BI analysts, ETL developers - all became ‘data engineers’.
Here’s where it gets interesting... as all these data engineers worked away, we discovered that understanding customer behaviour to build a churn model requires completely different skills from optimising database partitions for performance. One is about business logic, the other is about infrastructure. But we kept hiring for both under one role - data engineer.
Stakeholders put in requests to the data team to build a data asset and the associated report or insights. However, due to the mix of skills and domain expertise needed, this became a huge time sink. I was at one organisation where the data team had a 14-month backlog! But when I asked if I (in a different team) could do the work for them, I was met with the reply “no, only we can work on the data tools”.
The chasm widened as data was declared ‘the new oil’ and became necessary in processes and decisions across the business. Data couldn’t be restricted to IT - it was needed in Sales, Finance, Marketing and Customer Service to drive decisions and personalise experiences.
By the time we hit 2020, the ‘data engineer’ term was locked in. The term may have only been 9 years old but resumes were popping up with 15 years’ experience as a ‘data engineer’.
So now we have a two-fold problem:
two distinct kinds of work
one popular job title
So how do we untangle this mess? We can think of it as a continuum - from pure data work on one end to pure engineering work on the other:
Image: The data-engineering continuum
While these four roles exist on a continuum with blurred lines - especially at smaller companies where people wear multiple hats - they give us a useful framework for thinking about the work:
Data Analyst
Investigates business questions through data exploration, such as "Why did sales drop last month?" or "Which customers are most likely to churn?"
Analytics Engineer
Builds reliable data models that multiple teams can trust. Figures out why sales figures don't match between systems.
Data Platform Engineer
Designs the infrastructure that moves and stores data at scale, enabling both speed and safety.
Software Engineer
Builds the APIs, tools and services that make data accessible to applications and end users.
Every company is now a data company, and as data and AI play bigger roles, we need language that matches reality. Expecting a full-stack ‘data engineer’ is madness. It’s like expecting one person to be both the bar technician who maintains the kegs and taps AND the bartender who serves customers. Both work with the same liquid, but one focuses on pressure, flow and keeping the system running, while the other focuses on what the customer actually wants and how to serve it properly.
The data analyst and software engineer roles have been around for a while and are well defined. However, it’s clear that what we have been calling ‘data engineer’ is really two roles in disguise:
Analytics Engineer
Data Platform Engineer
The cost of this confusion isn’t just semantic. When you hire a data pipeline expert to build business logic, you get brittle data models. When you hire a data modelling expert to scale infrastructure, you get performance bottlenecks. In larger organisations, it’s time we stopped asking one person to be both bartender and bar technician. The next time you’re hiring a data engineer, ask yourself: do you need someone to build the pipes or someone to serve the data flowing through them?


