<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd" xmlns:googleplay="http://www.google.com/schemas/play-podcasts/1.0"><channel><title><![CDATA[Stats Life]]></title><description><![CDATA[Reflections on data, leadership and engineering through the probabilities that shape them]]></description><link>https://blog.statslife.com</link><image><url>https://substackcdn.com/image/fetch/$s_!sKQd!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ba673d9-b656-487c-8ecb-654ffd354cc9_285x285.png</url><title>Stats Life</title><link>https://blog.statslife.com</link></image><generator>Substack</generator><lastBuildDate>Wed, 15 Apr 2026 20:53:51 GMT</lastBuildDate><atom:link href="https://blog.statslife.com/feed" rel="self" type="application/rss+xml"/><copyright><![CDATA[Adric Streatfeild]]></copyright><language><![CDATA[en]]></language><webMaster><![CDATA[statslife@substack.com]]></webMaster><itunes:owner><itunes:email><![CDATA[statslife@substack.com]]></itunes:email><itunes:name><![CDATA[Adric Streatfeild]]></itunes:name></itunes:owner><itunes:author><![CDATA[Adric Streatfeild]]></itunes:author><googleplay:owner><![CDATA[statslife@substack.com]]></googleplay:owner><googleplay:email><![CDATA[statslife@substack.com]]></googleplay:email><googleplay:author><![CDATA[Adric Streatfeild]]></googleplay:author><itunes:block><![CDATA[Yes]]></itunes:block><item><title><![CDATA[Why your data mesh is failing]]></title><description><![CDATA[It isn&#8217;t a Snowflake or Databricks problem. 
It&#8217;s a photography problem.]]></description><link>https://blog.statslife.com/p/why-your-data-mesh-is-failing</link><guid isPermaLink="false">https://blog.statslife.com/p/why-your-data-mesh-is-failing</guid><dc:creator><![CDATA[Adric Streatfeild]]></dc:creator><pubDate>Sun, 01 Mar 2026 07:25:35 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!sKQd!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ba673d9-b656-487c-8ecb-654ffd354cc9_285x285.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The average corporate data mesh is a camera roll of blurry iPhone shots with a thumb on the lens. We&#8217;ve sold the industry a beautiful lie: that decentralising data and giving every domain team the tools to build their own data products will magically create value. But we forgot one crucial detail. Giving someone a camera doesn&#8217;t make them a photographer.</p><div class="preformatted-block" data-component-name="PreformattedTextBlockToDOM"><label class="hide-text" contenteditable="false">Text within this block will maintain its original spacing when published</label><pre class="text"> </pre></div><p>I&#8217;ve spent three years in the trenches of data mesh implementation. The lesson that took me far too long to learn: this is a photography problem, not a platform problem. Data mesh doesn&#8217;t fail because of technology. It fails because teams confuse taking snapshots with owning photographs. 
A data mesh works when three things happen:</p><ul><li><p>Perspective widens</p></li><li><p>Ownership becomes explicit</p></li><li><p>Teams resist the urge to photograph whatever is in front of them</p></li></ul><div class="preformatted-block" data-component-name="PreformattedTextBlockToDOM"><label class="hide-text" contenteditable="false">Text within this block will maintain its original spacing when published</label><pre class="text"> </pre></div><p>Everyone can take a photo</p><p>I am chronically guilty of &#8216;spray and pray&#8217; photography. When someone hands me a phone to take their picture, I mash the digital shutter button like a video game champion. I give zero thought to the lighting, the background or, if I&#8217;m being honest, sometimes even the people in the shot themselves. All we end up with are 20 identical, useless shots clogging up the photo gallery. None of them is the trusted memory we were seeking. In the data world, we do the exact same thing. I&#8217;ve watched a team rebuild an entire data product from scratch - reimplementing months of gnarly business logic - because they needed one extra column and didn&#8217;t trust the team that owned the original. In both cases, we created plenty of data but zero trusted memories.</p><p>The contrast became obvious when I hired a professional photographer for my wedding. I watched them work and learned that they didn&#8217;t mash buttons but intentionally used lighting, aperture and framing. They knew exactly what they wanted to capture and - more importantly - what they wanted to leave out. Building a data product requires the exact same discipline. 
A professional knows that a great photo is a deliberate, narrow slice of reality.</p><div class="preformatted-block" data-component-name="PreformattedTextBlockToDOM"><label class="hide-text" contenteditable="false">Text within this block will maintain its original spacing when published</label><pre class="text"> </pre></div><p>Every photo is a slice</p><p>A photograph is narrow by design. It captures one subject from one perspective. That&#8217;s the entire point. A data product should work the same way. Think of your smartphone&#8217;s &#8216;panorama&#8217; feature where you take multiple overlapping shots to create a wider landscape. Have you ever actually taken a good one? </p><p>The same is true of the panoramic data product. Someone tries to capture everything - whether data or scenery - and ends up with a warped monstrosity with someone&#8217;s disembodied arm floating in the middle. The lesson? Split data products by subject area. Multi-domain data products are corporate panoramas. Crop ruthlessly.</p><div class="preformatted-block" data-component-name="PreformattedTextBlockToDOM"><label class="hide-text" contenteditable="false">Text within this block will maintain its original spacing when published</label><pre class="text"> </pre></div><p>Which photo goes into the album?</p><p>We have all been held hostage in that meeting. The one where three different leaders proudly present three completely different numbers for &#8216;active customers&#8217;. Instead of a strategic discussion about business value, it devolves into a 55-minute &#8216;robust discussion&#8217; over what the word &#8216;active&#8217; means.</p><p>That meeting doesn&#8217;t derail because you chose Snowflake over Databricks. It derails because your organisation confused the ability to snap a photo with the discipline of curating a gallery.</p><p>What you witnessed in that room is simply two teams who photographed the exact same scene from slightly different angles. 
Both datasets are technically &#8220;correct&#8221; in their own context. Neither is authoritative. To solve this, you need a person or process with the authority to decide which shot actually makes it into the company album. That&#8217;s governance, not bureaucracy. But governance only cleans up the mess after the fact. The better question is: what if teams shot with intention from the start?</p><div class="preformatted-block" data-component-name="PreformattedTextBlockToDOM"><label class="hide-text" contenteditable="false">Text within this block will maintain its original spacing when published</label><pre class="text"> </pre></div><p>Shoot with the end in mind</p><p>A great photographer doesn&#8217;t just take the shot - they understand their audience. Who is this photo&nbsp;for? What story does it tell&nbsp;them? People take photos at concerts just to prove they were there, not to create a masterpiece. Similarly, engineers build data products just to check a box, taking the path of least resistance. </p><p>Nobody wants to spend weeks building a data product, only to burn the final Friday manually typing out metadata into a dusty Confluence page. Nor do you want to pull out your phone at a concert and click through endless settings before taking the shot.</p><p>People will always choose auto mode. Every time. Our job isn&#8217;t to shame people into manual mode. It&#8217;s to make auto mode good enough. 
We must code the rules so the path of least resistance is also the path of compliance.</p><div class="preformatted-block" data-component-name="PreformattedTextBlockToDOM"><label class="hide-text" contenteditable="false">Text within this block will maintain its original spacing when published</label><pre class="text"> </pre></div><p>The EXIF data test</p><p>If you&#8217;re wondering whether your organisation has photographers or just people with cameras, here&#8217;s a quick exposure test:</p><ul><li><p>Can a new team discover data products without a meeting?</p></li><li><p>Do data products exist with clear owners?</p></li><li><p>Are duplicates intentional and labelled as such?</p></li><li><p>Is there accountability for how it all fits together? (aka the big picture)</p></li></ul><div class="preformatted-block" data-component-name="PreformattedTextBlockToDOM"><label class="hide-text" contenteditable="false">Text within this block will maintain its original spacing when published</label><pre class="text"> </pre></div><p>Say cheese</p><p>Data mesh isn&#8217;t a technical or architectural problem - it&#8217;s a&nbsp;perspective&nbsp;problem. Like photography, everyone thinks they can do it because they interact with data every day. But it fails because we confuse the ability to snap a photo with the discipline of curating a gallery. The real work of data mesh is widening people&#8217;s aperture - getting them to see beyond their narrow frame of reference.</p><p>Widening the aperture is uncomfortable. It means caring about things outside your frame. It means accepting that your &#8220;photo&#8221; exists in a gallery alongside others - and that the gallery needs curation. For most domain teams, this means caring about someone else&#8217;s problem. That&#8217;s the real ask of data mesh. Not better tooling. Not more autonomy. More perspective.</p><p>Are you building a data mesh with skilled photographers... 
or are you just handing out disposable cameras and hoping for a masterpiece?</p>]]></content:encoded></item><item><title><![CDATA[Have you got zombieware running at your company?]]></title><description><![CDATA[The undead systems quietly draining your budget - and how to kill them]]></description><link>https://blog.statslife.com/p/have-you-got-zombieware-running-at</link><guid isPermaLink="false">https://blog.statslife.com/p/have-you-got-zombieware-running-at</guid><dc:creator><![CDATA[Adric Streatfeild]]></dc:creator><pubDate>Sat, 31 Jan 2026 08:39:01 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!sKQd!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ba673d9-b656-487c-8ecb-654ffd354cc9_285x285.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>There&#8217;s a silent killer in your tech stack. It&#8217;s not tech debt. It&#8217;s not an undiscovered bug. It&#8217;s the project you launched six months ago. Yes, the one you threw a party for when it went live. It&#8217;s still running, consuming resources but being used by no one. It&#8217;s technically alive, but it&#8217;s dead to the business. You&#8217;ve created zombieware.</p><p></p><p><strong>My First Encounter</strong> </p><p>At one place where I worked as a consultant, I built a pipeline and dashboard - it was beautiful, a work of art that Da Vinci would have been proud of. We deployed it, everyone high-fived and then I left. A year later, I caught up with the team and enquired how it was going. They said they had stopped using it and gone back to Excel. When I asked what they did with the dashboard, someone piped up: &#8220;It&#8217;s still there running, but no-one uses it&#8221;. It was technically alive but practically dead to the business. I realised that I had created a zombie. </p><p></p><p><strong>Anatomy of a Zombie</strong></p><p>Zombieware isn&#8217;t abandonware rotting in a repo. 
It&#8217;s the technically-alive-but-business-dead software actively draining your resources while everyone pretends it doesn&#8217;t exist. And the only cure? Stop celebrating builders and start demanding owners.</p><blockquote><p>&#8220;Zombieware&#8221; - software that is technically alive because it consumes resources but is dead to the business because it delivers zero business value </p></blockquote><p>Zombieware is software that is still alive and running - consuming resources and attention. This in itself is not necessarily a bad thing; however, it creates the illusion of value as it devours resources on your server or cloud bill.</p><p>What makes it lethal is that it serves no living purpose. There is no business value to show for the effort of designing, creating, deploying and maintaining the software. This means that it:</p><ul><li><p>is technically functional and consuming resources (think: CPU, storage, licences)</p></li><li><p>has no active users, or its outputs are ignored</p></li><li><p>has no clear owner responsible for its lifecycle</p></li></ul><p>To be clear, this is not some passive form of tech debt. It&#8217;s an active, resource-devouring plague on your organisation&#8217;s focus, resources and profit margins. The only cure is to shift from a culture that celebrates building to one that demands ownership and return on investment.</p><p></p><p><strong>The Silent Tax on Everything</strong></p><p>Zombieware isn&#8217;t some passive liability - it&#8217;s a tax on everything you do. It&#8217;s paid in compute cycles, cognitive load and opportunity cost. It&#8217;s bleeding you dry and no one - not even Alex from Finance - ever approved the expense.</p><p>This tax shows up in insidious ways. Consider two exhibits from my own chamber of horrors.</p><p>First, the technical drain. We built an internal API to handle customer verification. It&nbsp;<em>just worked</em> - a model of reliability. 
So reliable, in fact, that when we shipped v2 and moved on, nobody remembered to kill v1. Two years later, we stumbled upon it, faithfully running in production, ready to serve requests that were never made. The direct&nbsp;resource cost&nbsp;for its dedicated compute and storage was bad enough. The real surprise? The&nbsp;indirect cost&nbsp;of shipping and storing logs no one was reading was even higher. That&#8217;s before you account for the&nbsp;security risk&nbsp;of an unpatched, two-year-old service and the&nbsp;cognitive overhead&nbsp;it placed on any engineer trying to understand our architecture.</p><p>But the most blood-draining costs aren&#8217;t on your cloud bill - they&#8217;re inside your head. We spent a whole quarter building a complex ETL pipeline. It pulled data from three tricky systems, cleaned it and dutifully loaded it into our data warehouse every night. It ran flawlessly. Six months later, a curious junior engineer asked which dashboards were powered by those tables. The answer? None. The business stakeholder who requested the data had changed roles and their replacement had no idea it existed. The&nbsp;build cost&nbsp;was immense, the ongoing&nbsp;compute cost&nbsp;was in the thousands daily - a total waste - but the real damage was to&nbsp;morale. Nothing kills a builder&#8217;s spirit faster than seeing their work become pointless.</p><p>These aren&#8217;t isolated incidents. They are symptoms of a culture that celebrates the launch over the lifecycle. Let&#8217;s be honest: building is the glamour work. Shipping new things gets you promoted. Maintenance is the work nobody sees. This creates an environment that breeds zombieware.</p><p>You know how hard it can be to get budget or spending approval. However, zombieware is a cost to your organisation that no one approved. It&#8217;s paid in compute cycles, in maintenance overhead, in the cognitive load on engineers and in the opportunity cost of what you could be building instead. 
It&#8217;s bleeding you dry, one server instance and one artefact at a time.</p><p></p><p><strong>The Cure: how to prevent and remove zombies</strong></p><p>The first step is instrumentation. </p><p>Without the telemetry of what exists and how it&#8217;s being used, we are flying blind against the zombie invasion. As we&#8217;ve seen in the films, this doesn&#8217;t end well.</p><p>The second step is clear ownership. </p><p>Every new project, from a single script to a microservice, must have a named owner before a single line of code is written. This sounds insultingly simple. It is. Often, it&#8217;s the lack of accountability that drives zombies. Ownership isn&#8217;t about building - it&#8217;s about the entire lifecycle, including decommissioning. </p><p>The third and final step is to set a kill switch.</p><p>Teams often measure technical metrics like CPU utilisation and service latency. These are important but only half the story. Define metrics up front to measure minimum acceptable usage. </p><ul><li><p>How do we define usage? e.g. opening a page, clicking a button, making a decision, resource cost</p></li><li><p>What time period makes sense? I&#8217;ve found that 14, 30 or 90 days are good starting points depending on the situation </p></li><li><p>Add the review of the metric to an existing process. If the minimum acceptable usage isn&#8217;t hit, undeploy the software</p></li></ul><p>Sound extreme? It&#8217;s less extreme than paying for servers to run code that delivers precisely zero value. You already do this with feature flags and A/B tests - you&#8217;re just applying the same &#8216;prove your value or die&#8217; logic to the whole software lifecycle.</p><p></p><p><strong>Advanced Zombie Hunting</strong></p><p>Those three steps will help but zombies are notoriously difficult to kill. It&#8217;s often easier to turn a blind eye to the zombie feasting on your cloud bill than rally the team to do something about it. 
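</p><p>Before the gotchas, it&#8217;s worth noting how little code the kill switch itself needs. A sketch, assuming you already collect usage events somewhere - the event log, window and threshold below are placeholders for whatever your team agrees counts as minimum acceptable usage:</p>

```python
from datetime import datetime, timedelta

def is_zombie(events, system, now, window_days=30, min_usage=5):
    """Kill switch: True when a system's usage over the review window
    falls below the minimum acceptable usage agreed up front."""
    cutoff = now - timedelta(days=window_days)
    recent = [ts for name, ts in events if name == system and ts >= cutoff]
    return len(recent) < min_usage

# Hypothetical telemetry: (system_name, event_timestamp) pairs.
now = datetime(2026, 1, 31)
events = [
    ("churn_etl", now - timedelta(days=80)),       # last touched months ago
    ("verify_api_v1", now - timedelta(days=700)),  # the forgotten v1
] + [("exec_dashboard", now - timedelta(days=d)) for d in range(10)]

assert is_zombie(events, "churn_etl", now)         # flag for undeployment
assert is_zombie(events, "verify_api_v1", now)
assert not is_zombie(events, "exec_dashboard", now)
```

<p>Run a check like this from the existing review process in step three and undeploying becomes a routine outcome rather than a crusade. 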
Here are a few &#8216;gotchas&#8217; that can help stop you from being caught out.</p><p>Related systems</p><p>It&#8217;s not always so simple, because many systems provide information to, or read it from, other systems. If our system is a building block for a downstream system that <em><strong>is</strong> </em>important, then we need to consider whether everyone is using the same approach. Otherwise, we risk continuing to run our service to feed a downstream system which has no real usage - meaning that our service, despite its appearance, is actually zombieware.</p><p>Run a Zombie Purge</p><p>While the kill switch is great, sometimes we see metrics that are too easy to hit or that don&#8217;t measure the real problem. To solve this, once a year, ideally near October 31st, have the team dress up in Halloween costumes. Spend the day reviewing the usage of <strong>all</strong> software and doing a purge of any zombieware identified. Every resource or process should be investigated and tagged as either &#8216;Ok&#8217; or &#8216;Zombie&#8217;. Think of it as a &#8216;trick or treat&#8217; and reward zombie kills with lollies (candy) or whatever motivates the people at your company.</p><ul><li><p>Ask a simple question for every system: &#8220;<em>Who uses this and what decision did they make with it last week/month?</em>&#8221;</p></li><li><p>Use monitoring tools to find things with zero usage</p></li><li><p>Create a &#8220;Zombie of the Year&#8221; award as a fun way to celebrate decommissioning and simplifying the stack</p></li></ul><p>Organ donors</p><p>When it comes time to kill a zombie, ensure that you harvest any organs that might be useful. These could be features or components you could reuse or queries that add value. The simplest method we&#8217;ve seen here is to store them in a shared &#8216;Organ bank&#8217; repository.</p><p></p><p><strong>The final stake</strong></p><p>Look at the last three things you built. Do you know who uses them today? Do you know when they should die? 
Or did you just launch them and walk away?</p><p>True senior engineering isn&#8217;t just about building complex things. It&#8217;s about having the wisdom and discipline to kill them. Stop measuring your worth by what you launch and start measuring it by what you have the courage to kill.</p>]]></content:encoded></item><item><title><![CDATA[Data scientists won't exist in 5 years]]></title><description><![CDATA[The quiet unbundling of the sexiest job of the 21st century]]></description><link>https://blog.statslife.com/p/data-scientists-wont-exist-in-5-years</link><guid isPermaLink="false">https://blog.statslife.com/p/data-scientists-wont-exist-in-5-years</guid><dc:creator><![CDATA[Adric Streatfeild]]></dc:creator><pubDate>Tue, 28 Oct 2025 10:35:18 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!sKQd!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ba673d9-b656-487c-8ecb-654ffd354cc9_285x285.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>The job title &#8220;Data Scientist&#8221; will be extinct in five years. This isn&#8217;t another doomsday prediction about the rise of AI. This is about a market correction that&#8217;s already happening: the consolidation of jobs that lack a clear focus. 
We spoke about the role of a <a href="https://blog.statslife.com/p/are-data-engineers-secretly-doing-two-roles">data engineer being overloaded as two jobs</a> but now let&#8217;s discuss a role that was so ambiguous it was set up to fail - one that makes us wonder: was a data scientist ever a role at all?</p><p></p><p><strong>The &#8220;sexiest job of the 21st century&#8221; is a lie</strong></p><p>Back in 2012, <a href="https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century">an article came out about the role of a &#8216;data scientist&#8217;</a> - someone who &#8220;<em>has the training and curiosity to make discoveries in the world of big data</em>&#8221;. The article noted that this skillset was rare and decreed the role the &#8220;sexiest job of the 21st century&#8221;. As people flocked to this career path, the role was declared a &#8220;unicorn&#8221; and a dreaded Venn diagram was created to capture the overlap of the three key skills:</p><ul><li><p>software engineering</p></li><li><p>statistics</p></li><li><p>business acumen</p></li></ul><p>The role took off because it was something different and fun. Who doesn&#8217;t love to predict the future? Everyone wanted to believe that data scientists were mythical heroes who could do it all. However, data in the corporate world was their kryptonite - while in theory it was clean like iris flower data - the reality was more like overgrown weeds and fallen trees. This proved to be a stumbling block to getting started, but the enthusiasm to hire data scientists didn&#8217;t slow. Yet over a decade later, why do <a href="https://business.uq.edu.au/momentum/why-80-per-cent-ai-projects-fail">80% of data science projects fail?</a></p><p></p><p><strong>The problem isn&#8217;t the data scientist, it&#8217;s the role - aka it&#8217;s not you, it&#8217;s me</strong></p><p>The answer isn&#8217;t the practitioner - it&#8217;s the job description. The role itself is a paradox. 
The universal skill of a data scientist started out as &#8220;the ability to write code&#8221;, but the role quickly became a catch-all for three distinct functions.</p><ol><li><p>The data cleaner</p><p>The bulk of the work is wrangling data into a usable format. This is the job of an&nbsp;Analytics Engineer.</p></li><li><p>The pipeline builder</p><p>The task of getting data to a model and getting predictions out. This is the job of a&nbsp;Data Platform Engineer or ML Engineer.</p></li><li><p>The storyteller</p><p>Being able to explain the insights and link them to the business problem or need. This is the job of a Data Analyst.</p></li></ol><p>Notice what&#8217;s missing? &#8220;science&#8221;...</p><p><br><strong>The science that was really just vibes</strong></p><p>At this point, you might be thinking &#8220;But what about the ML models? Didn&#8217;t the data scientists create those?&#8221;. And you&#8217;d be right. However, while that was the glamorous side of the job, it generally accounted for just 10-20% of a data scientist&#8217;s time. </p><p>When you unpack the &#8216;science&#8217; further, it starts to unravel more:</p><ul><li><p><strong>Algorithm Choice:</strong>&nbsp;This isn&#8217;t a mystical quest. It&#8217;s a dependency choice, just like a software engineer choosing between React and Vue (or the other thousand JavaScript packages). Why do people spend three weeks testing Random Forest vs XGBoost for a 2% accuracy bump that translates to zero business impact?</p></li><li><p><strong>Data Drift:</strong>&nbsp;This is portrayed as a novel scientific challenge but it&#8217;s just a data quality and monitoring problem. Analytics Engineers have been handling schema changes and slowly changing dimensions for years.</p></li><li><p><strong>Build vs Buy:</strong>&nbsp;The decision to build a custom model is not a scientific one. It&#8217;s a classic engineering build vs buy analysis. 
For most use cases, it doesn&#8217;t make sense to build your own, since off-the-shelf models or AutoML solutions will usually suffice. However, if a 0.01 uplift in AUC translates to a $1M return, then the investment and ongoing maintenance might be justified.</p></li></ul><p>All that&#8217;s left is a process driven more by intuition than evidence. What we called data science was often just trial and error in a Jupyter notebook - vibe coding at a PhD price.</p><p></p><p><strong>Will the real model owner please stand up?</strong></p><p>Unlike traditional engineering, which delivers in small increments, data science often involved a large upfront time investment called discovery or experimentation. This created friction between executives keen for results and the data scientist grinding through experiments. </p><p>Things went sideways as data scientists became too focused on the &#8216;science&#8217; rather than the actual business problem. Disconnected from the business outcome, it became a negative feedback loop and more time was spent tuning the model&#8217;s accuracy rather than validating the outcomes.</p><p>Thus, the ownership of the model became unclear as the business person asking for the help didn&#8217;t understand how the 5-layer neural network predicted customer A would churn. Neither did the data scientist understand the impact of a customer leaving the business. This left the model running in production but unused. And this was a best-case scenario.</p><p>Often the model never made it to production due to hand-offs. At one place I worked, the data scientist wrote a model in Python. However, the ML framework was written in Java. So an engineer had to rewrite in Java the model the data scientist had created in Python. This is crazy! Imagine McDonald&#8217;s creating two hamburgers for the one you ordered. 
With situations like these, it&#8217;s no wonder executives questioned why ML initiatives took so long.</p><p></p><p><strong>The Data Science Unbundling: what comes next</strong></p><p>Data scientists won&#8217;t exist in five years - not because of AI - but because the work splits cleanly into two real jobs: product analytics and engineering. The rest is a vibes-based middle where notebooks flourish, models drift and nobody owns outcomes.</p><p>The roles that actually matter:</p><ul><li><p>Data Analysts:&nbsp;Own the understanding and explanation of data</p></li><li><p>Analytics Engineers:&nbsp;Own the semantic layer and deliver trusted, clean data</p></li><li><p>ML Engineers:&nbsp;Own the end-to-end process of deploying, scaling, and monitoring models as a product</p></li><li><p>(Optional) Research Scientists:&nbsp;The&nbsp;<em>true</em> scientists, reserved for the 1% of novel problems where off-the-shelf solutions don&#8217;t exist</p></li></ul><p>We&#8217;re not saying that the people who were Data Scientists aren&#8217;t valuable. Quite the opposite. This unbundling is actually a promotion - one with increased focus, accountability and, most importantly, impact.</p><p><strong>If you&#8217;re a Data Scientist:</strong>&nbsp;Audit your last three months. Where did you create tangible value? Double down on that - whether it&#8217;s engineering or analytics - and rebrand yourself.</p><p><strong>If you&#8217;re a leader:</strong>&nbsp;Stop writing unicorn job descriptions. Redefine your roles for clarity and ownership. Your team&#8217;s velocity and the business&#8217;s profit margins will thank you.</p><p>The &#8220;Data Scientist&#8221; title is a failed experiment born from a decade of hype. 
While the job title joins Data Entry Clerks gathering dust, the future belongs to those who make data impactful.</p>]]></content:encoded></item><item><title><![CDATA[Are data engineers secretly doing two jobs at once?]]></title><description><![CDATA[When most companies first build a data team, there&#8217;s just one role: data engineer. But here&#8217;s the truth - that role has always been two jobs rolled into one.]]></description><link>https://blog.statslife.com/p/are-data-engineers-secretly-doing-two-roles</link><guid isPermaLink="false">https://blog.statslife.com/p/are-data-engineers-secretly-doing-two-roles</guid><dc:creator><![CDATA[Adric Streatfeild]]></dc:creator><pubDate>Sat, 04 Oct 2025 06:06:27 GMT</pubDate><enclosure url="https://substackcdn.com/image/fetch/$s_!sKQd!,w_256,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F4ba673d9-b656-487c-8ecb-654ffd354cc9_285x285.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Companies capture more data than ever, yet, based on experience, somewhere between 60% and 80% goes unused. How can we make use of that data and get it into the hands of the right people at the right time? The solution seems obvious - hire a data engineer. The job ad goes up for a &#8216;data engineer&#8217;. The new hire can build robust data flows and tune infrastructure like a race car mechanic. But when a stakeholder requests a dashboard, they&#8217;re lost. Sound familiar? Or is your situation the other way around?</p><p>In the 2000s, we had the humble role of business intelligence (BI) developer. Business Intelligence was a fancy term that really meant &#8216;analytics and insights&#8217; but sounded mysterious and cool. I couldn&#8217;t resist either and used it myself to refer to the work we did. 
</p><p>A BI developer was tasked with crafting data from a source system into something that could be used to understand the business, answer questions and make decisions. This generally got built in applications or in a data warehouse. BI developers did it all - extract data from source systems, transform it into something useful, load it into a data warehouse, then build reports and dashboards to share with the stakeholder. They were the full-stack data professionals of their time.</p><p>This role often sat in the IT department, quite removed from the business domain - meaning lots of back and forth between the BI developer and stakeholder to understand the data and to align on the final output.</p><p>While it was one role, it had five completely different skill sets:</p><ul><li><p>data detective (what does this actually mean?)</p></li><li><p>pipeline engineer (how do we load it?)</p></li><li><p>data modeller (how should we structure it?)</p></li><li><p>governance analyst (can we trust it?)</p></li><li><p>storyteller (what does it tell us?)</p></li></ul><p>Fast forward to 2011 and large tech companies, such as Google and Facebook, started advertising roles for a &#8216;data engineer&#8217;. This role was eerily similar to a BI developer but sounded much cooler (yes, again). In a simplified sense, it was tasked with moving data from A to B. While it sounded simple, the need to understand both the content of the data and the computing resources required to move and transform it gave the role a large scope of work. This breadth naturally had many overlaps with existing roles. Suddenly everyone wanted this title. Data modellers, BI analysts, ETL developers - all became &#8216;data engineers&#8217;.  </p><p>Here&#8217;s where it gets interesting... 
as all these data engineers worked away, we discovered that understanding customer behaviour to build a churn model requires completely different skills than optimising database partitions for performance. One is about business logic; the other is about infrastructure. But we kept hiring for both under one role - data engineer.</p><p>Stakeholders put in requests to the data team to build a data asset and the associated report or insights. However, the mix of skills and the domain expertise required made this a huge time sink. I was at one organisation where the data team had a 14-month backlog! But when I asked if I (in a different team) could do the work for them, I was met with the reply &#8220;no, only we can work on the data tools&#8221;.</p><p>The chasm widened as data was declared &#8216;the new oil&#8217; and became necessary in processes and decisions across the business. Data couldn&#8217;t be restricted to IT - it was needed in Sales, Finance, Marketing and Customer Service to drive decisions and personalise experiences.</p><p>By the time we hit 2020, the &#8216;data engineer&#8217; term was locked in. The term may have been only 9 years old, but resumes were popping up with 15 years&#8217; experience as a &#8216;data engineer&#8217;.</p><p>So now we have a two-fold problem:</p><ul><li><p>two distinct kinds of work</p></li><li><p>one popular job title</p></li></ul><p>So how do we untangle this mess? 
We can think of it as a continuum - from pure data work on one end to pure engineering work on the other:</p><div class="captioned-image-container"><figure><a class="image-link image2" target="_blank" href="https://substackcdn.com/image/fetch/$s_!Fhw0!,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3962c254-26ca-4488-9e08-a485347ed58f_3692x680.png" data-component-name="Image2ToDOM"><div class="image2-inset"><picture><source type="image/webp" srcset="https://substackcdn.com/image/fetch/$s_!Fhw0!,w_424,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3962c254-26ca-4488-9e08-a485347ed58f_3692x680.png 424w, https://substackcdn.com/image/fetch/$s_!Fhw0!,w_848,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3962c254-26ca-4488-9e08-a485347ed58f_3692x680.png 848w, https://substackcdn.com/image/fetch/$s_!Fhw0!,w_1272,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3962c254-26ca-4488-9e08-a485347ed58f_3692x680.png 1272w, https://substackcdn.com/image/fetch/$s_!Fhw0!,w_1456,c_limit,f_webp,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3962c254-26ca-4488-9e08-a485347ed58f_3692x680.png 1456w" sizes="100vw"><img src="https://substackcdn.com/image/fetch/$s_!Fhw0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3962c254-26ca-4488-9e08-a485347ed58f_3692x680.png" width="1456" height="268" 
data-attrs="{&quot;src&quot;:&quot;https://substack-post-media.s3.amazonaws.com/public/images/3962c254-26ca-4488-9e08-a485347ed58f_3692x680.png&quot;,&quot;srcNoWatermark&quot;:null,&quot;fullscreen&quot;:null,&quot;imageSize&quot;:null,&quot;height&quot;:268,&quot;width&quot;:1456,&quot;resizeWidth&quot;:null,&quot;bytes&quot;:160475,&quot;alt&quot;:null,&quot;title&quot;:null,&quot;type&quot;:&quot;image/png&quot;,&quot;href&quot;:null,&quot;belowTheFold&quot;:true,&quot;topImage&quot;:false,&quot;internalRedirect&quot;:&quot;https://statslife.substack.com/i/174146054?img=https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3962c254-26ca-4488-9e08-a485347ed58f_3692x680.png&quot;,&quot;isProcessing&quot;:false,&quot;align&quot;:null,&quot;offset&quot;:false}" class="sizing-normal" alt="" srcset="https://substackcdn.com/image/fetch/$s_!Fhw0!,w_424,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3962c254-26ca-4488-9e08-a485347ed58f_3692x680.png 424w, https://substackcdn.com/image/fetch/$s_!Fhw0!,w_848,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3962c254-26ca-4488-9e08-a485347ed58f_3692x680.png 848w, https://substackcdn.com/image/fetch/$s_!Fhw0!,w_1272,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3962c254-26ca-4488-9e08-a485347ed58f_3692x680.png 1272w, https://substackcdn.com/image/fetch/$s_!Fhw0!,w_1456,c_limit,f_auto,q_auto:good,fl_progressive:steep/https%3A%2F%2Fsubstack-post-media.s3.amazonaws.com%2Fpublic%2Fimages%2F3962c254-26ca-4488-9e08-a485347ed58f_3692x680.png 1456w" sizes="100vw" loading="lazy"></picture><div></div></div></a></figure></div><p><em>Image: The data-engineering continuum</em></p><p>While these four roles exist on a continuum with blurred lines - especially at smaller companies where people wear multiple hats - 
they give us a useful framework for thinking about the work:</p><p><strong>Data Analyst</strong><br>Investigates business questions through data exploration, such as "Why did sales drop last month?" or "Which customers are most likely to churn?".</p><p><strong>Analytics Engineer</strong><br>Builds reliable data models that multiple teams can trust. Figures out why sales figures don't match between systems.</p><p><strong>Data Platform Engineer</strong><br>Designs the infrastructure that moves and stores data at scale, enabling both speed and safety.</p><p><strong>Software Engineer</strong><br>Builds the APIs, tools and services that make data accessible to applications and end users.</p><p>Every company is now a data company, and as data and AI play bigger roles, we need language that matches reality. Expecting a full-stack &#8216;data engineer&#8217; is madness. It&#8217;s like expecting one person to be both the bar technician who maintains the kegs and taps AND the bartender who serves customers. Both work with the same liquid, but one focuses on pressure, flow and keeping the system running, while the other focuses on what the customer actually wants and how to serve it properly.</p><p>The data analyst and software engineer roles have been around for a while and are well defined. However, it&#8217;s clear that what we have been calling &#8216;data engineer&#8217; is really two roles in disguise:</p><ul><li><p>Analytics Engineer</p></li><li><p>Data Platform Engineer</p></li></ul><p>The cost of this confusion isn&#8217;t just semantic. When you hire a data pipeline expert to build business logic, you get brittle data models. When you hire a data modelling expert to scale infrastructure, you get performance bottlenecks. In larger organisations, it&#8217;s time we stopped asking one person to be both bartender and bar technician. 
The next time you&#8217;re hiring a data engineer, ask yourself: do you need someone to build the pipes or someone to serve the data flowing through them?</p>]]></content:encoded></item></channel></rss>