It's interesting to note that when I was first called a DE, it just meant a software engineer in the data domain.
As in, writing full-fledged software that happened to focus on data.
Just 6 years ago I was tinkering with PrestoDB code, optimizing the scheduler and building Hadoop extensions.
Between then and now, the field swung toward people who came from BI, with considerably less software engineering background. It got to the point that just 2 years ago, when applying for DE roles, I was confused why the majority of my screening questions boiled down to "how well do you know SQL?"
Today I do the same work I did 3-4 years ago, but I am no longer a data engineer.
Yeah, I’m thinking of changing my title back to software dev instead of DE; it’s sort of getting a bad rep.
How do you define "bad rep"?
A lot of “data engineers” are former DB analysts and the like who don’t know much of anything technical outside of SQL, and even SQL might be something they are only “certified” in rather than actually good at.
It’s basically becoming a title I’d associate with being low-skill. I used to be a “software engineer in data” and never called myself a data engineer, because people would assume I couldn’t write/maintain production services and only wrote ETL pipelines.
OK, I guess you need to find the workplace that suits you.
It's just that I recently ran into multi-threaded Java k8s services reading daily "streams" from Kafka,
when a SQL query would have solved it.
There are many such cases in the data space of CV building over problem solving.
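For illustration, here is a minimal sketch of what "a SQL query would have solved it" can look like for a daily batch, assuming the day's events have already landed in a table. The `events` table, its columns and the sample rows are hypothetical, and Python's built-in sqlite3 stands in for whatever warehouse you actually run:

```python
import sqlite3


def daily_totals(conn: sqlite3.Connection) -> list[tuple[str, int, float]]:
    """Per-day event counts and summed amounts with a single GROUP BY query."""
    query = """
        SELECT date(event_ts) AS day,        -- truncate timestamp to calendar day
               COUNT(*)       AS n_events,
               SUM(amount)    AS total_amount
        FROM events                          -- hypothetical landing table
        GROUP BY day
        ORDER BY day
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (event_ts TEXT, amount REAL)")
    conn.executemany(
        "INSERT INTO events VALUES (?, ?)",
        [
            ("2023-05-01 08:15:00", 10.0),
            ("2023-05-01 17:40:00", 5.5),
            ("2023-05-02 09:00:00", 7.25),
        ],
    )
    for row in daily_totals(conn):
        print(row)
```

With the sample rows this prints `('2023-05-01', 2, 15.5)` and `('2023-05-02', 1, 7.25)`. No threads, consumers or k8s deployment to maintain; the scheduler just runs the query once a day.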
I guess you have to decide what you want on your CV. "I solve problems" or "I build complex stuff".
I made the change from SWE to DE a few months ago, and several fellow SWEs saw it at first glance as if I was downgrading to a less technical role. They assumed it was some kind of modern DBA.
I’ve always seen it as a data-oriented software engineer, but that isn’t always the case, especially recently, when I’ve been offered jobs that were basically analyst or BI roles.
The BI world is honestly kind of weird.
You have people who are at the intersection of "understands databases, the relational model, query optimization etc. at the level of a very senior SWE" ∩ "needs to be told how git works in the year of our lord 2023".
Don't forget the person who thinks they really have "big data" and needs all this massive infrastructure. Eventually, one discovers it's a few gigs of CSV files that fit in RAM on a laptop.
I feel like DuckDB has been making waves here, helping people realize you don’t need Snowflake/BigQuery/etc. for all your datasets, while still getting the nice feature set of those systems.
What is DuckDB?
An embedded column-store database. Think SQLite, but much, much faster for aggregations.
I'm in this post and I don't like it lmao.
Yeah, same. Also, honestly, the way the field is progressing, it feels like it will just be eaten up by the SWE role. I feel the same about ML engineer and many other specialized roles.
I don't think that SWEs will.
The software and services are going to get advanced enough to eliminate the need for a dedicated team to build ETL. People with relevant domain knowledge will have an easier time delivering their work product, without the overhead of a build phase.
For a reasonably good data platform, point-and-click ETL services, SaaS offerings and the likes of Metabase are already good enough for medium enterprises... and they beat Databricks offerings on speed (setup, delivery and operation) for reporting and operational data access.
I am absolutely sure there will be a massive contraction in the DS, DE and ML job market in the next few years. The major companies will consolidate, and jobs in those domains will only be available at a handful of companies... or extremely specialized startups (much like chip design is now consolidated).
Long story short, for companies - you probably don't need DS, ML and DE departments.
My experience is that many established companies are still struggling to get adequate operational reporting. Data engineers are still helpful to move the data necessary to make that happen. DS and ML become useful later once there's a more mature data culture and infrastructure. Otherwise you have analysts spending most of their time doing data engineering so they have something to analyze.
I think we are aligned; I am just much more sceptical of no-code solutions. So I think these roles will shrink massively, and the little code you still have to write will become just another thing to integrate into your SWE role (this is what is happening in my industry, at least).
This lines up with my experience, and I've found it heavily depends on what industry you are in.
I've basically been this for the past ten years. Maybe it was because I was working only in startups.
Outside of tech/startup orgs, "data engineers", at least in my experience, were SQL specialists. About six years ago, I went into healthcare and discovered there were about 30 people across five teams who were data engineers. "Oh cool. My colleagues," I thought. Imagine my surprise when I found they only knew SQL, knew data modeling theory, and had basically no SDLC experience. At my present job, in a traditionally blue-collar industry, I took over a team with the only data engineer in the whole company. He, too, knew only SQL. I've had to shove Python at him and get him working within an SDLC.
I think there are fewer and fewer of these people, though. Putting pressure on them from the other side, Python is a common skill among data analysts these days. Software engineers do the heavy lifting and good-enough data modeling, while data analysts do business-specific analysis and good-enough software development, like writing DAGs with Dagster. Knowing SQL isn't enough to get by in the job market.
I know plenty of people who only use SQL. There is now a role called analytics engineer that is primarily SQL, often with transformation tools like dbt.
I personally haven’t met a lot of software devs who would call their data modeling capabilities ‘good enough’. Going from building third normal form (3NF) tables to denormalizing everything is a different way of thinking. Plus, if a company is merging data from many systems, it gets even more complicated for the software developer.
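To make the normalized-vs-denormalized contrast concrete, a minimal sketch using Python's built-in sqlite3. The `customers`/`orders` schema and rows are hypothetical: the source keeps customer attributes in one place (3NF-style), and the analytics copy flattens them onto every order row so reporting queries need no join:

```python
import sqlite3


def build_wide_orders(conn: sqlite3.Connection) -> None:
    """Flatten normalized customers/orders into one wide, analytics-friendly table."""
    conn.execute("DROP TABLE IF EXISTS orders_wide")
    conn.execute(
        """
        CREATE TABLE orders_wide AS
        SELECT o.order_id,
               o.order_date,
               o.amount,
               c.customer_id,
               c.name   AS customer_name,   -- customer attributes repeated per order
               c.region AS customer_region
        FROM orders AS o
        JOIN customers AS c ON c.customer_id = o.customer_id
        """
    )


def revenue_by_region() -> list[tuple[str, float]]:
    conn = sqlite3.connect(":memory:")
    # Normalized source: each customer's name/region is stored exactly once.
    conn.execute("CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT, region TEXT)")
    conn.execute(
        "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, order_date TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?)", [(1, "Acme", "EU"), (2, "Globex", "US")])
    conn.executemany(
        "INSERT INTO orders VALUES (?, ?, ?, ?)",
        [(100, 1, "2023-05-01", 20.0), (101, 1, "2023-05-02", 35.0), (102, 2, "2023-05-02", 12.5)],
    )
    build_wide_orders(conn)
    # Report straight off the wide table; the join was paid for once, at build time.
    return conn.execute(
        "SELECT customer_region, SUM(amount) FROM orders_wide GROUP BY customer_region ORDER BY customer_region"
    ).fetchall()


if __name__ == "__main__":
    print(revenue_by_region())
```

The trade-off is the one described above: the wide table duplicates customer data (so updates must be rebuilt rather than applied in one place), in exchange for simpler and faster analytical queries.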
The moment you have learned SQL is the moment you have forgotten everything about C/C++.
It's the least defined role.
Currently, I am in a funny situation where all teams agree we need an additional data engineer. But basically:
- Sales and finance want more of a business intelligence analyst
- Devs want more of a backend engineer
- ML researchers want a data analyst proficient in ETL to do pipelines on the training dataset
All three have only one thing in common: they need to know SQL very well. I've worked extensively with various technologies for analyzing data (pandas, SQL, Spark), and I still find SQL (especially, recently, BigQuery) gets me what I want the quickest.
I’m puzzled by this. Knowing SQL seems like a minor technical matter. You can learn whatever you need as you go (or use GPT-4 to quickly do stuff). Why does one need expertise in syntax rather than just a strong general foundation in software and data?
I had a similar experience at Airbnb.
My title at Airbnb was “Data Engineer” in 2016, then “Software Engineer - Data” 2016-2019, then just “Software Engineer” 2019-2023.
When I joined the DE team, we were not in the Engineering Org; our manager reported to the head of Analytics (Chief Data Scientist). The DE perf cycle, job levels and comp were all tied to the Analytics Org levels. There was a Data Infra (DI) team under Engineering > Infrastructure who managed Presto, HBase, Hive, &c. but didn’t touch pipelines; that was DE’s job.
Most of the DEs owned more than pipelines, though; many of us also wrote and owned services. Max on our team built Airflow and Caravel/Panoramix/Superset during hackathons, Johnathan built our Data Quality tool, Amit built the Minerva semantic metrics layer (which Nick, James and Paul spun out as Transform), Aaron built our Anomaly Detection platform, John built Dataportal, and I built our Customer Support Roster service and a Kafka indexing service.
Our manager was awesome. She saw that we were undervalued in Analytics and lobbied successfully to move the team to the Engineering Infrastructure org. We were all retitled in Workday, our perf structure changed to align with the rest of Engineering, as did our levels.
DE's life as a whole-org team under Infra lasted less than a year before we were split up and distributed into the product teams we supported, as Software Engineers focused on building & maintaining pipelines, schemas, logging libraries… and the existing tools we had built. The intention was to embed us in the product teams (Homes, Trips, Support Tools, &c.), skill up our new teammates and share the on-call load. In reality, (at least) 3 DE teams then grew up in the various product orgs.
Maybe this is different at the highest levels of the game, but for engineers in the more mainstream part of the bell curve, at companies below Google's level of craziness and volume, my experience is that Data Engineers (folks who came up as former DBAs, data warehouse devs, DB-heavy backend devs, or analytics/reporting folks) tend to solve problems in a more straightforward, data-centric, practical way. Folks who enter a data role from the software side, in my experience, tend to come up with rather convoluted solutions to simple things.
Therefore I think the title distinction is warranted. It signals that the company is looking for engineers with skills in the software space (source control mastery, knowledge of a language or two other than SQL) but also experience reading query plans, designing large-scale data systems, dealing with BI tools, etc. A SWE from a traditional background CAN do this, but I'd rather have someone who fits the DE role more.