The 4 Levels of a Data Engineer
Blog post from Soda
The post sorts data engineering into four levels of increasing complexity and responsibility. Level 1 covers SQL and basic ETL pipelines, Level 2 covers distributed systems and cloud data warehouses, and Level 3 covers streaming, orchestration, and big data architecture. Level 4, by contrast, reflects a mature understanding that not every request calls for a new pipeline, and it values restraint and data quality over complexity.

This view challenges the industry's prevailing focus on depth of stack and pipeline output, and argues that reliable, trustworthy data is the real goal. The post describes the over-engineering trap, where complexity is mistaken for capability, and stresses that reliable data, not extensive infrastructure, is the true deliverable. Level 4 thinking means attending to data quality at every stage, which leads to more efficient and more trustworthy data systems.
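To make "data quality at every stage" concrete, here is a minimal sketch of a quality gate that a pipeline step could run before publishing its output. Everything in it is an assumption made for illustration: the orders dataset, the column names `order_id` and `amount`, and the three checks are hypothetical, and none of it comes from the post or from Soda's own tooling.

```python
from dataclasses import dataclass
from typing import Callable

Row = dict  # hypothetical row shape: {"order_id": ..., "amount": ...}


@dataclass
class CheckResult:
    name: str
    passed: bool
    failing_rows: int


def run_check(name: str, rows: list[Row], predicate: Callable[[Row], bool]) -> CheckResult:
    """Count the rows that violate the predicate; the check passes when none do."""
    failing = sum(1 for row in rows if not predicate(row))
    return CheckResult(name=name, passed=failing == 0, failing_rows=failing)


def quality_gate(rows: list[Row]) -> list[CheckResult]:
    """Run a small, fixed set of illustrative checks over one batch of rows."""
    seen_ids = set()

    def is_first_occurrence(row: Row) -> bool:
        # A row fails the uniqueness check if its order_id was already seen.
        key = row.get("order_id")
        if key in seen_ids:
            return False
        seen_ids.add(key)
        return True

    def has_valid_amount(row: Row) -> bool:
        amount = row.get("amount")
        return isinstance(amount, (int, float)) and amount >= 0

    return [
        run_check("order_id is present", rows, lambda r: r.get("order_id") is not None),
        run_check("order_id is unique", rows, is_first_occurrence),
        run_check("amount is non-negative", rows, has_valid_amount),
    ]


# Hypothetical sample batch with one problem per check.
sample_rows = [
    {"order_id": 1, "amount": 19.99},
    {"order_id": 2, "amount": -5.00},
    {"order_id": 2, "amount": 12.50},
    {"order_id": None, "amount": 7.25},
]

results = quality_gate(sample_rows)
for result in results:
    status = "PASS" if result.passed else "FAIL"
    print(f"{status}  {result.name}  (failing rows: {result.failing_rows})")

# Publish only when every check passes; otherwise stop and investigate.
safe_to_publish = all(result.passed for result in results)
```

The design choice here mirrors the post's argument: the gate adds no new infrastructure, only a cheap decision point that lets a step refuse to pass unreliable data downstream.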