Data Engineering 101

Buriihenry
2 min read · Aug 22, 2022

What is Data Engineering?

Data engineering is the process of designing and building systems that let people collect and analyze raw data from multiple sources and in multiple formats.

The data engineering lifecycle covers architecting and building data platforms; designing and implementing data stores, repositories, and data lakes; gathering, importing, cleaning, pre-processing, querying, and analyzing data; and monitoring, evaluating, optimizing, and fine-tuning the processes and systems involved.

Data engineering helps make data more useful and accessible for the people who consume it. To do so, data engineers must source, transform, and analyze data from each system. For example, data stored in a relational database is managed as tables, much like the sheets of a Microsoft Excel spreadsheet. Each table contains many rows, and every row in a table has the same columns. A given piece of information, such as a customer order, may be stored across dozens of tables.
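To make the point about one order living in several tables concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (customers, orders, order_items), the sample rows, and the helper functions are all made up for illustration; a real system would typically have many more tables.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with three small, hypothetical tables."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE orders (
            id INTEGER PRIMARY KEY,
            customer_id INTEGER REFERENCES customers(id),
            ordered_on TEXT
        );
        CREATE TABLE order_items (
            order_id INTEGER REFERENCES orders(id),
            product TEXT,
            quantity INTEGER,
            unit_price REAL
        );
        INSERT INTO customers VALUES (1, 'Ada');
        INSERT INTO orders VALUES (100, 1, '2022-08-22');
        INSERT INTO order_items VALUES (100, 'keyboard', 1, 45.0);
        INSERT INTO order_items VALUES (100, 'cable', 2, 5.0);
        """
    )
    return conn


def fetch_order(conn, order_id):
    """Reassemble one customer order by joining the three tables."""
    rows = conn.execute(
        """
        SELECT c.name, o.ordered_on, i.product, i.quantity, i.unit_price
        FROM orders AS o
        JOIN customers AS c ON c.id = o.customer_id
        JOIN order_items AS i ON i.order_id = o.id
        WHERE o.id = ?
        ORDER BY i.product
        """,
        (order_id,),
    ).fetchall()
    if not rows:
        return None  # no such order
    return {
        "customer": rows[0][0],
        "ordered_on": rows[0][1],
        "items": [
            {"product": p, "quantity": q, "unit_price": u}
            for _, _, p, q, u in rows
        ],
        "total": sum(q * u for _, _, _, q, u in rows),
    }


if __name__ == "__main__":
    print(fetch_order(build_demo_db(), 100))
```

The join is what turns three separate tables back into a single "customer order" that a data consumer can actually read.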

Why Data Engineering?

  • There is more data than ever before, and it is growing faster than ever.
  • Data is more valuable to companies and is used across more business functions.
  • The technologies used for data are more complex.
  • Companies are finding more ways to benefit from data.
  • As the demands for data increase, data engineering will become even more critical.

Key Data Engineering Skills and Tools

  • Programming experience: Python
  • Version control: Git
  • ETL (extract, transform, load)
  • SQL
  • Cloud technologies: AWS, Azure, GCP
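Several of these skills come together in even a tiny ETL job. The sketch below is only an illustration under stated assumptions: the CSV content, the orders table, and the function names (extract, transform, load, run_pipeline) are invented here, and an in-memory SQLite database stands in for a real data store.

```python
import csv
import io
import sqlite3

# Hypothetical raw export: stray whitespace, inconsistent casing,
# and one row with a missing amount.
RAW_CSV = """order_id,customer,amount
1, Ada ,19.99
2,Grace,
3,linus,5.00
"""


def extract(text):
    """Extract: parse CSV text into a list of dictionaries."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: drop rows without an amount, tidy names, cast types."""
    cleaned = []
    for row in rows:
        amount = (row["amount"] or "").strip()
        if not amount:
            continue  # skip incomplete records
        cleaned.append(
            (int(row["order_id"]), row["customer"].strip().title(), float(amount))
        )
    return cleaned


def load(conn, rows):
    """Load: write the cleaned rows into a SQL table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders "
        "(order_id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO orders VALUES (?, ?, ?)", rows)
    conn.commit()


def run_pipeline(text, conn):
    """Run extract -> transform -> load, then summarize with SQL."""
    load(conn, transform(extract(text)))
    return conn.execute(
        "SELECT COUNT(*), ROUND(SUM(amount), 2) FROM orders"
    ).fetchone()


if __name__ == "__main__":
    print(run_pipeline(RAW_CSV, sqlite3.connect(":memory:")))
```

Keeping each stage in its own function makes it easy to test the cleaning rules separately from the database, and to swap the CSV source or SQLite target for a cloud service later.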

Software Engineer vs Data Engineer vs Data Scientist

Data Pipeline
