19th May 2022 7 minutes read

The Recent Rise of DBT and the Analytics Engineer

Data Analysis

Table of Contents

From ETL to ELT: A New Way to Manage Data
What Is DBT?
SQL for DBT
Who Is An Analytics Engineer?
The Role of the Analytics Engineer in Data Democratization
Become an Analytics Engineer: Learn SQL!

If you’re interested in data engineering or data analytics, you should know about DBT. And you should also learn about the role of an analytics engineer. We cover both in this article.

The rise of Cloud data providers has led to new challenges and possibilities. If you are thinking about entering the data analytics space, it is important to be aware of them; it can help you decide what roles to aim for and what skills to learn.

This article gives you an overview of current data analytics trends and how they’re leading to new approaches, tools, and roles. Specifically, it introduces you to the data modeling tool DBT and shows you how you can use it to transform your data with SQL queries. It also describes the analytics engineer, a role that’s answering new needs in the data analytics space.

The trends show that SQL has become a central skill in data analytics, so learning SQL remains a great decision. Our SQL from A to Z learning track teaches you all the SQL skills you’ll need in any data role. It is a comprehensive track consisting of 7 full interactive SQL courses and more than 80 hours of material.

Let’s start our examination of DBT and the role of the analytics engineer (among other data professionals) by looking at how data is managed.

From ETL to ELT: A New Way to Manage Data

There are specific trends in data management that are behind the rise of the analytics engineer. The most important trend is the widespread replacement of on-premise data warehouses with Cloud-based data management services. Cloud services make data storage and processing affordable, flexible, and more manageable.

This new capability also allows users to follow a new data management methodology. Previously, data engineers had to manage data in Extract, Transform, Load cycles, transforming the data outside the data warehouses before storing it. This method is often slow and cumbersome, and it requires specialized engineering knowledge.

Cloud computing’s increased performance and flexibility allows users to follow an Extract, Load, Transform approach – storing the data in its raw form and transforming it within the warehouse. This approach is faster and less complex, as it does not require moving and processing the data outside the warehouse.

The ability to process data in the Cloud and follow an ELT approach makes it possible to build new data products and services leveraging that flexibility. It also makes it easy to build data extraction pipelines, reporting tools, and dashboards.

One of these tools, DBT, has become widely popular among data analysts, data engineers, and analytics engineers.

What Is DBT?

DBT is an open-source data modeling tool that integrates with your Cloud data provider and executes data transformations within the data warehouse. It specifically supports the ‘Transform’ step in the ELT pipeline by allowing you to create new tables and views from raw data for data analysts to consume.

Besides supporting ELT processes, DBT helps you follow software development principles in your analytics workflow. Specifically, it offers the following functionalities:

Modular data modeling with SQL SELECT statements.
Source control and CI/CD principles within your analytics workflows.
Flexible data transformations via programmatic Jinja macros.
Query, view, and data lineage documentation and data catalog.
Query testing and data validation.
Simplified data updates.

DBT benefits your analytics work in many ways. The most important is the ability to do data engineering without knowing the full data engineering toolset. The only skills DBT users require are SQL, data modeling, and version control (e.g. GitHub).

SQL for DBT

SQL is the most widely used language in data analytics, and Cloud data providers also rely on it. Thus, it’s not surprising that SQL is central to the DBT workflow.

DBT supports writing clean and modular SQL queries, which you can store in .sql files and reuse later. Let’s see an example query where we list filtered customer information. We’ll store this query in a file called stg_customers.sql.

-- stg_customers.sql

WITH customers AS (  
	SELECT
    	  id AS customer_id,
    	  first_name,
    	  last_name,
	  age,
	  gender,
	  phone_number,
	  address
	FROM raw.jaffle_shop.customers
	WHERE age > 35
)

SELECT * from customers

In another .sql file, dim_customers.sql, we want to pull data already filtered by our original query. We refer to this query in DBT by using the ref statement:

-- dim_customers.sql

WITH customers AS (
	SELECT customer_id, last_name
FROM {{ ref('stg_customers')}}
)

SELECT * FROM customers

The above query uses the results of a query defined elsewhere and saves us from rewriting it and producing a slightly altered duplicate. This modular organization with DBT allows building complex relationships between SQL queries in a clean way.

DBT also uses these .sql files as building blocks in its other functionalities to follow software development methodologies (e.g. source control and testing) for your analytics workflow. This approach means that the most important skill for DBT is SQL. Our courses teach you the SQL you need to know for fundamental DBT, like queries, data types, and creating and updating tables.

Who Is An Analytics Engineer?

The widespread adoption of Cloud data service providers, the shift from ETL to ELT processes, and the creation of tools like DBT have led to the emergence of a new role, the analytics engineer.

Interest over time for the “analytics engineer” keyword (source)

It’s helpful to compare analytics engineers to data analysts and data engineers because of the overlap between their tasks and skills.

Analytics engineers are similar to data engineers in that they are responsible for the engineering side of data and analytics. Unlike data engineers, analytics engineers don’t cover the full data engineering pipeline and rely less on complex data orchestration tools (e.g. Airflow and Luigi). Instead, they use transformation and data modeling tools like DBT and rely on their SQL knowledge.

Analytics engineers are also similar to data analysts in that they consider how to get insights from data. They have to think about the business side of data management, including the data schema and its possible use cases. They differ from data analysts in that they need to think about the upstream steps of the analytics pipeline, including data ingestion, maintenance, and monitoring. They also need to maintain data quality by providing tests and documentation.

Data source: getdbt.com

We can understand the difference between these roles by thinking about a library. Data engineers build the library infrastructure, including rooms and bookshelves. Data analysts use the library by taking out books to read. Analytics engineers are the librarians who organize the books, put them on the shelves, and manage them so analysts can use them.

Analytics engineers deliver well-defined, preprocessed data sets with documentation, tests, and code reviews. They are responsible for naming data objects, generating tables and views for specific use cases, and making them user-friendly. They also monitor the data pipeline and maintain data and code quality.

Analytics engineers are often the first data hire in an organization because their tasks overlap with both data analyst and data engineer tasks. For this reason, SQL is their most important skill, along with data modeling, version control, and scripting (e.g. Python).

The Role of the Analytics Engineer in Data Democratization

While DBT is currently the most prominent tool supporting the ELT workflow, it is not the only one. There are competing tools emerging in the landscape. Despite their differences, they all utilize the trend of affordable and flexible data processing services and answer the need for software development principles in analytics.

Overall, these developments all fit into the emerging trend of data-driven companies. Such companies are seeking to facilitate data democratization by increasing data literacy and the use of data tools among their employees. Thus, data analysts and analytics engineers become less reliant on data engineering capabilities and tooling and more capable of solving data and analytics problems on their own using SQL.

Become an Analytics Engineer: Learn SQL!

SQL is the most important skill you need if you want to use DBT to model data and eventually become an analytics engineer.

Learning SQL for DBT is not hard if you follow the right method. If you are looking for a place to quickly pick up SQL skills or elevate your SQL knowledge to a new level, check out our courses. We designed them so that you can immediately learn something useful.

Tags:

Data Analysis