Developer Blog | dbt Developer Hub

Find tutorials, product updates, and developer insights in the dbt Developer blog.


Getting Started with git Branching Strategies and dbt

· 31 min read
Christine Berger
Carol Ohms
Taylor Dunlap
Steve Dowling

Hi! We’re Christine and Carol, Resident Architects at dbt Labs. Our day-to-day work is all about helping teams reach their technical and business-driven goals. Collaborating with a broad spectrum of customers ranging from scrappy startups to massive enterprises, we’ve gained valuable experience guiding teams to implement architecture which addresses their major pain points.

The information we’re about to share isn't just from our own experience - we frequently collaborate with other experts like Taylor Dunlap and Steve Dowling, who have contributed greatly to this guidance. Their work is the critical bridge between implementation and business outcomes, ultimately helping teams align on a comprehensive technical vision by identifying problems and solutions.

Why are we here?
We help teams with dbt architecture, which encompasses the tools, processes and configurations used to start developing and deploying with dbt. There’s a lot of decision making that happens behind the scenes to standardize on these pieces - much of which is informed by understanding what we want the development workflow to look like. The focus on having the perfect workflow often gets teams stuck in heaps of planning and endless conversations, which slows down or even stops momentum on development. If you feel this, we’re hoping our guidance will give you a great sense of comfort in taking steps to unblock development - even when you don’t have everything figured out yet!

There are three major tools that play an important role in dbt development:

  • A repository
    Contains the code we want to change or deploy, along with tools for change management processes.
  • A data platform
    Contains data for our inputs (loaded from other systems) and databases/schemas for our outputs, as well as permission management for data objects.
  • A dbt project
    Helps us manage development and deployment processes of our code to our data platform (and other cool stuff!)
dbt's relationship to git and the data platform

No matter how you end up defining your development workflow, these major steps are always present:

  • Development: How teams make and test changes to code
  • Quality Assurance: How teams ensure changes work and produce expected outputs
  • Promotion: How teams move changes to the next stage
  • Deployment: How teams surface changes to others

This article will focus mainly on git and your repository, how code corresponds to populating your data platform, and the common dbt configurations we implement to make this happen. We’ll also anchor ourselves to the steps of the development workflow throughout.

Why focus on git?

Source control (and git in particular) is foundational to modern development with or without dbt. It facilitates collaboration between teams of any size and makes it easy to maintain oversight of the code changes in your project. Understanding these controlled processes and what code looks like at each step makes understanding how we need to configure our data platform and dbt much easier.

⭐️ How to “just get started” ⭐️

This article will be talking about git topics in depth — this will be helpful if your team is familiar with some of the options and needs help considering the tradeoffs. If you’re getting started for the first time and don’t have strong opinions, we recommend starting with Direct Promotion.

Direct Promotion is the foundation of all git branching strategies, works well with basic git knowledge, requires the least amount of provisioning, and can easily evolve into another strategy if or when your team needs it. We understand this recommendation can evoke some thoughts of “what if?”. We urge you to think about starting with direct promotion like getting a suit tailored. Your developers can wear it while you’re figuring out the adjustments, and this is a much more informative step forward because it allows us to see how the suit functions in motion — our resulting adjustments can be starkly different from what we thought we’d need when it was static.

The best part of ‘just getting started’ is that it’s not hard to change your dbt configurations for a new git strategy later on (and we'll cover this), so don’t think of this as a critical decision that will result in months of broken development and re-configuration if you don’t get it right immediately. Truly, changing your git strategy can be done in a matter of minutes in dbt Cloud.

Branching Strategies

Once a repository has its initial commit, it always starts with one default branch which is typically called main or master — we’ll be calling the default branch main in our upcoming examples. The main branch is always the final destination that we’re aiming to land our changes, and most often corresponds to the term "production" - another term you'll see us use throughout.

The big discussion is what our workflow should look like as we move changes from development to main. Our process needs to consider all the steps in our workflow: development, quality assurance, promotion, and deployment. Branching Strategies define what this process looks like. We at dbt are not reinventing the wheel - a number of common strategies have already been defined, implemented, iterated on, and tested for at least a decade.

There are two major strategies that encompass all forms of branching strategies: Direct Promotion and Indirect Promotion. We’ll start by laying these two out simply:

  • What is the strategy?
  • How does the development workflow of the strategy look to a team?
  • Which repository branching rules and helpers help us in this strategy?
  • How do we commonly configure dbt Cloud for this strategy?
  • How do branches and dbt processes map to our data platform with this strategy?

Then, we’ll end by comparing the strategies and covering some frequently asked questions.

Know before you go

There are many ways to configure each tool (especially dbt) to accomplish what you need. The upcoming strategy details were written deliberately to provide what we think are the minimal standards to get teams up and running quickly. These are starter configurations and practices which are easy to tweak and adjust later on. Expanding on these configurations is an exercise left to the reader!

Direct Promotion

Direct promotion means we only keep one long-lived branch in our repository — in our case, main. Here’s the workflow for this strategy:

Direct promotion branching strategy

How does the development workflow look to a team?

Layout:

  • feature is the developer’s unique branch where task-related changes happen
  • main is the branch that contains our “production” version of code

Workflow:

  • Development: I create a feature branch from main to make, test, and personally review changes
  • Quality Assurance: I open a pull request comparing my feature against main, which is then reviewed by peers (required), stakeholders, or subject matter experts (SMEs). We highly recommend including stakeholders or SMEs for feedback during PR in this strategy because the next step changes main.
  • Promotion: After all required approvals and checks, I merge my changes to main
  • Deployment: Others can see and use my changes in main after I merge and main is deployed
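
If it helps to see this loop as commands, here’s a minimal sketch using example branch names (your team’s naming and git platform will differ):

```bash
# Direct promotion: example feature branch loop (names are illustrative)
git checkout main
git pull                                  # start from the latest production code
git checkout -b feature/add-orders-model  # development happens here
# ...make changes, run and test your dbt models...
git add .
git commit -m "Add orders model"
git push -u origin feature/add-orders-model
# Open a pull request against main on your git platform; after required
# approvals and checks pass, merge -- main is then deployed.
```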

Repository Branching Rules and Helpers

At a minimum, we like to set up:

dbt Cloud Processes and Environments

Here’s our branching strategy again, but now with the dbt Cloud processes we want to incorporate:

Direct Promotion strategy with dbt cloud processes denoted

In order to create the jobs in our diagram, we need dbt Cloud environments. Here are the common configurations for this setup:

| Environment Name | Environment Type | Deployment Type | Base Branch | Will handle… |
|---|---|---|---|---|
| Development | development | - | main | Operations done in the IDE (including creating feature branches) |
| Continuous Integration | deployment | General | main | A continuous integration job |
| Production | deployment | Production | main | A deployment job |

Data Platform Organization

Now we need to focus on where we want to build things in our data platform. For that, we need to set our database and schema settings on the environments. Here’s our diagram again, but now mapping how we want our objects to populate from our branches to our data platform:

Direct Promotion strategy with branch relations to data platform objects

Taking the table we created previously for our dbt Cloud environment, let's further map environment configurations to our data platform:

| Environment Name | Database | Schema |
|---|---|---|
| Development | development | User-specified in Profile Settings > Credentials |
| Continuous Integration | development | Any safe default, like dev_ci (it doesn’t even have to exist). The job we intend to set up will override the schema here anyway to denote the unique PR. |
| Production | production | analytics |
Note: We are showing environment configurations here, but a default database will be set at the highest level in a connection (which is a required setting of an environment). Deployment environments can override a connection's database setting when needed.

Direct Promotion Example

In this example, Steve uses the term “QA” for defining the environment which builds the changed code from feature branch pull requests. This is equivalent to our ‘Continuous Integration’ environment — this is a great example of defining names which make the most sense for your team!

Indirect Promotion

A note about Indirect Promotion

Indirect Promotion introduces more steps of ownership, so this branching strategy works best when you can identify people who have a great understanding of git to handle branch management. Additionally, the time from development to production is lengthier due to the workload of these new steps, so it requires good project management. We expand more on this later, but it’s an important call out as this is where we see unprepared teams struggle most.

Indirect promotion adds other long-lived branches that derive from main. The simplest version of indirect promotion is a two-trunk hierarchical structure — this is the one we see implemented most commonly in indirect workflows.

Hierarchical promotion means promoting changes back up the same path we used to derive the branches. Example (see the command sketch after this list):

  • a middle branch is derived from main
  • feature branches derive from the middle branch
  • feature branches merge back to the middle branch
  • the middle branch merges back to main
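
As a minimal sketch (branch names are examples), deriving and promoting hierarchically looks like this:

```bash
# Two-trunk hierarchical promotion (qa is our example middle branch)
git checkout main
git checkout -b qa                   # derive the middle branch from main
git push -u origin qa

git checkout -b feature/my-task qa   # feature branches derive from qa
# ...develop and commit, then open a PR from feature/my-task into qa...
# ...once qa is validated, a release PR merges qa back into main...
```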

Some common names for a middle branch as seen in the wild are:

  • qa : Quality Assurance
  • uat : User Acceptance Testing
  • staging or preprod : Common software development terminology

We’ll be calling our middle branch qa throughout the rest of this article.

Here’s the workflow for this strategy:

Indirect Promotion branching strategy

How does the development workflow look to a developer?

Changes from our direct promotion workflow are highlighted in blue.

Layout:

  • feature is the developer’s unique branch where task-related changes happen
  • qa contains approved changes from developers’ feature branches, which will be merged to main and enter production together once additional testing is complete. qa is always ahead of main in changes.
  • main is the branch that contains our “production” version of code

Workflow:

  • Development: I create a feature branch from qa to make, test, and personally review changes
  • Quality Assurance: I open a pull request comparing my feature branch to qa, which is then reviewed by peers and optionally subject matter experts or stakeholders
  • Promotion: After all required approvals and checks, I can merge my changes to qa
  • Quality Assurance: SMEs or other stakeholders can review my changes in qa when I merge my feature
  • Promotion: Once QA specialists give their approval of qa’s version of data, a release manager opens a pull request using qa’s branch targeting main (we define this as a “release”; see the sketch after this list)
  • Deployment: Others can see and use my changes (and others’ changes) in main after qa is merged to main and main is deployed
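
For teams on GitHub, a release pull request can be opened from the command line as well as the UI. A hypothetical example using the gh CLI (the title and body below are placeholders; other platforms have equivalent steps):

```bash
# Open a "release" PR from qa into main (GitHub-specific example)
gh pr create --base main --head qa \
  --title "Release: June changes" \
  --body "Approved changes from qa; latest QA job run linked for reference."
```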

Repository Branching Rules and Helpers

At a minimum, we like to set up:

dbt Cloud Processes and Environments

Here’s our branching strategy again, but now with the dbt Cloud processes we want to incorporate:

Indirect Promotion strategy with dbt cloud processes denoted

In order to create the jobs in our diagram, we need dbt Cloud environments. Here are the common configurations for this setup:

| Environment Name | Environment Type | Deployment Type | Base Branch | Will handle… |
|---|---|---|---|---|
| Development | development | - | qa | Operations done in the IDE (including creating feature branches) |
| Feature CI | deployment | General | qa | A continuous integration job |
| Quality Assurance | deployment | Staging | qa | A deployment job |
| Release CI | deployment | General | main | A continuous integration job |
| Production | deployment | Production | main | A deployment job |

Data Platform Organization

Now we need to focus on where we want to build things in our data platform. For that, we need to set our database and schema settings on the environments. There are two common setups for mapping code, but before we get into those, remember this note from direct promotion:

Note: We are showing environment configurations here, but a default database will be set at the highest level in a connection (which is a required setting of an environment). Deployment environments can override a connection's database setting when needed.

  • Configuration 1: A 1:1 of qa and main assets

    In this pattern, the CI schemas are populated in a database outside of Production and QA. This is usually done to keep the databases aligned to what’s been merged on their corresponding branches. Here’s our diagram, now mapping to the data platform with this pattern:

    Indirect Promotion branches and how they relate to 1:1 organization in the data platform

    Here are our configurations for this pattern:

    | Environment Name | Database | Schema |
    |---|---|---|
    | Development | development | User-specified in Profile Settings > Credentials |
    | Feature CI | development | Any safe default, like dev_ci (it doesn’t even have to exist). The job we intend to set up will override the schema here anyway to denote the unique PR. |
    | Quality Assurance | qa | analytics |
    | Release CI | development | A safe default |
    | Production | production | analytics |
  • Configuration 2: A reflection of the workflow initiative

    In this pattern, the CI schemas populate in a qa database because it’s a step in quality assurance. Here’s our diagram, now mapping to the data platform with this pattern:

    Indirect Promotion branches and how they relate to workflow initiative organization in the data platform

    Here are our configurations for this pattern:

    | Environment Name | Database | Schema |
    |---|---|---|
    | Development | development | User-specified in Profile Settings > Credentials |
    | Feature CI | qa | Any safe default, like dev_ci (it doesn’t even have to exist). The job we intend to set up will override the schema here anyway to denote the unique PR. |
    | Quality Assurance | qa | analytics |
    | Release CI | qa | A safe default |
    | Production | production | analytics |

Indirect Promotion Example

In this example, Steve uses the term “UAT” to define the automatic deployment of the middle branch and “QA” to define what’s built from feature branch pull requests. He also defines a database for each (with four databases total - one for development schemas, one for CI schemas, one for middle branch deployments, and one for production deployments) — we wanted to show you this example as it speaks to how configurable these processes are apart from our standard examples.

What did Indirect Promotion change?

You’ve probably noticed there is one overall theme of adding our additional branch, and that’s supporting our Quality Assurance initiative. Let’s break it down:

  • Development

    While no one will be developing in the qa branch itself, it does need a level of oversight, just like a feature branch needs in order to stay in sync with its base branch. This is because a change made directly to main (like a hotfix or an accidental merge) won’t immediately flag our feature branches, since they are based off of qa's version of the code. qa needs to stay in sync with any change in main for this reason.

  • Quality Assurance

    There are now two places where quality can be reviewed (feature and qa) before changes hit production. qa is typically leveraged in at least one of these ways for more quality assurance work:

    • Testing and reviewing how end-to-end changes are performing over time
    • Deploying the full image of the qa changes to a centralized location. Some common reasons to deploy qa code are:
      • Testing builds from environment-specific data sets (dynamic sources)
      • Creating staging versions of workbooks in your BI tool. This is most relevant when your BI tool doesn’t do well with changing underlying schemas. For instance, some tools have better controls for grabbing a production workbook for development, switching the underlying schema to a dbt_cloud_pr_# schema, and reflecting those changes without breaking things. Other tools will break every column selection you have in your workbook, even if the structure is the same. For this reason, it is sometimes easier to create one “staging” version of the workbook and always point it to a database built from QA code - the changes can then be reflected and reviewed from that workbook before the code changes in production.
      • Giving other folks a way to see or test changes when they aren’t personas that would be included in the review process. For instance, you may have a subject matter expert reviewing and approving alongside developers, who understands the process of looking at dbt_cloud_pr schemas. However, if this person tells the teammates who will use those changes that they have just approved them, the team might ask if there is a way they can also see the changes. Since the CI schema is dropped after merge, they would need to wait to see the change in production if there is no process deploying the middle branch.
  • Promotion

    There are now two places where code needs to be promoted:

    • From feature to qa by a developer and peer (and optionally SMEs or stakeholders)
    • From qa to main by a release manager and SMEs or stakeholders

    Additionally, approved changes from feature branches are promoted together from qa.

  • Deployment

    There are now two major branches code can be deployed from:

    • qa : The “working” version with changes, features merge here
    • main : The “production” version

    Due to our changes collecting on the qa branch, our deployment process changes from continuous deployment (”streaming” changes to main in direct promotion) to continuous delivery (”batched” changes to main). Julia Schottenstein does a great job explaining the differences here.

Comparing Branching Strategies

Since most teams can make direct promotion work, we’ll list some key flags for when we start thinking about indirect promotion with a team:

  • They speak about having a dedicated environment for a QA, UAT, staging, or pre-production work.
  • They ask how they can test changes end-to-end and over time before things hit production.
  • Their developers aren’t the same, or the only, folks who are checking data outputs for validity - even more so if the other folks are more familiar doing this validation work from other tools.
  • Their different environments aren’t working with identical data. Like software environments, they may have limited or scrubbed versions of production data depending on the environment.
  • They have a schedule in mind for making changes “public”, and want to hold features back from being seen or usable until then.
  • They have very high-stakes data consumption.

If you fit any of these, you likely fit into an indirect promotion strategy.

Strengths and Weaknesses

We highly recommend that you choose your branching strategy based on which best supports your workflow needs over any perceived pros and cons — when these are put in the context of your team’s structure and technical skills, you’ll find some aren’t strengths or weaknesses at all!

  • Direct promotion

    Strengths

    • Much faster in terms of seeing changes - once the PR is merged and deployed, the changes are “in production”.
    • Changes don’t get stuck in a middle branch pending someone else’s validation of data output.
    • Management is mainly distributed - every developer owns their own branch and keeps it in sync with what’s in main.
    • There are no releases to worry about, so no extra processes to manage.

    Weaknesses

    • It can present challenges for testing changes end-to-end or over time. Our desire to build only modified and directly impacted models to reduce the amount of models executed in CI goes against the grain of full end-to-end testing, and our mechanism which executes only upon pull request or new commit won’t help us test over time.
    • It can be more difficult to accommodate differing schedules or technical abilities when it comes to review. It’s essential in this strategy to include stakeholders or subject matter experts on pull requests before merge, because the next step is production. Additionally, some tools aren’t great at switching databases and schemas even if the shape of the data is the same. Constant breakage of reports for review can be too much overhead.
    • It can be harder to test configurations or job changes before they hit production, especially if things function a bit differently in development.
    • It can be harder to share code that works fully but isn’t a full reflection of a task. Changes need to be agreed upon to go to production so others can pull them in; otherwise, developers need to know how to pull these changes in from branches that aren’t main (and stay in sync or risk merge conflicts).
  • Indirect promotion

    Strengths

    • There’s a dedicated environment to test end-to-end changes over time.
    • Data output can be reviewed either with a developer on PR or once things hit the middle branch.
    • Review from other tools is much easier, because the middle branch tends to deploy to a centralized location. “Staging” reports can be set up to always refer to this location for reviewing changes, and processes for creating new reports can flow from staging to production.
    • Configurations and job changes can be tested with production-like parameters before they actually hit production.
    • There’s a dedicated environment to merge changes if you need them for shared development. Consumers of main will be none the wiser about the things that developers do for ease of collaboration.

    Weaknesses

    • Changes can be slower to get to production due to the extra processes intended for the middle branch. In order to keep things moving, there should be someone (or a group of people) in place who fully own managing the changes, validation status, and release cycle.
    • Changes that are valid can get stuck behind other changes that aren’t - having a good plan in place for how the team should handle this scenario is essential, because this conundrum can hold up getting things to production.
    • There’s extra management of any new trunks, which will need ownership - without someone (or a group of people) who is knowledgeable, it can be confusing to understand what needs to be done and how to do it when things get out of sync.
    • Requires additional compute in the form of scheduled jobs in the qa environment, as well as an additional CI job from qa to main.

Further Enhancements

Once you have your basic configurations in place, you can further tweak your project by considering which other features will be helpful for your needs:

Frequently Asked git Questions

General

How do you prevent developers from changing specific files?

Code owners files can help tag appropriate reviewers when certain files or folders are changed

How do you execute other types of checks in the development workflow?

If you’re thinking about auto-formatting or linting code, you can implement this within your dbt project.

Other checks are usually implemented through git pipelines (such as GitHub Actions) to run when git events happen (such as checking that a branch name follows a pattern upon a pull request event).

How do you revert changes?

This is an action performed outside of dbt through git operations - however, we recommend instead using an immediate solution with git tags/releases until your code is fixed to your liking:

  • Apply a git tag (an available feature on most git platforms) on the commit SHA that you want to roll back to
  • Use the tag as your custom branch on your production environment in dbt Cloud. Your jobs will now check out the code at this point in time.
  • Now you can work as normal. Fix things through the development workflow or have a knowledgeable person revert the changes through git, it doesn’t matter - production is pinned to the previous state until you change the custom branch back to main!
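
As a sketch, assuming a known-good commit (the SHA and tag name below are placeholders):

```bash
# Tag the last known-good commit and push the tag to the remote
git checkout main
git tag rollback-2024-06-01 3f2a9c1   # 3f2a9c1 = commit to roll back to
git push origin rollback-2024-06-01
# In dbt Cloud, set the Production environment's custom branch to
# rollback-2024-06-01; jobs will check out that point in time until you
# switch the setting back to main.
```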

Indirect Promotion Specific

How do you make releases?

For our examples, a release is just a pull request to get changes into main from qa, opened from the git platform.

You should be aware that a pull request with qa as its source branch will also incorporate any new merges to qa made after you open it, up until it’s merged. Because of this, it’s important that the person opening a release knows what the latest changes were and when a job last ran to indicate the success of all the release’s changes. There are two options we like to implement to make this easier:

  • A CI job for pull requests to main - this will catch and rerun our CI job if there are any new commits on our qa branch
  • An on-merge job using our qa environment. This will run a job any time someone merges. You may opt for this if you’d rather not wait on a CI pipeline to finish when you open a release. If this option is used, the latest job that ran should be successful and linked on the release’s PR.

Hierarchical promotion can introduce changes that aren’t ready for production yet, which holds up releases. How do you manage that?

The process of choosing specific commits to move to another branch is called Cherry Picking.

Cherry Picking diagram
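
For reference, the operation itself is a standard git command (the SHA and branch names below are placeholders):

```bash
# Copy one approved commit from qa onto a branch based on main
git checkout main
git checkout -b release/partial
git cherry-pick 9b1d2e4        # the single commit that's ready for production
git push -u origin release/partial
# A PR from release/partial into main then contains only that change.
```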

You may be tempted to change to a less standard branching strategy to avoid this - our colleague Grace Goheen has written some thoughts on this and provided examples - it’s a worthwhile read!

dbt does not perform cherry picking operations; it needs to be done from a command line interface or your git platform’s user interface, if the option is available. We align with Grace on this one — not only does cherry picking require a very good understanding of git operations and the state of the branches, but when it isn’t done with care it introduces a host of other issues that can be hard to resolve. What we tend to see instead is that the CI processes we’ve exemplified shift the definition of the first PR’s approval - not only can the change be approved for coding and syntax by a peer, but it can also be approved for its output by selecting from objects built within the CI schema. This eliminates a lot of the issues with code that can’t be merged to production.

We also implement other features that can help us omit offending models or introduce more quality:

If you are seeing a need to cherry-pick regularly, assessing your review and quality assurance processes and where they are happening in your pipeline can be very helpful in determining how you can avoid it.

What if a bad change made it all the way in to production?

The process of fixing main directly is called a hotfix. This needs to be done with git locally or through your git platform’s user interface, because dbt’s IDE bases developers on the branch you set (in our case, qa).

The pattern for hotfixes in hierarchical promotion looks like this:

Hotfix diagram

Here’s how it’s typically performed:

  1. Create a branch from main, test and review the fix
  2. Open a PR to main, get the fix approved, then merge. The fix is now live.
  3. Check out qa, and git pull to ensure it’s up to date with what’s on the remote
  4. Merge main into qa: git merge main
  5. git push the changes back to the remote
  6. At this point in our example, developers will be flagged in dbt Cloud’s IDE that there is a change on their base branch and can ”Pull from remote”. However, if you implement more than one middle branch you will need to continue resolving your branches hierarchically until you update the branch that developers base from.
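
Collecting those steps into one sketch (the branch and fix names are examples):

```bash
# Steps 1-2: fix main directly via a hotfix branch and PR
git checkout main && git pull
git checkout -b hotfix/fix-revenue-calc
# ...make the fix, test, open a PR to main, get approval, merge...

# Steps 3-5: bring the fix back down into the middle branch
git checkout qa
git pull            # ensure local qa matches the remote
git merge main      # merge the hotfix into qa
git push            # developers can now pull the updated base branch
```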

What if we want to use more than one middle branch in our strategy?

In our experience, using more than one middle branch is rarely needed. The more steps you are away from main, the more hurdles you’ll need to jump through getting back to it. If your team isn’t properly equipped, this ends up putting a lot of overhead on development operations. For this reason, we don’t recommend more branches if you can help it. The teams who are successful with more trunks are built with plenty of folks who can properly dedicate the time and management to these processes.

A git strategy with more branches

This structure is mostly desired when there are requirements for different teams to work with different versions of data (e.g., scrubbed data) while sharing the same code changes. This structure allows each team to have a dedicated environment for deployments. Example:

  1. Developers work off of mocked data for their feature branches and merge to qa for end-to-end and over-time testing of all merged changes before releasing to preproduction.
  2. Once qa is merged to preproduction, the underlying data being used switches to using scrubbed production data and other personas can start looking at and reviewing how this data is functioning before it hits production.
  3. Once preproduction is merged to main, the underlying data being used switches to production data sets.

This use case can be covered with a simpler branching strategy through the use of git tags and dbt environment variables to switch source data:

  • Indirect Promotion:

    Tagging in Indirect Promotion
  • Direct Promotion:

    Tagging in Direct Promotion

No matter the reason for more branches, these points are always relevant to plan out:

  • Can we accurately describe the use case of each branch?
  • Who owns the oversight of any new branches?
  • Who are the major players in the promotion process between each branch and what are they responsible for?
  • Which major branches do we want dbt Cloud deployment jobs for?
  • Which PR stages do we want continuous integration jobs on?
  • Which major branch rules or PR templates do we need to add?

By answering these questions, you should be able to follow our same guidance from our examples for setting up your additional branches.

Direct Promotion Specific

We need a middle environment and don’t want to change our branching strategy! Is there any way to reflect what’s in development?

git releases/tags are a mechanism that helps you label a specific commit SHA. Deployment environments in dbt Cloud can use these just like they can a custom branch. Teams leverage this either to pin their environments to code at a certain point in time or to keep a roll-back option if needed.

We can use the pinning method to create our middle environment. Example:

  • We create a release tag, v2, from our repository.
  • We specify v2 as our branch in our Production environment’s custom branch setting. Jobs using Production will now check out code at v2.
  • We set up an environment called “QA”, with the custom branch setting as main. For the database and schema, we specify the qa database and analytics schema. Jobs created using this environment will check out code from main and build it in qa.analytics.

Tagging in Direct Promotion to create a middle environment

How do we change from a direct promotion strategy to an indirect promotion strategy?

Here are the additional setup steps in a nutshell - for more details, be sure to read through the indirect promotion section:

  • git Platform
    • Create a new branch derived from main for your middle branch.
    • Protect the branch with branch protection rules (see the command sketch after this list)
  • dbt Cloud
    • Development: Switch your environment to use the custom branch option and specify your new middle branch’s name. This will base developers off of the middle branch.
    • Continuous Integration: If you have an existing environment for this, ensure the custom branch is also changed to the middle branch’s name. This will change the CI job’s trigger to occur on pull requests to your middle branch.
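
The git platform half of this switch is small. A minimal sketch, assuming your middle branch will be called qa:

```bash
# Create the middle branch from main and publish it
git checkout main && git pull
git checkout -b qa
git push -u origin qa
# Then add branch protection rules for qa in your git platform's settings
# and update the dbt Cloud custom branch settings described above.
```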

At this point, your developers will be following the indirect promotion workflow and you can continue working on things in the background. You may still need to set up a database, database permissions, environments, deployment jobs, etc. Here is a short checklist to help you out! Refer back to our section on indirect promotion for many more details:

  • Decide if you want to deploy your middle branch’s code. If so:
    • If needed, create the database where the objects will build

    • Set up a service account and give it all the proper permissions. For example, if deployments will build in a dedicated database, the service account should have full access to create and modify the contents within that database. It should also have select-only access to raw data.

    • Set up an environment for your middle branch in dbt Cloud, being sure to connect it to the location you want your deployments to build in.

    • Set up any deployment jobs using your middle branch’s environment

  • Decide if you want CI on release pull requests (from your middle branch to main). If so:
    • Set up an environment called “Release CI”
    • Set up the continuous integration job using the “Release CI” environment

The key technologies behind SQL Comprehension

· 16 min read
Dave Connors

You ever wonder what’s really going on in your database when you fire off a (perfect, efficient, full-of-insight) SQL query to your database?

OK, probably not 😅. Your personal tastes aside, we’ve been talking a lot about SQL Comprehension tools at dbt Labs in the wake of our acquisition of SDF Labs, and think that the community would benefit if we included them in the conversation too! We recently published a blog that talked about the different levels of SQL Comprehension tools. If you read that, you may have encountered a few new terms you weren’t super familiar with.

In this post, we’ll talk about the technologies that underpin SQL Comprehension tools in more detail. Hopefully, you come away with a deeper understanding of and appreciation for the hard work that your computer does to turn your SQL queries into actionable business insights!

The Three Levels of SQL Comprehension: What they are and why you need to know about them

· 9 min read
Joel Labes

Ever since dbt Labs acquired SDF Labs last week, I've been head-down diving into their technology and making sense of it all. The main thing I knew going in was "SDF understands SQL". It's a nice pithy quote, but the specifics are fascinating.

For the next era of Analytics Engineering to be as transformative as the last, dbt needs to move beyond being a string preprocessor and into fully comprehending SQL. For the first time, SDF provides the technology necessary to make this possible. Today we're going to dig into what SQL comprehension actually means, since it's so critical to what comes next.

Why I wish I had a control plane for my renovation

· 4 min read
Mark Wan

When my wife and I renovated our home, we chose to take on the role of owner-builder. It was a bold (and mostly naive) decision, but we wanted control over every aspect of the project. What we didn’t realize was just how complex and exhausting managing so many moving parts would be.

My wife pondering our sanity

We had to coordinate multiple elements:

  • The architects, who designed the layout, interior, and exterior.
  • The architectural plans, which outlined what the house should look like.
  • The builders, who executed those plans.
  • The inspectors, councils, and energy raters, who checked whether everything met the required standards.

Test smarter not harder: Where should tests go in your pipeline?

· 8 min read
Faith McKenna
Jerrie Kumalah Kenney

👋 Greetings, dbt’ers! It’s Faith & Jerrie, back again to offer tactical advice on where to put tests in your pipeline.

In our first post on refining testing best practices, we developed a prioritized list of data quality concerns. We also documented first steps for debugging each concern. This post will guide you on where specific tests should go in your data pipeline.

Note that we are constructing this guidance based on how we structure data at dbt Labs. You may use a different modeling approach—that’s okay! Translate our guidance to your data’s shape, and let us know in the comments section what modifications you made.

First, here’s our opinions on where specific tests should go:

  • Source tests should be fixable data quality concerns. See the callout box below for what we mean by “fixable”.
  • Staging tests should be business-focused anomalies specific to individual tables, such as accepted ranges or ensuring sequential values. In addition to these tests, your staging layer should clean up any nulls, duplicates, or outliers that you can’t fix in your source system. You generally don’t need to test your cleanup efforts.
  • Intermediate and marts layer tests should be business-focused anomalies resulting specifically from joins or calculations. You also may consider adding additional primary key and not null tests on columns where it’s especially important to protect the grain.

Test smarter not harder: add the right tests to your dbt project

· 11 min read
Faith McKenna
Jerrie Kumalah Kenney

The Analytics Development Lifecycle (ADLC) is a workflow for improving data maturity and velocity. Testing is a key phase here. Many dbt developers tend to focus on primary keys and source freshness. We think there is a more holistic and in-depth path to tread. Testing is a key piece of the ADLC, and it should drive data quality.

In this blog, we’ll walk through a plan to define data quality. This will look like:

  • identifying data hygiene issues
  • identifying business-focused anomaly issues
  • identifying stats-focused anomaly issues

Once we have defined data quality, we’ll move on to prioritize those concerns. We will:

  • think through each concern in terms of the breadth of impact
  • decide if each concern should be at error or warning severity

Snowflake feature store and dbt: A bridge between data pipelines and ML

· 14 min read
Randy Pettus
Luis Leon

Flying home into Detroit this past week, working on this blog post on the plane, I saw for the first time the newly connected deck of the Gordie Howe International bridge spanning the Detroit River and connecting the U.S. and Canada. The image stuck out because, in one sense, a feature store is a bridge between the clean, consistent datasets and the machine learning models that rely upon this data. But more interesting than the bridge itself is the massive process of coordination needed to build it. This construction effort — I think — can teach us more about processes and the need for feature stores in machine learning (ML).

Think of the manufacturing materials needed as our data and the building of the bridge as the building of our ML models. There are thousands of engineers and construction workers taking materials from all over the world, pulling only the specific pieces needed for each part of the project. However, to make this project truly work at this scale, we need the warehousing and logistics to ensure that each load of concrete, rebar, and steel meets the standards for quality and safety needed and is available to the right people at the right time — as even a single fault can have catastrophic consequences or cause serious delays in project success. This warehouse and the associated logistics play the role of the feature store, ensuring that data is delivered consistently where and when it is needed to train and run ML models.

Iceberg Is An Implementation Detail

· 6 min read
Amy Chen

If you haven’t paid attention to the data industry news cycle, you might have missed the recent excitement centered around an open table format called Apache Iceberg™. It’s one of many open table formats like Delta Lake, Hudi, and Hive. These formats are changing the way data is stored and metadata accessed. They are groundbreaking in many ways.

But I have to be honest: I don’t care. But not for the reasons you think.

How Hybrid Mesh unlocks dbt collaboration at scale

· 7 min read
Jason Ganz

One of the most important things that dbt does is unlock the ability for teams to collaborate on creating and disseminating organizational knowledge.

In the past, this primarily looked like a team working in one dbt Project to create a set of transformed objects in their data platform.

As dbt was adopted by larger organizations and began to drive workloads at a global scale, it became clear that we needed mechanisms to allow teams to operate independently from each other, creating and sharing data models across teams — dbt Mesh.

How to build a Semantic Layer in pieces: step-by-step for busy analytics engineers

· 10 min read

The dbt Semantic Layer is founded on the idea that data transformation should be both flexible (allowing for on-the-fly aggregations grouped and filtered by definable dimensions) and version-controlled and tested. Like any other codebase, you should have confidence that your transformations express your organization’s business logic correctly. Historically, you had to choose between these options, but the dbt Semantic Layer brings them together. This has required new paradigms for how you express your transformations, though.

Putting Your DAG on the internet

· 5 min read
Ernesto Ongaro
Sebastian Stan
Filip Byrén

New in dbt: allow Snowflake Python models to access the internet

With dbt 1.8, dbt released support for Snowflake’s external access integrations, further enabling the use of dbt + AI to enrich your data. This allows querying of external APIs within dbt Python models, a functionality that was required by dbt Cloud customer EQT AB. Learn about why they needed it and how they helped build the feature and get it shipped!

Up and Running with Azure Synapse on dbt Cloud

· 11 min read
Anders Swanson

At dbt Labs, we’ve always believed in meeting analytics engineers where they are. That’s why we’re so excited to announce that today, analytics engineers within the Microsoft Ecosystem can use dbt Cloud with not only Microsoft Fabric but also Azure Synapse Analytics Dedicated SQL Pools (ASADSP).

Since the early days of dbt, folks have been interested in dbt support for MSFT data platforms. Huge shoutout to Mikael Ene and Jacob Mastel for their efforts back in 2019 on the original SQL Server adapters (dbt-sqlserver and dbt-mssql, respectively).

The journey for the Azure Synapse dbt adapter, dbt-synapse, is closely tied to my journey with dbt. I was the one who forked dbt-sqlserver into dbt-synapse in April of 2020. I had first learned of dbt only a month earlier and knew immediately that my team needed the tool. With a great deal of assistance from Jeremy and experts at Microsoft, my team and I got it off the ground and started using it. When I left my team at Avanade in early 2022 to join dbt Labs, I joked that I wasn’t actually leaving the team; I was just temporarily embedding at dbt Labs to expedite dbt Labs getting into Cloud. Two years later, I can tell my team that the mission has been accomplished! Kudos to all the folks who have contributed to the TSQL adapters either directly in GitHub or in the community Slack channels. The integration would not exist if not for you!

Unit testing in dbt for test-driven development

· 9 min read
Doug Beatty

Do you ever have "bad data" dreams? Or am I the only one that has recurring nightmares? 😱

Here's the one I had last night:

It began with a midnight bug hunt. A menacing insect creature had locked my colleagues in a dungeon, and they were pleading for my help to escape. Finding the key was elusive and always seemed just beyond my grasp. The stress was palpable, a physical weight on my chest, as I raced against time to unlock them.

Of course I wake up without actually having saved them, but I am relieved nonetheless. And I've had similar nightmares involving a heroic code refactor or the launch of a new model or feature.

Good news: beginning in dbt v1.8, we're introducing a first-class unit testing framework that can handle each of the scenarios from my data nightmares.

Before we dive into the details, let's take a quick look at how we got here.

Conversational Analytics: A Natural Language Interface to your Snowflake Data

· 12 min read
Doug Guthrie

Introduction

As a solutions architect at dbt Labs, my role is to help our customers and prospects understand how to best utilize the dbt Cloud platform to solve their unique data challenges. That uniqueness presents itself in different ways - organizational maturity, data stack, team size and composition, technical capability, use case, or some combination of those. With all those differences though, there has been one common thread throughout most of my engagements: Generative AI and Large Language Models (LLMs). Data teams are either 1) proactively thinking about applications for it in the context of their work or 2) being pushed to think about it by their stakeholders. It has become the elephant in every single (zoom) room I find myself in.

How we're making sure you can confidently switch to the "Latest" release track in dbt Cloud

· 10 min read
Michelle Ark
Chenyu Li
Colin Rogers

Versionless is now the "latest" release track

This blog post was updated on December 04, 2024 to rename "versionless" to the "latest" release track, allowing for the introduction of less-frequent release tracks. Learn more about Release Tracks and how to use them.

As long as dbt Cloud has existed, it has required users to select a version of dbt Core to use under the hood in their jobs and environments. This made sense in the earliest days, when dbt Core minor versions often included breaking changes. It provided a clear way for everyone to know which version of the underlying runtime they were getting.

However, this came at a cost. While bumping a project's dbt version appeared as simple as selecting from a dropdown, there was real effort required to test the compatibility of the new version against existing projects, package dependencies, and adapters. On the other hand, putting this off meant foregoing access to new features and bug fixes in dbt.

But no more. Today, we're ready to announce the general availability of a new option in dbt Cloud: the "Latest" release track.

Maximum override: Configuring unique connections in dbt Cloud

· 6 min read

dbt Cloud now includes a suite of new features that enable configuring precise and unique connections to data platforms at the environment and user level. These enable more sophisticated setups, like connecting a project to multiple warehouse accounts, first-class support for staging environments, and user-level overrides for specific dbt versions. This gives dbt Cloud developers the features they need to tackle more complex tasks, like Write-Audit-Publish (WAP) workflows and safely testing dbt version upgrades. While you still configure a default connection at the project level and per-developer, you now have tools to get more advanced in a secure way. Soon, dbt Cloud will take this even further allowing multiple connections to be set globally and reused with global connections.

LLM-powered Analytics Engineering: How we're using AI inside of our dbt project, today, with no new tools.

· 10 min read
Joel Labes

Cloud Data Platforms make new things possible; dbt helps you put them into production

The original paradigm shift that enabled dbt to exist and be useful was databases going to the cloud.

All of a sudden it was possible for more people to do better data work as huge blockers became huge opportunities:

  • We could now dynamically scale compute on-demand, without upgrading to a larger on-prem database.
  • We could now store and query enormous datasets like clickstream data, without pre-aggregating and transforming it.

Today, the next wave of innovation is happening in AI and LLMs, and it's coming to the cloud data platforms dbt practitioners are already using every day. For one example, Snowflake has just released its Cortex functions to access LLM-powered tools tuned for running common tasks against your existing datasets. In doing so, a new set of opportunities has become available to us:

Column-Level Lineage, Model Performance, and Recommendations: ship trusted data products with dbt Explorer

· 9 min read
Dave Connors

What’s in a data platform?

Raising a dbt project is hard work. We, as data professionals, have poured ourselves into raising happy healthy data products, and we should be proud of the insights they’ve driven. It certainly wasn’t without its challenges though — we remember the terrible twos, where we worked hard to just get the platform to walk straight. We remember the angsty teenage years where tests kept failing, seemingly just to spite us. A lot of blood, sweat, and tears are shed in the service of clean data!

Once the project could dress and feed itself, we also worked hard to get buy-in from our colleagues who put their trust in our little project. Without deep trust and understanding of what we built, the colleagues who depend on our data (or even those involved in developing it with us — it takes a village after all!) are more likely to be in our DMs with questions than in their BI tools, generating insights.

When our teammates ask about where the data in their reports come from, how fresh it is, or about the right calculation for a metric, what a joy! This means they want to put what we’ve built to good use — the challenge is that, historically, it hasn’t been all that easy to answer these questions well. That has often meant a manual, painstaking process of cross checking run logs and your dbt documentation site to get the stakeholder the information they need.

Enter dbt Explorer! dbt Explorer centralizes documentation, lineage, and execution metadata to reduce the work required to ship trusted data products faster.

Serverless, free-tier data stack with dlt + dbt core.

· 8 min read

The problem, the builder and tooling

The problem: My partner and I are considering buying a property in Portugal. There is no reference data for the real estate market here - how many houses are being sold, for what price? Nobody knows except the property office and maybe the banks, and they don’t readily divulge this information. The only data source we have is Idealista, which is a portal where real estate agencies post ads.

Unfortunately, there are significantly fewer properties than ads - it seems many real estate companies re-post the same ad that others do, with intentionally different data and often misleading bits of info. The real estate agencies do this so the interested parties reach out to them for clarification, and from there they can start a sales process. At the same time, the website with the ads is incentivised to allow this to continue as they get paid per ad, not per property.

The builder: I’m a data freelancer who deploys end to end solutions, so when I have a data problem, I cannot just let it go.

The tools: I want to be able to run my project on Google Cloud Functions due to the generous free tier. dlt is a new Python library for declarative data ingestion which I have wanted to test for some time. Finally, I will use dbt Core for transformation.

Deprecation of dbt Server

· 2 min read
Roxi Dahlke

Summary

We’re announcing that dbt Server is officially deprecated and will no longer be maintained by dbt Labs going forward. You can continue to use the repository and fork it for your needs. We’re also looking for a maintainer of the repository from our community! If you’re interested, please reach out by opening an issue in the repository.

Why are we deprecating dbt Server?

At dbt Labs, we are continually working to build rich experiences that help our users scale collaboration around data. To achieve this vision, we need to take moments to think about which products are contributing to this goal, and sometimes make hard decisions about the ones that are not, so that we can focus on enhancing the most impactful ones.

dbt Server previously supported our legacy Semantic Layer, which was fully deprecated in December 2023. In October 2023, we introduced the GA of the revamped dbt Semantic Layer with significant improvements, made possible by the acquisition of Transform and the integration of MetricFlow into dbt.

The dbt Semantic Layer is now fully independent of dbt Server and operates on MetricFlow Server, a powerful new proprietary technology designed for enhanced scalability. We’re incredibly excited about the new updates and encourage you to check out our documentation, as well as this blog on how the product works.

The deprecation of dbt Server and updates to the Semantic Layer signify the evolution of the dbt ecosystem towards more focus on in product and out-of-the-box experiences around connectivity, scale, and flexibility. We are excited that you are along with us on this journey.