Data Engineering Tidbits - Setting Up Your First dbt Project

As part of my journey towards becoming a senior data engineer, I started a deep dive into dbt (data build tool). Today, I want to share my experience with the initial setup process.

The Setup Process

The beauty of dbt is that, like most modern tools, it handles most of the boilerplate setup for you. Let's walk through it step by step.

Step 1: Installing dbt

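First things first, install dbt Core together with the BigQuery adapter. Since dbt 1.8 the docs recommend naming both packages explicitly, because installing an adapter on its own is no longer guaranteed to bring dbt-core along:

pip install dbt-core dbt-bigquery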
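
To confirm the install landed in the environment you think it did, a small Python check can help. This is only a sketch and not part of dbt: installed_version is a helper name I made up, and it relies on nothing beyond the standard library's importlib.metadata.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Report both packages, since the adapter and dbt Core are versioned separately.
for name in ("dbt-core", "dbt-bigquery"):
    print(f"{name}: {installed_version(name) or 'not installed'}")
```

Run it with the same interpreter (or virtual environment) you used for pip, otherwise it may report "not installed" for a package that lives elsewhere.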

Step 2: Initialize Your Project

Run the following command:

dbt init your_project_name

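This command is where the magic happens. dbt asks which adapter you want and prompts for your connection details, then creates a new directory named after your project. Along the way it generates two crucial configuration files: dbt_project.yml inside the project, and a profile in profiles.yml.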
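
If you want to sanity-check what dbt init produced, something like the following works. It's a rough sketch: missing_entries and the EXPECTED list are my own names, and the folder names assume the default layout that dbt scaffolds (they can be changed in dbt_project.yml).

```python
import tempfile
from pathlib import Path

# Entries a freshly initialized project is assumed to contain by default.
EXPECTED = ["dbt_project.yml", "models", "analyses", "tests", "seeds", "macros", "snapshots"]


def missing_entries(project_dir):
    """Return the expected files and folders that are absent from project_dir."""
    root = Path(project_dir)
    return [name for name in EXPECTED if not (root / name).exists()]


# Demo on a throwaway directory that only has two of the expected entries.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "dbt_project.yml").touch()
    (Path(tmp) / "models").mkdir()
    print(missing_entries(tmp))
```

Against a real project you would call missing_entries("your_project_name") and expect an empty list.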

Step 3: Understanding the Configuration Files

  1. dbt_project.yml
    This file lives in your project root directory and acts as your project's command center. Here's what mine looks like:
name: 'dbt_flights'
version: '1.0.0'

profile: 'dbt_flights'

model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]

clean-targets:
  - "target"
  - "dbt_packages"


models:
  dbt_flights:
    example:
      +materialized: view
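  2. profiles.yml
    This file lives by default in ~/.dbt/profiles.yml and contains your connection settings. The top-level key (my_bigquery_profile below) has to match the profile value in dbt_project.yml, so for the project above it would be dbt_flights. Here's a typical BigQuery setup: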
my_bigquery_profile:
  target: dev
  outputs:
    dev:
      type: bigquery
      method: service-account  # or oauth
      keyfile: /path/to/service-account.json  # only needed for service-account
      project: your-gcp-project-id
      dataset: your_dataset
      threads: 1
      location: US  # or your preferred region
      job_execution_timeout_seconds: 300  # replaces the older timeout_seconds
      job_retries: 1
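
A quick way to check that the two files actually point at each other is to compare the profile value in dbt_project.yml with the top-level keys of profiles.yml. The sketch below does that with plain regular expressions so it has no dependencies. It is deliberately naive (a real YAML parser such as PyYAML would be more robust), and project_profile and profile_names are hypothetical helper names, not dbt functions.

```python
import re


def project_profile(dbt_project_text):
    """Return the value of the top-level `profile:` key, or None if missing."""
    match = re.search(r"^profile:[ \t]*['\"]?([^'\"\s#]+)", dbt_project_text, re.MULTILINE)
    return match.group(1) if match else None


def profile_names(profiles_text):
    """Return the unindented top-level keys of a profiles.yml document."""
    # Note: an optional top-level `config:` block would show up here as well.
    return re.findall(r"^([A-Za-z_][\w-]*):", profiles_text, re.MULTILINE)


# Trimmed-down versions of the two files shown above.
project = "name: 'dbt_flights'\nprofile: 'dbt_flights'\n"
profiles = "my_bigquery_profile:\n  target: dev\n"

wanted = project_profile(project)
if wanted not in profile_names(profiles):
    print(f"profile '{wanted}' not found in profiles.yml: {profile_names(profiles)}")
```

With the two snippets as written in this post the check reports a mismatch, which is exactly the kind of slip that is easy to miss when copying a generic profile.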

Step 4: Configuration Deep Dive

Let's break down the key BigQuery-specific settings you'll need to understand.

Authentication Options:

  • method: oauth - Perfect for local development, since it reuses the credentials from your gcloud login
  • method: service-account - Use this for CI/CD pipelines and production deployments (note the hyphen; service_account is not a valid value)

If you're using a service account, don't forget to specify the keyfile path.

My recommendation is to start with oauth for local development, since it's simpler.
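
The rules above are simple enough to express as a tiny check. This is a sketch under assumptions: auth_problems is a made-up helper, it takes one output block from profiles.yml as a plain dict, and it only knows the two methods covered in this post, spelled the way the adapter expects them (service-account, with a hyphen). dbt-bigquery supports other methods as well, which this check would flag as unexpected.

```python
def auth_problems(output):
    """Return a list of problems with the auth settings of one profile output."""
    problems = []
    method = output.get("method")
    if method not in ("oauth", "service-account"):
        problems.append(f"unexpected method: {method!r}")
    if method == "service-account" and not output.get("keyfile"):
        problems.append("service-account requires a keyfile path")
    return problems


# An oauth output needs nothing else; a service account needs its keyfile.
print(auth_problems({"method": "oauth"}))
print(auth_problems({"method": "service-account"}))
```

Returning a list instead of raising keeps it easy to print every problem at once when you wire this into a pre-commit hook or CI step.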

Location Configuration

  • location: Must match the location of your BigQuery dataset (for example US or EU). If it doesn't, queries fail because BigQuery looks for the dataset in the wrong region.

Step 5: Testing Your Setup

Once everything is configured, run:

dbt debug

This command checks that dbt can find and parse dbt_project.yml and profiles.yml, and that it can actually connect to BigQuery with the credentials you configured.
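
If you later want this check in a script or CI job, a thin wrapper around the command is enough. The sketch below is my own and assumes that dbt debug exits with a non-zero status when a check fails; command_succeeds is a hypothetical helper that works for any command, not something dbt ships.

```python
import subprocess
import sys


def command_succeeds(command):
    """Run a command and report whether it exited with status 0."""
    result = subprocess.run(command, capture_output=True, text=True)
    if result.returncode != 0:
        # Surface the command's own report so the failing check is visible.
        print(result.stdout, file=sys.stderr)
        print(result.stderr, file=sys.stderr)
    return result.returncode == 0


# Usage, assuming dbt is on PATH and the script runs from the project root:
#     if not command_succeeds(["dbt", "debug"]):
#         sys.exit(1)
```

If dbt is not installed at all, subprocess.run raises FileNotFoundError rather than returning a failure, so you may want to catch that separately.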

Quick Recap

Here's what we've covered:

  • dbt init creates your project structure
  • Two main config files: dbt_project.yml and profiles.yml
  • Configure your BigQuery connection (auth method)
  • Test everything with dbt debug

What's Next?

Now that you have your project set up, you're ready to start building models! In my next post, I'll cover how to create your first dbt models and some best practices I've learned along the way.