Setting Up Your First dbt Project
As part of my journey towards becoming a senior data engineer, I started a deep dive into dbt (data build tool). Today, I want to share my experience with the initial setup process.
The Setup Process
The beauty of dbt is that, like most tools nowadays, it handles most of the boilerplate setup for you. Let's walk through it step by step.
Step 1: Installing dbt
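First things first, install dbt Core together with the BigQuery adapter. Since dbt 1.8, installing an adapter no longer installs dbt-core for you, so list both:
pip install dbt-core dbt-bigquery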
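To confirm that both packages actually landed in your environment, `dbt --version` works from the shell. If you prefer to check from Python (for example in a setup script), here is a small sketch using the standard library's `importlib.metadata`. The helper name `installed_version` is hypothetical, not part of dbt.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a pip distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # dbt-core provides the dbt CLI; dbt-bigquery provides the BigQuery adapter.
    for dist in ("dbt-core", "dbt-bigquery"):
        print(f"{dist}: {installed_version(dist) or 'not installed'}")
```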
Step 2: Initialize Your Project
Run the following command:
dbt init your_project_name
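This command is where the magic happens. dbt creates a new directory named after your project, containing dbt_project.yml and the standard folders (models, seeds, macros, and so on). It then prompts you for connection details and writes them to a second file, profiles.yml, in ~/.dbt/. These two configuration files are the crucial ones.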
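A quick way to sanity-check the scaffold is to look for dbt_project.yml and one folder per path listed in it. The sketch below assumes the folder names from the dbt_project.yml shown in Step 3 (models, analyses, tests, seeds, macros, snapshots); `missing_project_parts` is a hypothetical helper, and dbt debug performs its own, more thorough checks.

```python
from pathlib import Path
from typing import Iterable, List

# Assumed folder names, taken from the *-paths keys in the dbt_project.yml in Step 3.
DEFAULT_DIRS = ("models", "analyses", "tests", "seeds", "macros", "snapshots")


def missing_project_parts(project_dir: Path, dirs: Iterable[str] = DEFAULT_DIRS) -> List[str]:
    """List expected files/folders that are absent from a dbt project directory."""
    missing: List[str] = []
    if not (project_dir / "dbt_project.yml").is_file():
        missing.append("dbt_project.yml")
    for name in dirs:
        if not (project_dir / name).is_dir():
            missing.append(name + "/")
    return missing
```

Usage: `missing_project_parts(Path("your_project_name"))` should return an empty list right after dbt init.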
Step 3: Understanding the Configuration Files
- dbt_project.yml
This file lives in your project root directory and acts as your project's command center. Here's what mine looks like:
name: 'dbt_flights'
version: '1.0.0'
profile: 'dbt_flights'
model-paths: ["models"]
analysis-paths: ["analyses"]
test-paths: ["tests"]
seed-paths: ["seeds"]
macro-paths: ["macros"]
snapshot-paths: ["snapshots"]
clean-targets:
  - "target"
  - "dbt_packages"
models:
  dbt_flights:
    example:
      +materialized: view
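The nested models: block is the least obvious part. Under models:, the first key is the project name, the keys below it are folder names inside models/, and keys prefixed with + are configs applied to everything at or below that level, with the most specific level winning. Models with no materialization set default to view. Here is a simplified sketch of that lookup (not dbt's actual implementation; it ignores a model file's own config() block, which takes precedence). `resolve_materialization` is a hypothetical name.

```python
from typing import Any, Dict, List

# The models: block from the dbt_project.yml above, written as a Python dict.
MODELS_CONFIG: Dict[str, Any] = {
    "dbt_flights": {
        "example": {
            "+materialized": "view",
        },
    },
}


def resolve_materialization(
    models_cfg: Dict[str, Any],
    project: str,
    folders: List[str],
    default: str = "view",
) -> str:
    """Walk project -> folder -> subfolder; the deepest '+materialized' wins."""
    node = models_cfg.get(project, {})
    result = node.get("+materialized", default)
    for folder in folders:
        node = node.get(folder)
        if not isinstance(node, dict):
            break  # no config for this folder or anything below it
        result = node.get("+materialized", result)
    return result
```

For a model at models/example/my_model.sql, `resolve_materialization(MODELS_CONFIG, "dbt_flights", ["example"])` returns "view".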
- profiles.yml
This file lives by default in ~/.dbt/profiles.yml and contains your connection settings. Here's a typical BigQuery setup:
dbt_flights:  # must match the profile: value in dbt_project.yml
  target: dev
  outputs:
    dev:
      type: bigquery
      method: service-account  # or oauth
      project: your-gcp-project-id
      dataset: your_dataset
      threads: 1
      location: US  # or your preferred region
      job_execution_timeout_seconds: 300  # replaces the older timeout_seconds
      job_retries: 1
      keyfile: /path/to/service-account.json  # only needed for service-account
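The two files are linked by name: the profile: value in dbt_project.yml must match a top-level key in profiles.yml, and that profile's target: picks which entry under outputs: is used (the --target flag overrides it). A mismatch here is a common reason for dbt debug to fail. This sketch models that lookup on already-parsed YAML; `resolve_output` is a hypothetical helper, not dbt's code.

```python
from typing import Any, Dict, Optional


def resolve_output(
    project_cfg: Dict[str, Any],
    profiles: Dict[str, Any],
    target: Optional[str] = None,
) -> Dict[str, Any]:
    """Follow dbt_project.yml -> profiles.yml -> target -> output block."""
    profile_name = project_cfg["profile"]
    if profile_name not in profiles:
        raise KeyError(
            f"profile {profile_name!r} not found in profiles.yml "
            f"(available: {sorted(profiles)})"
        )
    profile = profiles[profile_name]
    # An explicit target (like the --target flag) wins over the profile's default.
    target_name = target or profile["target"]
    outputs = profile["outputs"]
    if target_name not in outputs:
        raise KeyError(f"target {target_name!r} not found under outputs")
    return outputs[target_name]
```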
Step 4: Configuration Deep Dive
Let's break down the key BigQuery-specific settings you'll need to understand.
Authentication Options:
- method: oauth - Perfect for local development
- method: service-account - Use this for CI/CD pipelines and production deployments
If you're using a service account, don't forget to specify the keyfile path.
My recommendation is to start with oauth for local development; it's simpler, because dbt reuses the credentials you create with gcloud auth application-default login.
Location Configuration
- location: The location of your BigQuery dataset (for example, US or EU). It must match the dataset's actual location.
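The authentication and location rules above are easy to get wrong before ever running dbt, so here is a minimal pre-flight check for a single output block. It is a hypothetical helper (`check_bigquery_output`), covers only the oauth and service-account methods (dbt-bigquery supports others, e.g. service-account-json), and compares locations as exact strings, so pass the dataset location exactly as the BigQuery console shows it.

```python
from pathlib import Path
from typing import Any, Dict, List, Optional


def check_bigquery_output(
    output: Dict[str, Any], dataset_location: Optional[str] = None
) -> List[str]:
    """Return human-readable problems found in one BigQuery output block."""
    problems: List[str] = []
    method = output.get("method")
    if method == "service-account":
        keyfile = output.get("keyfile")
        if not keyfile:
            problems.append("method service-account needs a keyfile path")
        elif not Path(str(keyfile)).expanduser().is_file():
            problems.append(f"keyfile not found: {keyfile}")
    elif method != "oauth":
        # Other dbt-bigquery methods exist; this sketch does not validate them.
        problems.append(f"method {method!r} is not checked by this sketch")
    if dataset_location is not None:
        configured = str(output.get("location", ""))
        # Exact string comparison: a simplification of BigQuery's location matching.
        if configured != dataset_location:
            problems.append(
                f"location {configured!r} does not match dataset location {dataset_location!r}"
            )
    return problems
```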
Step 5: Testing Your Setup
Once everything is configured, run:
dbt debug
This command will verify your connection and configuration settings.
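If you orchestrate dbt from Python (say, a CI script), you can gate later steps on dbt debug. This sketch assumes, as in recent dbt versions, that dbt debug exits with a non-zero status when one of its checks fails; `dbt_debug_passes` is a hypothetical helper, and the command is a parameter so it can be swapped out.

```python
import subprocess
import sys
from typing import Sequence


def dbt_debug_passes(cmd: Sequence[str] = ("dbt", "debug"), cwd: str = ".") -> bool:
    """Run `dbt debug` (or another command) and report whether it exited with code 0."""
    try:
        completed = subprocess.run(list(cmd), cwd=cwd, capture_output=True, text=True)
    except FileNotFoundError:
        # The executable (e.g. dbt) is not on PATH.
        return False
    if completed.returncode != 0:
        # Surface dbt's own diagnostics so the logs show which check failed.
        print(completed.stdout, completed.stderr, sep="\n", file=sys.stderr)
    return completed.returncode == 0
```

Usage from the project directory: `sys.exit(0 if dbt_debug_passes() else 1)`.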
Quick Recap
Here's what we've covered:
- dbt init creates your project structure
- Two main config files: dbt_project.yml and profiles.yml
- Configure your BigQuery connection (auth method)
- Test everything with dbt debug
What's Next?
Now that you have your project set up, you're ready to start building models! In my next post, I'll cover how to create your first dbt models and some best practices I've learned along the way.