Blog
Analytics2025-12-2610 min

How to Build Analytics Data Models With dbt for Marketing and Product Teams

dbt transforms raw analytics data into clean, reliable datasets. Here's the starter guide for marketing and product analytics models.Step-by-step methodology with tool comparisons and integration p...

Marketing and product teams generate enormous amounts of data. Website events, ad platform metrics, CRM records, product usage logs, email engagement data, and customer support tickets all stream into warehouses and databases in raw, messy formats that are unusable for actual decision-making. The gap between raw data and actionable insight is the transformation layer, and in 2026, dbt (data build tool) has become the standard way to build it.

This guide walks through building analytics data models with dbt specifically for marketing and product teams. Not for data engineers who already know SQL and version control, but for the marketing analyst who has been writing one-off queries and the product manager who has been asking the data team for the same report every week. dbt turns those ad-hoc queries into reliable, tested, documented models that update automatically and that everyone on the team can trust.

TL;DR
  • dbt transforms raw data in your warehouse into clean, tested, documented models using SQL and version control. Think of it as the missing layer between raw event data and your dashboards.
  • The staging-intermediate-mart pattern organizes your models into three layers: staging (clean raw data), intermediate (join and transform), and marts (business-ready datasets for specific teams).
  • Marketing teams should build models for attribution, campaign performance, funnel conversion, and customer journey analysis. Product teams should build models for feature adoption, retention cohorts, and activation tracking.
  • The most common mistake is skipping tests and documentation. A model without tests is a liability, not an asset, because you will not know when upstream data changes break your calculations.

Why dbt and Why Now

Before dbt, analytics teams built transformation pipelines using one of three approaches. The first was stored procedures and scheduled SQL scripts: functional but impossible to version control, test, or document properly. The second was ETL tools like Informatica or Talend: powerful but expensive and requiring specialized skills. The third was the analyst spreadsheet: export data, manipulate it in Excel or Google Sheets, paste it into a presentation. This approach is error-prone, non-reproducible, and scales to exactly one person.

dbt replaced all three by applying software engineering practices to SQL. Your transformations are SQL files stored in a Git repository. You write tests that validate your data assumptions. You document each model with descriptions that appear in an auto-generated data catalog. You define dependencies between models so they execute in the correct order. And everything runs inside your data warehouse, so there is no data movement, no ETL server to maintain, and no additional infrastructure.

The reason to adopt dbt now, even if your analytics setup seems to be working, is that the cost of not having a transformation layer compounds. Every ad-hoc query that becomes a dashboard without going through a tested model is a future data trust issue. Every metric definition that lives in one person's head rather than in version-controlled SQL is a single point of failure. Every time two people calculate the same metric differently because they wrote slightly different queries, your organization loses confidence in its data. dbt solves these problems systematically.

42K+
dbt projects
active on dbt Cloud
85%
of data teams
use dbt or plan to adopt
3.2x
faster reporting
after dbt implementation

Sources: dbt Labs 2025 State of Analytics Engineering, Atlan Data Survey

Core Concepts for Non-Engineers

If you have written SQL queries before, you already know 80% of what you need to use dbt. The remaining 20% is the framework that turns those queries into production-grade data models. Here are the core concepts.

Models

A dbt model is a SQL file that defines a transformation. Each file contains a SELECT statement that pulls from raw data or other models and produces a clean dataset. When you run dbt, it executes these SELECT statements and materializes the results as tables or views in your warehouse. If you have ever written a SQL query and wished it would run automatically every day and always produce the same results, that query is a dbt model waiting to be formalized.

Sources

Sources are references to the raw data tables that already exist in your warehouse. Instead of hardcoding table names in your SQL, you define sources in a YAML file and reference them with the source() function. This abstraction means that if your raw data table name changes (because you switched ETL tools, for example), you update one YAML file instead of every query that referenced that table.

Tests

Tests are assertions about your data that dbt validates automatically. The four built-in tests check for uniqueness (is each row unique by this column?), not-null (are there missing values?), accepted values (does this column only contain expected values?), and relationships (does every value in this column exist in another table?). You can also write custom tests for business logic: "total revenue should never be negative" or "every customer should have at least one event." Tests run after every dbt build and alert you when data assumptions are violated.

Materializations

When dbt executes a model, it needs to decide how to store the result. The four materialization options are: view (a virtual table that re-executes the query each time it is read, good for simple transformations), table (a physical table that is rebuilt on each run, good for complex transformations that are read frequently), incremental (appends new data to an existing table without rebuilding, good for large event tables), and ephemeral (a CTE that is inlined into downstream models, good for lightweight helper transformations).

Start with Views, Optimize Later
When building your first dbt project, materialize everything as views. Views are fast to build and take no storage space. Only switch to tables or incremental models when query performance becomes an issue. Premature materialization optimization adds complexity without benefit. Most marketing and product models work perfectly well as views until you are processing tens of millions of rows.

The Three-Layer Architecture

The standard dbt project follows a three-layer architecture that separates concerns and creates a clean dependency chain. This pattern has emerged as the community standard because it balances simplicity with maintainability.

dbt Model Layers

1
Staging Layer

One staging model per source table. Responsible for renaming columns to a consistent convention, casting data types, filtering out obviously bad data, and applying basic transformations. Staging models are the only models that reference raw sources directly. Everything downstream references staging models.

2
Intermediate Layer

Models that join, aggregate, and transform staging models into more useful shapes. This is where you calculate metrics, join user data with event data, build session models from raw events, and create the building blocks that marts will consume. Intermediate models are not exposed to end users.

3
Marts Layer

Business-ready datasets designed for specific teams or use cases. A marketing mart might contain campaign performance by day. A product mart might contain feature adoption by user cohort. A finance mart might contain MRR by customer segment. Marts are what your BI tool connects to.

Directory Structure

Your dbt project directory should mirror the three-layer architecture. Under the models directory, create subdirectories for staging, intermediate, and marts. Within staging, organize by source system (for example, staging/stripe, staging/hubspot, staging/segment). Within marts, organize by business domain (for example, marts/marketing, marts/product, marts/finance). This structure makes it immediately clear where to find and where to add models.

Naming conventions matter. Staging models should be prefixed with stg_ (stg_stripe__charges, stg_hubspot__contacts). Intermediate models should be prefixed with int_ (int_user_sessions, int_campaign_touchpoints). Mart models should be prefixed with the domain (marketing_campaign_performance, product_feature_adoption). Consistent naming eliminates ambiguity about what a model does and where it sits in the dependency chain.

Building Marketing Data Models

Marketing teams typically need four categories of data models: attribution, campaign performance, funnel analysis, and customer journey. Here is how to build each one.

Attribution Model

The attribution model joins website session data with conversion events and assigns credit to marketing touchpoints. Start with the simplest version: first-touch attribution. The staging layer cleans your web session data (stg_segment__pages, stg_segment__identifies) and your conversion events (stg_stripe__charges or stg_hubspot__deals). The intermediate layer builds a user timeline by joining sessions with conversions and identifying the first touchpoint for each converter (int_user_first_touch). The mart layer aggregates by channel and campaign to produce the final attribution report (marketing_attribution_by_channel).

Once first-touch attribution is working and tested, extend it. Add last-touch attribution by identifying the final touchpoint before conversion. Add linear attribution by distributing credit equally across all touchpoints. Add time-decay attribution by weighting touchpoints closer to conversion more heavily. Each attribution model is an additional mart that uses the same intermediate timeline model, which means you build the hard part once and create multiple views of it.

The key test for your attribution model is that the total attributed revenue across all channels should equal your actual total revenue. If it does not, you have an attribution gap caused by conversions with no tracked touchpoint (direct traffic, word of mouth, dark social). Track the size of this gap and report it honestly rather than distributing unattributed revenue across channels.

Attribution Reality Check
No attribution model is objectively correct. First-touch over-credits awareness channels. Last-touch over-credits conversion channels. Multi-touch models require assumptions about weight distribution that are ultimately arbitrary. The value of building attribution in dbt is not perfect accuracy but consistent methodology. When everyone uses the same model and understands its assumptions, attribution becomes a useful planning tool rather than a political argument.

Campaign Performance Model

The campaign performance model consolidates spending and results across all paid channels into a single mart. This is valuable because every ad platform reports metrics differently, and comparing performance across Google Ads, LinkedIn Ads, and Meta Ads requires normalizing the data into a common schema.

Build staging models for each ad platform (stg_google_ads__campaigns, stg_linkedin_ads__campaigns, stg_meta_ads__campaigns). Each staging model should output the same columns: date, platform, campaign_name, campaign_id, impressions, clicks, spend, and any platform-specific metrics. The intermediate model unions these staging models into a single table (int_paid_campaigns_unioned). The mart model joins with your conversion data to add pipeline and revenue attribution per campaign (marketing_campaign_roi).

The test suite for campaign performance models should include: spend totals by platform match your actual invoices (within 2%, to account for timing differences), no campaigns have negative spend, every campaign has an associated platform, and daily spend amounts are within reasonable bounds (no single day exceeds 5x the daily average, which would indicate a data error). These tests catch data pipeline issues before they corrupt your dashboards.

Funnel Conversion Model

The funnel model tracks conversion rates between defined stages: visitor to signup, signup to activated, activated to paid, paid to expanded. Each stage transition is calculated from event data. The staging layer cleans your event stream (stg_segment__tracks). The intermediate layer identifies when each user first reached each funnel stage (int_user_funnel_stages). The mart model calculates conversion rates by cohort, channel, and time period (marketing_funnel_conversion).

The most important design decision in the funnel model is defining your stages precisely. "Activated" is not a self-evident term. Define it as a specific event or combination of events: "completed onboarding AND used core feature at least once within 7 days." Document this definition in your dbt model's description field so that everyone who uses the mart knows exactly what "activated" means. Ambiguous stage definitions are the number one cause of metrics disagreements between teams.

Skip the modeling, get the insights

OSCOM Analytics builds attribution, funnel, and campaign models automatically from your connected data sources. No SQL, no dbt, no warehouse management.

See it in action

Building Product Data Models

Product teams need data models that answer questions about user behavior, feature adoption, and retention. These models tend to be more complex than marketing models because they deal with event streams rather than aggregate metrics.

Feature Adoption Model

The feature adoption model tracks which features each user has used, when they first used each feature, and how frequently they return to each feature. Start by defining your feature taxonomy: a mapping from raw event names to human-readable feature categories. The event "dashboard_chart_created" maps to the feature "Dashboard Builder." The event "report_exported" maps to the feature "Reporting." This mapping lives in a dbt seed file (a CSV that dbt loads as a table), making it easy for product managers to update without touching SQL.

The staging model cleans your event stream. The intermediate model joins events with the feature taxonomy and calculates per-user feature adoption metrics: first use date, total uses, days since last use, and usage frequency (int_user_feature_adoption). The mart model aggregates by feature to show adoption rates, retention curves per feature, and the correlation between feature adoption and overall retention (product_feature_adoption).

The most valuable analysis this model enables is identifying which features correlate with retention. If users who adopt Feature X within their first 14 days retain at 85% while users who do not retain at 55%, you have a strong signal about where to focus onboarding. This analysis is trivial once the model exists and nearly impossible to do reliably from ad-hoc queries.

Retention Cohort Model

The retention cohort model is the most important product analytics model. It groups users by signup date (weekly or monthly cohorts) and tracks what percentage of each cohort remains active over time. "Active" needs a precise definition: logged in, performed a core action, or generated a billable event. The choice depends on your product.

The intermediate model (int_user_activity_periods) determines whether each user was active in each time period (week or month) after signup. The mart model (product_retention_cohorts) pivots this into the classic retention matrix where rows are cohorts and columns are periods since signup. This matrix is the foundation for every retention analysis: comparing cohorts, identifying the natural retention floor, and measuring the impact of product changes on retention.

Build this model as an incremental materialization because it processes your entire event history. Each run should only process the new events since the last run, not recompute the entire history. This keeps run times manageable even with millions of events. Add a test that verifies no cohort has more active users in a later period than in an earlier period (retention cannot increase over time for a fixed cohort unless you have a data error).

Activation Tracking Model

The activation model tracks whether each user has completed the defined activation criteria within a specified time window. This model is the bridge between acquisition and retention: it measures whether users who signed up actually experienced your product's core value.

Define activation as a combination of events and a time window. For example: "completed onboarding wizard AND created at least one project AND invited at least one team member, all within 14 days of signup." The intermediate model evaluates each criterion per user (int_user_activation_criteria). The mart model scores each user as activated or not and calculates activation rates by cohort, channel, and segment (product_activation_rates).

The activation model should also track partial activation: users who completed some criteria but not all. This identifies where in the activation sequence users drop off, which tells you exactly where to invest in onboarding improvements. If 80% of users complete onboarding but only 40% create a project, the project creation step is your activation bottleneck.

The dbt Seed Trick for Definitions
Store your business definitions (what counts as an "active" user, what the activation criteria are, which events map to which features) in dbt seed files rather than hardcoding them in SQL. Seed files are CSVs that dbt loads as tables. This means product managers and analysts can update definitions by editing a spreadsheet, reviewing the change in a pull request, and deploying it without touching any SQL. It also means every definition change is version-controlled and auditable.

Testing Strategy

Testing is what separates a dbt project from a collection of SQL scripts. Without tests, you discover data problems when someone notices a number on a dashboard looks wrong, which could be days or weeks after the problem started. With tests, you discover problems within hours and can fix them before they affect decisions.

Schema Tests

Apply schema tests to every model. At minimum, every model should have a unique test on its primary key and not-null tests on columns that should never be null. For dimension tables, add accepted_values tests on categorical columns (status should only be "active," "churned," or "trial"). For foreign keys, add relationships tests to verify referential integrity. These tests take 5 minutes to add and catch the most common data quality issues.

Data Tests

Data tests (custom SQL queries that return rows when the test fails) validate business logic that schema tests cannot express. Examples: revenue should never be negative, conversion rate should be between 0 and 1, the number of events per user per day should not exceed a reasonable maximum (which catches bot traffic or tracking bugs), and the total attributed revenue should equal the total actual revenue within a tolerance threshold.

Freshness Tests

Source freshness tests verify that your raw data is being loaded on schedule. If your Segment events are supposed to load every hour and the most recent event is 6 hours old, your pipeline is broken and every downstream model is stale. Configure freshness checks for every source with warn_after and error_after thresholds. This is the first line of defense against silent pipeline failures.

Documentation That People Actually Read

dbt generates a documentation website from the descriptions you add to your models and columns. This documentation is valuable only if it answers the questions that people actually ask. Here is what to document for each model.

Model description: What business question does this model answer? Who uses it? What are its key assumptions? Example: "This model calculates weekly retention cohorts for all users who signed up after January 2024. A user is considered 'active' in a given week if they performed at least one core action (defined in the feature_taxonomy seed). This model is used by the Product team's retention dashboard and the weekly product review."

Column descriptions: For every column in your mart models, write a one-sentence description that a non-technical person can understand. "cohort_week: The Monday date of the week in which the user signed up." "retention_rate_week_4: The percentage of users in this cohort who were active in their 4th week after signup." These descriptions appear in the auto-generated docs and eliminate the "what does this column mean?" questions.

Metric definitions: For each metric calculated in your marts, document the exact formula and any edge cases. "CAC is calculated as total sales and marketing spend for the period divided by the number of new customers acquired in the period. Marketing spend includes all paid channel spend from the campaign_performance model plus estimated organic content costs from the content_costs seed. Customers acquired in a period are those whose first payment date falls within the period, not their signup date."

Implementation Roadmap

Building a dbt project from scratch can feel overwhelming. Here is a phased approach that delivers value at each stage and builds complexity gradually.

dbt Implementation Phases

1
Foundation (Weeks 1-2)

Set up the dbt project, connect to your warehouse, and build staging models for your 3-5 most important data sources (event tracking, CRM, billing). Add basic schema tests. Run dbt manually to verify everything works. At this point you have clean, tested versions of your raw data.

2
Core Models (Weeks 3-4)

Build the intermediate and mart models for your single most important analytics question. If that is 'what is our funnel conversion by channel,' build the funnel conversion mart. If it is 'what is our retention by cohort,' build the retention cohort mart. Ship one complete, tested, documented mart before building others.

3
Expansion (Weeks 5-8)

Add 2-3 more mart models covering your next-priority analytics questions. Add more staging models as you incorporate additional data sources. Refine your testing strategy based on what broke during weeks 3-4. Connect your BI tool to the mart models.

4
Automation (Weeks 9-10)

Set up scheduled dbt runs (daily or hourly depending on freshness needs). Configure alerting for test failures. Build a monitoring dashboard that shows model run times, test results, and source freshness. At this point, your analytics pipeline is production-grade.

5
Scale (Ongoing)

Add models as new analytics questions arise. Optimize materialization for slow models. Build the dbt docs site and train team members to use it. Establish a pull request review process for model changes. The dbt project is now a living system that grows with your analytics needs.

The Most Common dbt Mistake
The most common mistake is trying to model everything at once. Teams spend 8 weeks building staging models for every data source, intermediate models for every possible join, and mart models for every conceivable question, and then never ship anything because the project is too large to test and validate. Build one complete vertical slice (source to staging to mart) first, ship it, get feedback, and then build the next one. Incremental delivery beats comprehensive planning every time.

dbt Cloud vs. dbt Core

dbt comes in two flavors. dbt Core is the open-source command-line tool. dbt Cloud is the managed service with a web IDE, job scheduling, CI/CD, and documentation hosting. The choice depends on your team's capabilities and budget.

Choose dbt Core if: You have a data engineer or analytics engineer comfortable with the command line, Git, and CI/CD pipelines. You are cost-sensitive (dbt Core is free). You want full control over your infrastructure. You are already using a CI/CD tool (GitHub Actions, GitLab CI) that can run dbt.

Choose dbt Cloud if: Your team is primarily analysts who are comfortable with SQL but not the command line. You want scheduling, monitoring, and documentation hosting without building it yourself. You want the web-based IDE for collaborative development. You are willing to pay $100-500/month for the convenience. For most marketing and product analytics teams without a dedicated data engineer, dbt Cloud is the right choice because it removes the infrastructure management overhead.

Connecting dbt to Your BI Tool

The final mile of the dbt pipeline is connecting your mart models to a visualization layer. The goal is that every dashboard and report reads from dbt marts, never from raw data. This ensures consistency: if two dashboards show the same metric, they pull from the same tested model and always agree.

For Looker users, dbt's integration is the tightest. You can generate LookML from your dbt models, and Looker explores map directly to dbt marts. For Metabase, Preset, or Tableau users, point the BI tool at the schema where dbt materializes your marts. Create one data source per mart model. Name the data sources to match the mart model names so the connection between dbt documentation and BI dashboards is obvious.

The discipline that makes this work is the rule that no one writes raw SQL in the BI tool. If an analyst needs a metric that does not exist in a mart, they add it to the dbt model, test it, document it, merge it, and then use it in the BI tool. This adds a small amount of friction but eliminates the "which number is right" problem that plagues organizations where different people write different queries against raw data and get different answers.

Analytics without the data modeling

OSCOM Analytics pre-builds the marketing and product models that would take weeks to build in dbt. Connect your data sources and start analyzing in minutes, not months.

Connect your sources

Key Takeaways

  • 1dbt turns ad-hoc SQL queries into tested, documented, version-controlled data models. It is the transformation layer between your raw data warehouse and your dashboards.
  • 2Use the three-layer architecture: staging (clean raw data), intermediate (join and transform), and marts (business-ready datasets). Only marts should be exposed to BI tools.
  • 3Marketing teams should prioritize attribution, campaign performance, and funnel conversion models. Product teams should prioritize feature adoption, retention cohorts, and activation tracking.
  • 4Test everything. Schema tests on every model (unique, not null, accepted values). Data tests for business logic. Freshness tests for source pipelines. Tests are what make your models trustworthy.
  • 5Document for humans. Every model needs a description that answers: what question does this answer, who uses it, and what are the key assumptions? Every column in mart models needs a plain-English description.
  • 6Build incrementally: one complete vertical slice (source to mart) every 2 weeks, not a comprehensive project that takes 3 months to ship. Ship value early and expand.
  • 7The no-raw-SQL-in-BI rule ensures consistency. If a metric does not exist in a dbt mart, add it to the model through a pull request, not as a custom query in the dashboard.

Analytics engineering for growth teams

Data modeling patterns, dbt tips, and transformation strategies for marketing and product analysts who want reliable, trustworthy data.

The shift from ad-hoc queries to modeled, tested, documented data is the single highest-leverage analytics improvement most marketing and product teams can make. It does not require hiring a data engineering team or adopting a complex infrastructure. It requires committing to the discipline that every metric used for decision-making flows through a tested model with a documented definition. dbt makes this discipline achievable by bringing software engineering practices to SQL in a way that analysts can adopt without becoming engineers. Start with one model, test it, document it, connect it to a dashboard, and build from there. Within two months you will wonder how you ever made decisions from ad-hoc queries.

Prove what's working and cut what isn't

Oscom connects GA4, Kissmetrics, and your CRM so you can tie every marketing activity to revenue in one dashboard.