The orchestration landscape in data engineering has shifted dramatically. For nearly a decade, Apache Airflow was the undisputed standard — the tool you reached for when you needed to schedule and monitor data pipelines. But Dagster has emerged as a serious challenger, and after running both in production across multiple companies, I believe the choice between them comes down to one fundamental question: do you think about your data platform in terms of tasks, or in terms of assets?
This isn't just a philosophical difference. It has real consequences for how you build, test, debug, and maintain your pipelines — especially when dbt is in the mix.
Airflow: The Battle-Tested Incumbent
Let's give credit where it's due. Airflow pioneered the idea of "workflows as code" and built an ecosystem that thousands of companies depend on. Its strengths are real:
- Massive community and ecosystem — thousands of operators, providers, and integrations. If a tool exists, there's probably an Airflow operator for it.
- Battle-tested at scale — companies like Airbnb, Uber, and Spotify have run Airflow at enormous scale for years.
- Flexible execution model — KubernetesExecutor, CeleryExecutor, and the newer task-level isolation options give you real control over how and where tasks run.
- Managed offerings — MWAA (AWS), Cloud Composer (GCP), and Astronomer mean you don't have to operate it yourself.
But Airflow has fundamental design decisions that create friction in modern data platforms:
- Task-centric, not data-centric. An Airflow DAG describes a sequence of operations — "run this script, then that script." It doesn't inherently know what data those scripts produce, what schema that data has, or whether it's fresh. You bolt that knowledge on after the fact.
- Testing is painful. Unit testing a DAG requires mocking execution contexts, connections, and variables. Integration testing often means running the full scheduler. Most teams end up with minimal test coverage for their most critical infrastructure.
- Local development is clunky. Running Airflow locally means spinning up a scheduler, webserver, and database. astro dev start helps, but it's still a heavyweight process compared to running a Python script.
- Implicit dependencies. Task dependencies in Airflow are execution-order dependencies, not data dependencies. If Task B reads from a table that Task A writes to, you encode that as A >> B. But nothing in the system actually validates that relationship — it's convention, not contract.
Dagster: The Asset-First Challenger
Dagster takes a fundamentally different approach. Instead of defining "what to run and in what order," you define "what data assets exist and how they're derived." This shift sounds subtle but it changes everything.
Software-Defined Assets
In Dagster, the core primitive is the Software-Defined Asset (SDA). An asset is a declaration: "this table/file/model exists, it depends on these upstream assets, and here's the code that produces it." The orchestrator infers the execution graph from the dependency declarations.
```python
from dagster import asset


@asset
def raw_events():
    """Ingest raw event data from the source API."""
    return fetch_events_from_api()


@asset
def cleaned_events(raw_events):
    """Clean and validate event data."""
    # The dependency on raw_events is inferred from the parameter name.
    return clean_and_validate(raw_events)


@asset
def event_metrics(cleaned_events):
    """Compute aggregate metrics from cleaned events."""
    return compute_metrics(cleaned_events)
```
This is a profound shift. The orchestrator now understands your data, not just your tasks. It knows when each asset was last materialized, what depends on it, and, through attached metadata, what its schema looks like.
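To see why dependency declarations alone are enough to drive execution, here's a minimal pure-Python sketch (not Dagster's actual implementation) of deriving a run order from asset dependencies like the ones above:

```python
from graphlib import TopologicalSorter

# Toy dependency table mirroring the three assets above: each asset
# maps to the set of upstream assets it depends on.
asset_deps = {
    "raw_events": set(),
    "cleaned_events": {"raw_events"},
    "event_metrics": {"cleaned_events"},
}

# An asset-first orchestrator can topologically sort the declarations
# to get an execution order -- no explicit "A >> B" wiring required.
run_order = list(TopologicalSorter(asset_deps).static_order())
print(run_order)  # ['raw_events', 'cleaned_events', 'event_metrics']
```

The point is that ordering falls out of the declarations; the orchestrator also knows *what* each node is, not just when to run it.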
Partitions and Backfills
Dagster's partition system is elegant. You define partition schemes (daily, monthly, custom) at the asset level, and Dagster handles the bookkeeping. Need to backfill last month's data? Select the partitions in the UI and go. In Airflow, backfills are notoriously tricky — airflow dags backfill has a long list of gotchas and edge cases that have bitten every team I've worked with.
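The bookkeeping Dagster automates here is conceptually simple. A pure-Python sketch (illustrative, not Dagster's API) of the core idea, finding the daily partitions that have never been materialized:

```python
from datetime import date, timedelta


def missing_partitions(start: date, end: date, materialized: set) -> list:
    """Return the daily partition keys in [start, end] with no materialization."""
    days = (end - start).days + 1
    keys = [start + timedelta(days=i) for i in range(days)]
    return [k for k in keys if k not in materialized]


# A backfill reduces to "select the missing keys and run them":
done = {date(2026, 1, 1), date(2026, 1, 2), date(2026, 1, 5)}
todo = missing_partitions(date(2026, 1, 1), date(2026, 1, 7), done)
print(todo)
# [date(2026, 1, 3), date(2026, 1, 4), date(2026, 1, 6), date(2026, 1, 7)]
```

Because the orchestrator tracks materialization state per partition, "backfill last month" becomes a selection problem rather than a manual scripting exercise.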
Developer Experience
This is where Dagster pulls ahead decisively:
- Local development is trivial. dagster dev starts a local instance in seconds. No database, no scheduler daemon — just your code and a UI.
- Testing is first-class. Assets are regular Python functions. You can unit test them by calling them directly. Dagster provides test utilities for resources, IO managers, and sensors without requiring a running instance.
- Type checking and validation. Dagster's type system and config schema catch errors before execution, not during. You find out about misconfigurations at import time, not at 3 AM when a production DAG fails.
- Built-in observability. Asset lineage, materialization history, and freshness policies are core features, not plugins.
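Because asset bodies are ordinary Python functions, the business logic can be tested by calling it directly. A sketch, using illustrative stand-ins for the clean_and_validate and compute_metrics helpers referenced earlier (their names and behavior are assumptions, not from a real codebase):

```python
def clean_and_validate(events: list) -> list:
    """Drop events that are missing a user_id."""
    return [e for e in events if e.get("user_id") is not None]


def compute_metrics(events: list) -> dict:
    """Aggregate event counts per user."""
    counts = {}
    for e in events:
        counts[e["user_id"]] = counts.get(e["user_id"], 0) + 1
    return counts


def test_pipeline_logic():
    # Plain function calls -- no scheduler, no execution context, no mocks.
    raw = [{"user_id": 1}, {"user_id": None}, {"user_id": 1}]
    cleaned = clean_and_validate(raw)
    assert cleaned == [{"user_id": 1}, {"user_id": 1}]
    assert compute_metrics(cleaned) == {1: 2}


test_pipeline_logic()
```

Contrast this with mocking an Airflow execution context just to exercise the same logic.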
Why dbt Belongs with Dagster
Here's where my opinion gets strong. If you're running dbt — and in 2026, most modern data teams are — the choice becomes much clearer.
The Impedance Mismatch with Airflow
dbt is inherently asset-centric. Every dbt model is a declaration: "this table exists, it depends on these upstream models, and here's the SQL that produces it." dbt has its own DAG, its own dependency resolution, and its own testing framework.
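dbt builds its DAG by resolving the ref() calls inside each model's SQL. A toy, illustrative version of that resolution (real dbt uses its Jinja engine, not a regex):

```python
import re


def extract_refs(sql: str) -> list:
    """Return the model names referenced via {{ ref('...') }} in a dbt model."""
    return re.findall(r"\{\{\s*ref\(['\"](\w+)['\"]\)\s*\}\}", sql)


# Hypothetical model SQL; stg_orders / stg_users are made-up model names.
model_sql = """
select user_id, count(*) as orders
from {{ ref('stg_orders') }}
join {{ ref('stg_users') }} using (user_id)
group by 1
"""

print(extract_refs(model_sql))  # ['stg_orders', 'stg_users']
```

Each model thus declares its own upstream assets, which is exactly the shape of a Software-Defined Asset.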
When you put dbt inside Airflow, you're wrapping an asset-centric tool in a task-centric orchestrator. The typical pattern looks like this:
```python
# Airflow: dbt as a black box
from airflow.operators.bash import BashOperator

run_dbt_staging = BashOperator(
    task_id="run_dbt_staging",
    bash_command="dbt run --select staging.*",
)

run_dbt_marts = BashOperator(
    task_id="run_dbt_marts",
    bash_command="dbt run --select marts.*",
)

test_dbt = BashOperator(
    task_id="test_dbt",
    bash_command="dbt test",
)

run_dbt_staging >> run_dbt_marts >> test_dbt
```
You've lost all granularity. Airflow sees three tasks, not the dozens of models inside them. If one model in marts fails, you re-run all of marts. You can't see individual model freshness, lineage, or test results in the Airflow UI. The Cosmos provider improves this by mapping dbt models to Airflow tasks, but it's fighting against Airflow's grain — adding complexity to bridge a fundamental mismatch.
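To make the granularity loss concrete, here's an illustrative sketch of what a selector like marts.* expands to against a toy model list (the model names are invented, and real dbt selector syntax is richer than the glob matching used here):

```python
import fnmatch

# Hypothetical stand-in for the model list in a dbt manifest.
manifest_models = [
    "staging.stg_orders", "staging.stg_users", "staging.stg_payments",
    "marts.fct_orders", "marts.dim_users", "marts.fct_revenue",
]


def expand_selector(selector: str, models: list) -> list:
    """Expand a glob-style selector like 'marts.*' against the model list."""
    return [m for m in models if fnmatch.fnmatch(m, selector)]


# The single run_dbt_marts task above hides all of these models:
print(expand_selector("marts.*", manifest_models))
# ['marts.fct_orders', 'marts.dim_users', 'marts.fct_revenue']
```

Every model in that expansion is a distinct asset with its own lineage and test results, but to Airflow it's one opaque task.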
The Natural Alignment with Dagster
Dagster's dagster-dbt integration treats dbt models as first-class Dagster assets. Each dbt model becomes a Software-Defined Asset automatically. The dependency graph is unified — dbt models sit alongside Python assets in a single lineage view.
```python
from dagster import asset, AssetExecutionContext
from dagster_dbt import DbtCliResource, dbt_assets, DbtProject

my_dbt_project = DbtProject(project_dir="path/to/dbt_project")


@dbt_assets(manifest=my_dbt_project.manifest_path)
def my_dbt_models(context: AssetExecutionContext, dbt: DbtCliResource):
    yield from dbt.cli(["build"], context=context).stream()


@asset(deps=[my_dbt_models])
def ml_features():
    """Python asset that depends on dbt models."""
    return build_features_from_dbt_output()
```
This is the key insight: dbt models and Python assets live in the same dependency graph. You get unified lineage, unified freshness tracking, and unified alerting. When a dbt model fails, Dagster knows exactly which downstream assets are affected — whether they're other dbt models or Python computations.
You can materialize individual dbt models, apply freshness policies to them, and partition them — all using the same mechanisms you use for Python assets. There's no impedance mismatch because both dbt and Dagster think in terms of assets.
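The "Dagster knows exactly which downstream assets are affected" claim is, at its core, a graph traversal over the unified asset graph. An illustrative pure-Python sketch (asset names are assumptions, reusing the examples above):

```python
from collections import deque

# Toy unified graph: dbt models and Python assets are ordinary nodes.
# Each asset maps to its direct downstream dependents.
downstream = {
    "stg_events": ["cleaned_events"],
    "cleaned_events": ["event_metrics", "ml_features"],
    "event_metrics": [],
    "ml_features": [],
}


def affected_by(failed: str, graph: dict) -> set:
    """Return every asset transitively downstream of the failed one."""
    seen, queue = set(), deque(graph.get(failed, []))
    while queue:
        node = queue.popleft()
        if node not in seen:
            seen.add(node)
            queue.extend(graph.get(node, []))
    return seen

# If a staging model fails, both the dbt metrics model and the Python
# feature asset show up in the blast radius:
print(sorted(affected_by("stg_events", downstream)))
# ['cleaned_events', 'event_metrics', 'ml_features']
```

Because dbt models and Python assets share one graph, the blast-radius computation crosses the tool boundary for free.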
When to Choose What
I'm opinionated, but I'm not dogmatic. Here's my honest decision framework:
Choose Airflow when:
- You have a large existing Airflow deployment with hundreds of DAGs and a team that knows it well. Migration cost is real.
- Your pipelines are primarily task-oriented — fire-and-forget API calls, file transfers, notifications — with little need for data lineage.
- You need a specific operator or integration that only exists in the Airflow ecosystem.
- You're on GCP and want tight Cloud Composer integration with other Google services.
Choose Dagster when:
- You're building a new data platform or have the opportunity to rearchitect. Starting fresh with Dagster is dramatically easier.
- dbt is a core part of your transformation layer. The integration is a genuine force multiplier.
- You care about testing and local development. Dagster's developer experience is in a different league.
- Your platform is asset-centric — you think in terms of "what tables and datasets exist" rather than "what scripts need to run."
- You want built-in data observability without stitching together separate tools.
Conclusion
The data orchestration landscape is no longer a one-horse race. Airflow remains a powerful, proven tool — but its task-centric model creates real friction with modern, asset-centric data stacks. Dagster's approach aligns naturally with how we think about data platforms in 2026: as collections of interconnected data assets with clear ownership, lineage, and quality guarantees.
If you're starting a new project and dbt is in your stack, I'd pick Dagster without hesitation. If you're maintaining a mature Airflow deployment, the migration calculus is more nuanced — but the direction is clear. The future of data orchestration is asset-first, and Dagster got there first.