
TL;DR

The Data Pipeline Stack is 6 MCPs (Postgres, ClickHouse, DuckDB, AWS/S3, GitHub, Grafana) covering OLTP, OLAP, local prototyping, the lake, code, and monitoring. Stakeholder questions drop from half a day to about five minutes; pipeline debugging from hours to minutes. Free at small to mid scale; you pay only for the underlying services.

Stack · 6 MCPs

The Data Pipeline Stack

ETL, warehouse, lake, and analytics — built for modern data teams

Install the whole stack

$ mcpizy install postgres clickhouse duckdb aws github grafana

One command installs and configures all 6 MCPs for Claude Code, Cursor, Windsurf, or any MCP-compatible client.
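Under the hood, every MCP-compatible client reads the same kind of JSON config listing each server. A minimal sketch of what ends up in your client config, assuming the standard `mcpServers` format; the package names, connection string, and token placeholder are illustrative, not mcpizy's literal output:

```json
{
  "mcpServers": {
    "postgres": {
      "command": "npx",
      "args": [
        "-y",
        "@modelcontextprotocol/server-postgres",
        "postgresql://readonly@replica.internal:5432/app"
      ]
    },
    "github": {
      "command": "npx",
      "args": ["-y", "@modelcontextprotocol/server-github"],
      "env": { "GITHUB_PERSONAL_ACCESS_TOKEN": "<your-token>" }
    }
  }
}
```

The other four servers follow the same pattern: a launch command plus credentials in `env`.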

Why this stack?

Modern data stacks share a predictable shape: Postgres as the source-of-truth OLTP store, ClickHouse (or Snowflake/BigQuery) as the warehouse, DuckDB for local prototyping, S3 as the lake, GitHub for dbt/Airflow/Dagster code, and Grafana for monitoring. This stack is those six as MCPs. Stakeholder questions ('how many active users last month?') that used to take half a day now take about five minutes.

The win isn't replacing your BI tool — it's making the 'ad-hoc analysis' path 10x faster. When a PM asks for a breakdown, you don't open Looker or write a Jupyter notebook; you ask Claude, which introspects the warehouse schema, writes the SQL, runs it, and returns the answer with the code for review.
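That introspect-then-query loop is easy to picture. A minimal sketch using Python's stdlib `sqlite3` as a stand-in warehouse (a Postgres MCP would read `information_schema` instead); the `events` table and its columns are made up for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, plan TEXT, ts TEXT)")

# Step 1: introspect -- what tables and columns exist?
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]
schema = {t: [col[1] for col in conn.execute(f"PRAGMA table_info({t})")]
          for t in tables}
print(schema)  # {'events': ['user_id', 'plan', 'ts']}

# Step 2: write and run SQL against the discovered schema
count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

The MCP flow is the same two steps, except the assistant does the introspection and writes the query for you.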

MCPs in this stack (6)

🐘 Postgres: Source-of-truth OLTP + dbt

🟡 ClickHouse: Warehouse / analytics OLAP

🦆 DuckDB: Local analytics & prototyping

☁️ AWS (S3): Data lake storage

🐙 GitHub: dbt / Airflow / Dagster repos

📊 Grafana: Pipeline monitoring + BI

What this stack lets you do

From stakeholder question to answer

  1. PM asks: 'weekly active growth by plan tier, last 8 weeks?'
  2. Claude introspects the Postgres/ClickHouse schema via MCP
  3. Writes the CTE query and runs it
  4. Pivots the result and generates an ASCII chart
  5. Drafts the dbt model `weekly_active_by_tier.sql`
  6. Opens a GitHub PR for review
  7. Posts the chart + query to Slack for the PM
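The CTE in step 3 is ordinary SQL. A minimal sketch of its shape, run here against a toy `events` table in stdlib `sqlite3` (a warehouse version would bucket weeks with `date_trunc('week', ts)` rather than `strftime`); all table and column names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id INTEGER, plan_tier TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?, ?)",
    [(1, "pro", "2024-05-06"), (2, "pro", "2024-05-07"),
     (1, "pro", "2024-05-13"), (3, "free", "2024-05-14")],
)

query = """
WITH weekly AS (
    SELECT strftime('%Y-%W', ts) AS week,
           plan_tier,
           COUNT(DISTINCT user_id) AS active_users
    FROM events
    GROUP BY week, plan_tier
)
SELECT week, plan_tier, active_users
FROM weekly
ORDER BY week, plan_tier
"""
for row in conn.execute(query):
    print(row)  # one (week, tier, active_users) row per group
```

With the toy data above, week one has two distinct pro users; week two has one free and one pro user.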

Data lake exploration with DuckDB

  1. Raw JSON/Parquet files land in S3
  2. Claude mounts the S3 path via the AWS MCP
  3. The DuckDB MCP queries the files directly — no ingestion needed
  4. Profiling results returned (row count, nulls, distributions)
  5. Promising schema drafted into a dbt model

Pipeline monitoring dashboard

  1. Postgres MCP queries pipeline metadata (row counts, freshness)
  2. Claude generates the Grafana dashboard JSON
  3. Posts it to Grafana via MCP
  4. Alerts wired to Slack for any table not updated in N hours
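The dashboard JSON in step 2 is plain data, which is why generating it is a good LLM task. A minimal sketch of the payload shape Grafana's dashboard API (`POST /api/dashboards/db`) accepts; the panel query, `pipeline_meta` table, and `schemaVersion` value are illustrative assumptions:

```python
import json

def freshness_panel(table: str, panel_id: int) -> dict:
    """One time-series panel tracking row-count freshness for a table."""
    return {
        "id": panel_id,
        "type": "timeseries",
        "title": f"{table} row count",
        "gridPos": {"h": 8, "w": 12, "x": 0, "y": panel_id * 8},
        "targets": [{
            "refId": "A",
            # Placeholder query -- point at your pipeline-metadata table.
            "rawSql": f"SELECT ts, row_count FROM pipeline_meta "
                      f"WHERE tbl = '{table}'",
        }],
    }

tables = ["events", "orders", "users"]
payload = {
    "dashboard": {
        "id": None,           # None = create a new dashboard
        "title": "Pipeline freshness",
        "panels": [freshness_panel(t, i) for i, t in enumerate(tables)],
        "schemaVersion": 39,  # illustrative; Grafana bumps this per release
    },
    "overwrite": False,
}
body = json.dumps(payload)  # what gets POSTed to Grafana
```

One panel per monitored table, stacked vertically via `gridPos.y`; the alert wiring in step 4 is a separate alert-rule payload.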

Estimated value

Replaces ~$600/mo of tooling (Mode, Hex seats, Looker admins, BigQuery admin console) for a 3-person data team. Biggest win: stakeholder response time drops 10x, so the team is no longer a bottleneck.

Frequently asked questions

What about Snowflake or BigQuery?

Both have community MCPs (Snowflake MCP, BigQuery MCP). Swap them for ClickHouse MCP if that's your warehouse. Same workflow, different connector.

Is DuckDB really useful if I have a warehouse?

Yes — for prototyping. DuckDB queries parquet/CSV files directly from S3 without ingestion. 10x faster iteration when you're sketching a transformation. Lift to dbt once the logic is stable.

Can Claude write dbt models?

Yes — it reads your existing model style, introspects the schema, generates the SQL, runs `dbt test`, and opens a PR. In practice this can cut dbt authoring time roughly in half.

Is it safe to run SQL on production Postgres?

Use a read-only replica for most queries. Postgres MCP supports multiple connections — point at the replica for exploration, at primary only for controlled writes behind confirmation.

Where does Airflow / Dagster fit?

Their code lives in GitHub, which the GitHub MCP covers. The execution layer is separate, but the community Dagster and Airflow MCPs expose DAG state to Claude, letting you debug failed runs in-chat.

Other stacks

The SaaS Starter Stack

Everything to launch a B2B SaaS in a weekend

6 MCPs

The AI Agent Builder Stack

Search, scrape, memory, and cache — the infrastructure every agent needs

6 MCPs

The Content Ops Stack

Research, write, voice, publish — the content pipeline in 5 MCPs

5 MCPs

Install this stack

$ mcpizy install postgres clickhouse duckdb aws github grafana
