
TL;DR

Web Scraping to Database is a data workflow that chains Firecrawl and Supabase: schedule a Firecrawl scrape of any website and store the structured results directly in a Supabase table for analysis. Once configured, it saves ~12 hours/week of manual competitive research, removes the need for brittle custom scrapers, and runs through Claude Code, Cursor, Windsurf or any MCP-compatible AI agent.

Data · Intermediate

Web Scraping to Database


15 min setup, continuous data collection
2 MCPs required
Saves ~12 hours/week of manual competitive research, plus elimination of brittle custom scrapers

How it works

🔥 Firecrawl → 🟢 Supabase (automated)
Hostable: runs in your browser (2/2 MCPs hosted)

Run with MCPizy


Execute this recipe in your browser — no local install, no Claude Code. Streams results live.

Whitelisted MCPs: perplexity, notion, anthropic, openai, tavily, firecrawl, coingecko, stripe, slack, github, gitlab, linear, resend, sendgrid, elevenlabs, shopify, sentry, posthog, supabase-mcp, context7, deepwiki
Estimated usage: ~4k tokens, ~$0.012

Why this combo?

Firecrawl handles the hard parts of scraping — JS rendering, pagination, rate limiting — and returns clean structured data. Supabase gives you a queryable database to accumulate that data over time. Together they replace a brittle custom scraper + manual CSV import workflow.

Without this workflow

Write a custom scraper that breaks every time the site updates, export CSV, import into a database manually, fix encoding issues.

With MCPizy

Configure Firecrawl once and data flows into Supabase on a schedule, ready to query with SQL immediately.
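In practice, "configure once" means writing down each target. The agent prompt on this page reads targets with url, schema and supabase_table fields from scrape-targets.yaml; the sketch below expresses the same structure as Python data, with a hypothetical pricing schema, the hypothetical table name competitor_prices, and a hypothetical row_errors helper that sanity-checks an extracted row before it is written. It checks only required keys and primitive types; a full JSON Schema validator (for example the jsonschema package) would cover the rest.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

@dataclass(frozen=True)
class ScrapeTarget:
    url: str
    schema: Dict[str, Any]  # JSON Schema describing one extracted row
    supabase_table: str

# Hypothetical schema for a competitor pricing page.
PRICING_SCHEMA: Dict[str, Any] = {
    "type": "object",
    "properties": {
        "plan": {"type": "string"},
        "monthly_price": {"type": "number"},
        "currency": {"type": "string"},
    },
    "required": ["plan", "monthly_price"],
}

# Hypothetical target list; in the recipe this lives in scrape-targets.yaml.
TARGETS: List[ScrapeTarget] = [
    ScrapeTarget("https://example.com/pricing", PRICING_SCHEMA, "competitor_prices"),
]

# Maps a subset of JSON Schema primitive types to Python types.
_PY_TYPES = {"string": str, "number": (int, float), "integer": int, "boolean": bool}

def row_errors(row: Dict[str, Any], schema: Dict[str, Any]) -> List[str]:
    """Minimal pre-upsert check: required keys present and primitive types match."""
    errors = [f"missing {key}" for key in schema.get("required", []) if key not in row]
    for key, spec in schema.get("properties", {}).items():
        if key not in row:
            continue
        json_type = spec.get("type")
        expected = _PY_TYPES.get(json_type)
        if expected is None:
            continue  # types outside the subset above are not checked
        value = row[key]
        # bool is a subclass of int in Python, so reject it explicitly for numeric fields.
        if (isinstance(value, bool) and json_type != "boolean") or not isinstance(value, expected):
            errors.append(f"{key}: expected {json_type}")
    return errors
```

Rows that return a non-empty error list can be counted and skipped instead of being written to the table.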

Business value

Concrete ROI — not marketing fluff.

Time saved

~12 hours/week of manual competitive research, plus elimination of brittle custom scrapers

  • Replaces a full-time data engineer maintaining scrapers ($120-180k/year) with a declarative schema
  • Pricing intelligence updates daily — catch competitor price drops and reprice within hours, not weeks
  • Fewer breakages on site redesigns: Firecrawl renders JavaScript and extracts fields against a schema, so you are not maintaining the hard-coded selectors that custom scrapers break on
  • Historical data accumulates in SQL — enables trend analysis that one-off scrapes can never provide
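As an illustration of the trend analysis that accumulated history allows, the sketch below finds every price change per product with the standard SQL window function LAG. It assumes a hypothetical price_history table with one row per product per scrape, and uses Python's built-in sqlite3 (window functions need SQLite 3.25 or newer) as a local stand-in for the Supabase Postgres database, which supports the same query. The sample rows are made up for the example.

```python
import sqlite3

def price_changes(conn: sqlite3.Connection):
    """Return (product, scraped_at, old_price, new_price) for every observed price change."""
    query = """
        SELECT product, scraped_at, prev_price, price FROM (
            SELECT product, scraped_at, price,
                   LAG(price) OVER (PARTITION BY product ORDER BY scraped_at) AS prev_price
            FROM price_history
        )
        WHERE prev_price IS NOT NULL AND price <> prev_price
        ORDER BY product, scraped_at
    """
    return conn.execute(query).fetchall()

# Hypothetical history accumulated over three scheduled runs.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE price_history (product TEXT, scraped_at TEXT, price REAL)")
conn.executemany(
    "INSERT INTO price_history VALUES (?, ?, ?)",
    [
        ("pro", "2024-05-01", 20.0),
        ("pro", "2024-05-02", 20.0),
        ("pro", "2024-05-03", 18.0),
        ("team", "2024-05-01", 50.0),
    ],
)
print(price_changes(conn))  # [('pro', '2024-05-03', 20.0, 18.0)]
```

A one-off scrape only ever sees the latest price; the LAG comparison needs the earlier rows to exist.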

Workflow steps

  1. Schedule or trigger Firecrawl job
  2. Scrape target URLs with JS rendering
  3. Extract structured fields via schema
  4. Deduplicate against existing records
  5. Upsert rows into Supabase table
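Steps 4 and 5 can be sketched as follows, following the agent prompt on this page: each row gets a stable hash of url plus primary key, rows are compared with what is already stored, and only new or changed rows are upserted, with a scrape_log entry when anything was updated. This is a minimal sketch that uses Python's built-in sqlite3 as a stand-in for Supabase Postgres (which accepts the same INSERT ... ON CONFLICT (hash) DO UPDATE SET ... = excluded.column form); the scraped_rows table and its JSON payload column are hypothetical simplifications of a real, typed table.

```python
import hashlib
import json
import sqlite3
from typing import Any, Dict, Iterable, Tuple

def row_key(url: str, primary_key: str) -> str:
    """Stable upsert key: SHA-256 of url and primary key joined by a separator."""
    return hashlib.sha256(f"{url}\x1f{primary_key}".encode("utf-8")).hexdigest()

def create_tables(conn: sqlite3.Connection) -> None:
    # ON CONFLICT (hash) requires a unique constraint on hash; PRIMARY KEY provides it.
    conn.execute("CREATE TABLE IF NOT EXISTS scraped_rows (hash TEXT PRIMARY KEY, url TEXT, payload TEXT)")
    conn.execute("CREATE TABLE IF NOT EXISTS scrape_log (url TEXT, rows_updated INTEGER)")

def upsert_rows(conn: sqlite3.Connection, url: str,
                rows: Iterable[Dict[str, Any]], pk_field: str) -> Tuple[int, int, int]:
    """Deduplicate and upsert scraped rows; return (inserted, updated, unchanged)."""
    # Step 4a: deduplicate within the batch (the last occurrence of a key wins).
    batch: Dict[str, Dict[str, Any]] = {}
    for row in rows:
        batch[row_key(url, str(row[pk_field]))] = row
    inserted = updated = unchanged = 0
    for key, row in batch.items():
        payload = json.dumps(row, sort_keys=True)  # canonical form for comparison
        # Step 4b: deduplicate against existing records.
        existing = conn.execute("SELECT payload FROM scraped_rows WHERE hash = ?", (key,)).fetchone()
        if existing is None:
            inserted += 1
        elif existing[0] != payload:
            updated += 1
        else:
            unchanged += 1
            continue  # identical row already stored; skip the write
        # Step 5: upsert keyed on the stable hash.
        conn.execute(
            "INSERT INTO scraped_rows (hash, url, payload) VALUES (?, ?, ?) "
            "ON CONFLICT (hash) DO UPDATE SET payload = excluded.payload",
            (key, url, payload),
        )
    if updated > 0:
        conn.execute("INSERT INTO scrape_log (url, rows_updated) VALUES (?, ?)", (url, updated))
    conn.commit()
    return inserted, updated, unchanged

conn = sqlite3.connect(":memory:")
create_tables(conn)
url = "https://example.com/pricing"
print(upsert_rows(conn, url, [{"plan": "pro", "price": 20}, {"plan": "team", "price": 50}], "plan"))  # (2, 0, 0)
print(upsert_rows(conn, url, [{"plan": "pro", "price": 18}, {"plan": "team", "price": 50}], "plan"))  # (0, 1, 1)
```

The separator in row_key keeps pairs such as ("a", "bc") and ("ab", "c") from hashing to the same key. The read-then-write comparison is not atomic, which is acceptable for a single scheduled job but not for concurrent writers.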

Use cases

  • Scrape competitor pricing pages daily into a queryable Supabase table
  • Monitor job boards and store new listings for talent pipeline tracking
  • Aggregate product reviews from multiple sites into one database
  • Track changes in public datasets by scraping and diffing over time
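For the change-tracking use case, one simple approach is to diff two full snapshots keyed by record id. The sketch below is an assumption about how you might do it (diff_snapshots is a hypothetical helper, and the job listings are made-up sample data); its point is that an upsert alone never reports records that disappeared, while a snapshot diff does.

```python
from typing import Any, Dict, List

def diff_snapshots(old: Dict[str, Any], new: Dict[str, Any]) -> Dict[str, List[str]]:
    """Compare two scrapes keyed by record id and classify every difference."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(k for k in old.keys() & new.keys() if old[k] != new[k]),
    }

yesterday = {"job-1": {"title": "Data Engineer"}, "job-2": {"title": "Analyst"}}
today = {"job-2": {"title": "Senior Analyst"}, "job-3": {"title": "ML Engineer"}}
print(diff_snapshots(yesterday, today))
# {'added': ['job-3'], 'removed': ['job-1'], 'changed': ['job-2']}
```

For a job board, the removed list is often the most useful signal, since it marks listings that were filled or withdrawn.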

MCPs required

🔥 Firecrawl: Firecrawl MCP Server
🟢 Supabase: Supabase MCP Server

Agent prompt (copy into Claude Code)

This prompt is the workflow. Paste into Claude Code, Cursor, or Windsurf.

You are a scraping-to-database agent. Runs on a schedule defined in scrape-targets.yaml.

For each target (url, schema, supabase_table):
1. Call firecrawl.scrape(url=target.url, formats=["json"], json_schema=target.schema, render_js=true) to get structured rows
2. For each row, compute a stable hash(url + primary_key) to use as upsert key
3. Call supabase.execute_sql with parameterized UPSERT:
   INSERT INTO ${target.supabase_table} (...) VALUES (...) ON CONFLICT (hash) DO UPDATE SET ...
4. Track diff count: rows_inserted, rows_updated, rows_unchanged
5. If rows_updated > 0, call supabase.execute_sql to insert a changelog row in scrape_log

On rate-limit or 5xx from Firecrawl, retry with exponential backoff (3 attempts). Report row counts only.
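The retry rule in the prompt can be sketched as a small wrapper, assuming your Firecrawl client surfaces HTTP failures as an exception carrying the status code. HttpError, is_retryable and with_backoff are hypothetical names, "3 attempts" is read here as three total tries, and sleep is injectable so the behaviour can be tested without waiting.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")

class HttpError(Exception):
    """Hypothetical error raised by a Firecrawl client wrapper."""
    def __init__(self, status: int) -> None:
        super().__init__(f"HTTP {status}")
        self.status = status

def is_retryable(status: int) -> bool:
    # Retry rate limits (429) and server errors (5xx); other 4xx errors are treated as permanent.
    return status == 429 or 500 <= status <= 599

def with_backoff(call: Callable[[], T], attempts: int = 3, base_delay: float = 1.0,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run call(), retrying retryable HTTP errors with exponential backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except HttpError as err:
            if not is_retryable(err.status) or attempt == attempts:
                raise
            sleep(base_delay * 2 ** (attempt - 1))  # waits 1s, then 2s, by default
    raise RuntimeError("unreachable")
```

When many scheduled jobs can hit the same rate limit at once, adding a small random jitter to each delay spreads the retries out.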

Trigger & credentials

How this workflow fires and what env vars you need.

.env.example

Trigger: scheduled, cron 0 */6 * * * (every 6 hours, at minute 0 of hours 0, 6, 12 and 18)

🔥 Firecrawl (1 var)
FIRECRAWL_API_KEY: Firecrawl API key, e.g. fc-...

🟢 Supabase (2 vars)
SUPABASE_URL: Project URL, e.g. https://abcd.supabase.co
SUPABASE_SERVICE_ROLE_KEY: Service role key (server-side only, bypasses RLS), e.g. eyJhbGci...
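A server-side job can fail fast when any of these variables is missing. The sketch below assumes the .env values have already been exported into the process environment (it does not parse the file itself); Settings and load_settings are hypothetical names. Because the service role key bypasses RLS, it should only ever be read by server-side code like this, never shipped to a browser.

```python
import os
from dataclasses import dataclass
from typing import Mapping

@dataclass(frozen=True)
class Settings:
    firecrawl_api_key: str
    supabase_url: str
    supabase_service_role_key: str

REQUIRED_VARS = ("FIRECRAWL_API_KEY", "SUPABASE_URL", "SUPABASE_SERVICE_ROLE_KEY")

def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Raise at startup if any credential listed in .env.example is missing or empty."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return Settings(
        firecrawl_api_key=env["FIRECRAWL_API_KEY"],
        supabase_url=env["SUPABASE_URL"].rstrip("/"),  # normalise a trailing slash
        supabase_service_role_key=env["SUPABASE_SERVICE_ROLE_KEY"],
    )
```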

One-command deploy

Install everything — MCPs, prompt, env template — in a single call.

$ mcpizy recipe install firecrawl-supabase-scraping

✓ Installs both MCP servers
✓ Writes prompt to ~/.mcpizy/prompts/firecrawl-supabase-scraping.md
✓ Generates .env.example in current directory
✓ Ready to paste into Claude Code

Requires mcpizy CLI v1.1+ — install via npm i -g mcpizy.

Quick install (MCPs only)

$ mcpizy install firecrawl && mcpizy install supabase


Frequently asked questions

What is this workflow?

Web Scraping to Database is a data automation that uses Firecrawl + Supabase together via the Model Context Protocol. Schedule a Firecrawl scrape of any website and store the structured results directly in a Supabase table for analysis.

How long does setup take?

Setup takes around 15 minutes, after which data collection runs continuously. You install the required MCP servers with `mcpizy install firecrawl && mcpizy install supabase`, connect your accounts, and the workflow is ready to run.

How much time does this workflow save?

Once running, this workflow saves ~12 hours/week of manual competitive research and removes the need for brittle custom scrapers. The concrete business value: it replaces a full-time data engineer maintaining scrapers ($120-180k/year) with a declarative schema, and pricing intelligence updates daily, so you can catch competitor price drops and reprice within hours, not weeks.

Which MCP servers do I need for this?

You need 2 MCP servers: Firecrawl (mcpizy install firecrawl) and Supabase (mcpizy install supabase). Both are installable in one command via the MCPizy CLI and configured in your `.claude.json` or `.cursor/mcp.json`.

Does this work with Claude Code, Cursor, and Windsurf?

Yes. The workflow runs with any MCP-compatible AI agent — Claude Code, Claude Desktop, Cursor, Windsurf, VS Code with Copilot, and custom agents built on the MCP SDK. The MCP servers are identical across clients; only the config file path (`.claude.json` vs `.cursor/mcp.json`) changes.

Start building this workflow

Install the required MCPs from the marketplace and automate this workflow in about 15 minutes of setup.

$ mcpizy install firecrawl && mcpizy install supabase


Free to install. Connect your accounts and this workflow runs itself.