
TL;DR

Knowledge Graph from Code is a data workflow that chains the GitHub and Neo4j MCP servers: it parses your GitHub repos and builds a Neo4j knowledge graph of files, functions, imports, and authors for code intelligence. Once configured, it saves ~15 hours/week on large refactors and de-risks platform-wide changes. It runs through Claude Code, Cursor, Windsurf, or any MCP-compatible AI agent.

Data · Advanced

Knowledge Graph from Code


45 min setup, instant code intelligence queries

2 MCPs required

Saves ~15 hours/week on large refactors, plus de-risking of platform-wide changes


Partial support: 1 of 2 MCPs hostable

Hosted execution needs every MCP on the whitelist, and Neo4j is not hostable yet. Until it is added, run this recipe with the local CLI:

mcpizy recipe install neo4j-github-knowledge-graph

Why this combo?

GitHub holds your code history and authorship; Neo4j's graph model stores code relationships (imports, calls, ownership) as first-class edges, so multi-hop questions that would need chains of self-joins or recursive queries in a relational database become short traversals. Together they answer 'who understands this module?' or 'what breaks if I change this function?' in seconds.

Without this workflow

Manually trace import chains across files, ask around to find who owns a module, and struggle to understand why a change broke something unrelated.

With MCPizy

Query the knowledge graph. See every caller of a function, every module it depends on, and who last touched each node — in one Cypher query.
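The "every caller" question is a reverse traversal of import edges. In Cypher it could look like `MATCH (dep:File)-[:IMPORTS*1..]->(:File {path: $path}) RETURN DISTINCT dep.path`, assuming import edges are stored as an `IMPORTS` relationship between `File` nodes (the recipe's prompt does not name that relationship). The Python sketch below does the same walk over a hypothetical in-memory copy of those edges, which can help sanity-check what the graph returns.

```python
from collections import deque


def dependents(imports: dict[str, set[str]], target: str) -> set[str]:
    """Return every file that directly or transitively imports `target`.

    `imports` maps a file path to the set of file paths it imports
    (a hypothetical in-memory copy of the graph's IMPORTS edges).
    """
    # Invert the edges so we can ask "who imports this file?"
    importers: dict[str, set[str]] = {}
    for src, dsts in imports.items():
        for dst in dsts:
            importers.setdefault(dst, set()).add(src)

    # Breadth-first search from the target along the reversed edges.
    seen: set[str] = set()
    queue = deque([target])
    while queue:
        node = queue.popleft()
        for parent in importers.get(node, set()):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)

    seen.discard(target)  # an import cycle can lead back to the target itself
    return seen


graph = {
    "api/routes.py": {"services/billing.py"},
    "services/billing.py": {"db/models.py"},
    "jobs/nightly.py": {"db/models.py"},
    "db/models.py": set(),
}
print(sorted(dependents(graph, "db/models.py")))
# ['api/routes.py', 'jobs/nightly.py', 'services/billing.py']
```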

Business value

Concrete ROI — not marketing fluff.

Time saved

~15 hours/week on large refactors, plus de-risking of platform-wide changes

  • De-risks major refactors: answers 'what breaks if I change X?' in a query, not 2 days of archaeology
  • Keeps domain knowledge alive after senior engineers leave — the graph survives the headcount
  • Routes PR reviews to the right expert automatically — cuts review cycle from 2 days to 4 hours
  • Identifies dependency hotspots that justify refactor investment with data, not vibes
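A dependency hotspot is usually measured by in-degree: how many files import a given file. A Cypher version might be `MATCH (src:File)-[:IMPORTS]->(f:File) RETURN f.path, count(DISTINCT src) AS importers ORDER BY importers DESC LIMIT 20`, again assuming an `IMPORTS` relationship. The sketch below ranks files the same way and also lists orphans (files nothing imports); the `imports` mapping is a hypothetical stand-in for the graph.

```python
from collections import Counter


def hotspots(imports: dict[str, set[str]], top_n: int = 20) -> list[tuple[str, int]]:
    """Rank files by how many distinct files import them (their in-degree)."""
    counts = Counter(dst for dsts in imports.values() for dst in dsts)
    # Highest count first; ties broken by path so the output is deterministic.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_n]


def orphans(imports: dict[str, set[str]]) -> set[str]:
    """Files that no file imports. Entry points (CLIs, tests) land here too."""
    imported = {dst for dsts in imports.values() for dst in dsts}
    return set(imports) - imported


graph = {
    "app.py": {"util.py", "db.py"},
    "cli.py": {"util.py"},
    "util.py": {"db.py"},
    "db.py": set(),
    "legacy.py": set(),
}
print(hotspots(graph))         # [('db.py', 2), ('util.py', 2)]
print(sorted(orphans(graph)))  # ['app.py', 'cli.py', 'legacy.py']
```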

Workflow steps

  1. Clone or fetch repo from GitHub
  2. Parse AST for files, functions, and imports
  3. Extract git blame for authorship
  4. Create nodes and relationships in Neo4j
  5. Query graph for dependency paths and hotspots
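For a Python codebase, steps 2 and 3 can be sketched with the standard-library `ast` module. This is a minimal illustration, not the recipe's parser: `FileFacts` and `extract` are hypothetical names, and imports are recorded as module names (relative imports keep their leading dots), so resolving them to file paths is left to a later step.

```python
import ast
from dataclasses import dataclass, field


@dataclass
class FileFacts:
    """What one source file contributes to the graph (hypothetical shape)."""
    path: str
    functions: list[tuple[str, int]] = field(default_factory=list)  # (name, def line)
    classes: list[tuple[str, int]] = field(default_factory=list)    # (name, class line)
    imports: set[str] = field(default_factory=set)                   # imported module names


def extract(path: str, source: str) -> FileFacts:
    """Collect functions, classes, and imports from one Python file."""
    tree = ast.parse(source, filename=path)
    facts = FileFacts(path=path)
    for node in ast.walk(tree):  # visits nested nodes too, so methods are included
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            facts.functions.append((node.name, node.lineno))
        elif isinstance(node, ast.ClassDef):
            facts.classes.append((node.name, node.lineno))
        elif isinstance(node, ast.Import):
            facts.imports.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom):
            # Relative imports keep their dots: `from .utils import x` -> ".utils".
            facts.imports.add("." * node.level + (node.module or ""))
    return facts


source = "\n".join([
    "import os",
    "from collections import deque",
    "from .utils import helper",
    "",
    "class Repo:",
    "    def clone(self):",
    "        pass",
    "",
    "async def fetch():",
    "    pass",
])
facts = extract("pkg/repo.py", source)
print(sorted(facts.functions))  # [('clone', 6), ('fetch', 9)]
print(facts.classes)            # [('Repo', 5)]
print(sorted(facts.imports))    # ['.utils', 'collections', 'os']
```

For TypeScript, ts-morph plays the same role; the output shape (one record per file with its functions, classes, and imports) is what the later MERGE step consumes.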

Use cases

  • Find all code paths that depend on a module before refactoring it
  • Identify who has the most context on a file for code review routing
  • Map circular dependency chains that slow down build times
  • Visualize how changes propagate through a large monorepo
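Circular dependency chains correspond to strongly connected components of the import graph. The sketch below applies Tarjan's algorithm to a hypothetical `imports` mapping (file path to the paths it imports) and returns every component that forms a cycle, including files that import themselves. In Cypher, a bounded check such as `MATCH p = (f:File)-[:IMPORTS*1..10]->(f) RETURN p` is a common alternative, again assuming an `IMPORTS` relationship.

```python
def import_cycles(imports: dict[str, set[str]]) -> list[set[str]]:
    """Return the groups of files that form import cycles (Tarjan's SCC algorithm).

    Recursive for readability; very deep import chains may need a higher
    recursion limit or an iterative rewrite.
    """
    index: dict[str, int] = {}  # discovery order of each file
    low: dict[str, int] = {}    # smallest index reachable from the file
    stack: list[str] = []
    on_stack: set[str] = set()
    cycles: list[set[str]] = []

    def visit(v: str) -> None:
        index[v] = low[v] = len(index)
        stack.append(v)
        on_stack.add(v)
        for w in imports.get(v, set()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:  # v is the root of a strongly connected component
            component: set[str] = set()
            while True:
                w = stack.pop()
                on_stack.discard(w)
                component.add(w)
                if w == v:
                    break
            # A component is a cycle if it has several files or a file imports itself.
            if len(component) > 1 or v in imports.get(v, set()):
                cycles.append(component)

    for v in list(imports):
        if v not in index:
            visit(v)
    return cycles


graph = {"a.py": {"b.py"}, "b.py": {"c.py"}, "c.py": {"a.py"}, "d.py": {"a.py"}, "e.py": {"e.py"}}
print([sorted(c) for c in import_cycles(graph)])
# [['a.py', 'b.py', 'c.py'], ['e.py']]
```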

MCPs required

Neo4j: Neo4j MCP Server

GitHub: GitHub MCP Server

Agent prompt (copy into Claude Code)

This prompt is the workflow. Paste into Claude Code, Cursor, or Windsurf.

You are a code-knowledge-graph agent. Runs weekly or on main-branch push.

Given a GitHub repo:
1. Call github.clone_or_pull(repo, path="./workspace") to fetch latest
2. Parse the AST of every source file with a language-appropriate parser (ts-morph for TypeScript, Python's built-in ast module, etc.)
3. For each file extract: File node, Function nodes, Class nodes, Import edges
4. Call github.git_blame for each function definition line to get author + last_modified
5. Batch-write to Neo4j via neo4j.execute_cypher with MERGE clauses:
   MERGE (f:File {path:$path}) MERGE (fn:Function {name:$name, file:$path}) MERGE (f)-[:CONTAINS]->(fn)
   MERGE (a:Author {email:$email}) MERGE (a)-[:LAST_TOUCHED]->(fn)
6. After ingest, run the standard queries: cycle detection, hotspots (top 20 most-imported files), orphans, and owner coverage
7. Post summary to GitHub Wiki or repo README badge

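Idempotent: MERGE makes re-runs safe (no duplicate nodes or relationships), but it never deletes anything, so LAST_TOUCHED edges from earlier runs stay in place when ownership changes. Report node + edge counts.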
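Steps 4 and 5 can be sketched in Python, assuming blame data comes from running `git blame --porcelain <file>` in the cloned workspace rather than through an MCP tool, and that rows are written in batches with `UNWIND` instead of one MERGE per function. `parse_porcelain`, `function_rows`, and the `line`/`last_modified` properties are illustrative choices; the node labels and relationship types match the prompt.

```python
import re

# Porcelain header: "<sha> <orig line> <final line> [<lines in group>]".
# 40 hex chars for SHA-1 repos, 64 for SHA-256 repos.
HEADER = re.compile(r"^([0-9a-f]{40}|[0-9a-f]{64}) (\d+) (\d+)(?: \d+)?$")

FUNCTION_MERGE = """
UNWIND $rows AS row
MERGE (f:File {path: row.path})
MERGE (fn:Function {name: row.name, file: row.path})
MERGE (f)-[:CONTAINS]->(fn)
SET fn.line = row.line, fn.last_modified = row.last_modified
MERGE (a:Author {email: row.email})
MERGE (a)-[:LAST_TOUCHED]->(fn)
"""


def parse_porcelain(text: str) -> dict[int, tuple[str, int]]:
    """Map each final line number to (author email, author time) from `git blame --porcelain`."""
    meta: dict[str, dict[str, str]] = {}  # commit sha -> its "key value" lines
    line_to_sha: dict[int, str] = {}
    sha = None
    for raw in text.splitlines():
        if raw.startswith("\t"):
            continue  # the blamed source line itself
        match = HEADER.match(raw)
        if match:
            sha = match.group(1)
            line_to_sha[int(match.group(3))] = sha
            meta.setdefault(sha, {})  # metadata is printed only the first time a commit appears
        elif sha is not None and " " in raw:
            key, value = raw.split(" ", 1)
            meta[sha][key] = value
    return {
        line: (meta[s].get("author-mail", "").strip("<>"), int(meta[s].get("author-time", "0")))
        for line, s in line_to_sha.items()
    }


def function_rows(path: str, functions: list[tuple[str, int]],
                  blame: dict[int, tuple[str, int]]) -> list[dict]:
    """One parameter row per function whose definition line has blame data."""
    rows = []
    for name, line in functions:
        if line not in blame:
            continue  # e.g. a line number outside the blamed file
        email, author_time = blame[line]
        rows.append({"path": path, "name": name, "line": line,
                     "email": email, "last_modified": author_time})
    return rows


sha_a, sha_b = "a" * 40, "b" * 40
porcelain = "\n".join([
    f"{sha_a} 1 1 2",
    "author Ada",
    "author-mail <ada@example.com>",
    "author-time 1700000000",
    "filename app.py",
    "\tdef main():",
    f"{sha_a} 2 2",
    "\t    helper()",
    f"{sha_b} 3 3 1",
    "author Bob",
    "author-mail <bob@example.com>",
    "author-time 1700003600",
    "filename app.py",
    "\tdef helper():",
])
rows = function_rows("app.py", [("main", 1), ("helper", 3)], parse_porcelain(porcelain))
print([(r["name"], r["email"]) for r in rows])
# [('main', 'ada@example.com'), ('helper', 'bob@example.com')]
```

With the official `neo4j` Python driver (5.x), a batch could then be sent with `driver.execute_query(FUNCTION_MERGE, rows=rows)`, which issues one query per file instead of one per function.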

Trigger & credentials

How this workflow fires and what env vars you need.

.env.example
Scheduled trigger
0 2 * * 1  # every Monday at 02:00 UTC (or on push to main)
Neo4j · 3 vars
NEO4J_URI

Neo4j Bolt URI

e.g. bolt://localhost:7687

NEO4J_USERNAME

Neo4j username

e.g. neo4j

NEO4J_PASSWORD

Neo4j password

e.g. change-me

GitHub · 2 vars
GITHUB_TOKEN

Personal access token (PAT) with `repo` scope, needed for private repos

e.g. ghp_...

GITHUB_REPO

Target repo to graph (owner/name)

e.g. acme/monorepo
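Before the first scheduled run, it can help to fail fast on missing or malformed variables. The sketch below checks the five variables listed above; the accepted URI schemes are the ones the official Neo4j drivers support, and the `owner/name` pattern is an approximation of GitHub's naming rules rather than an exact validator. `check_env` is a hypothetical helper, not part of the MCPizy CLI.

```python
import os
import re
from collections.abc import Mapping
from urllib.parse import urlparse

REQUIRED = ["NEO4J_URI", "NEO4J_USERNAME", "NEO4J_PASSWORD", "GITHUB_TOKEN", "GITHUB_REPO"]
# URI schemes understood by the official Neo4j drivers.
NEO4J_SCHEMES = {"bolt", "bolt+s", "bolt+ssc", "neo4j", "neo4j+s", "neo4j+ssc"}
# Approximate owner/name check; GitHub's real rules are slightly stricter.
REPO_PATTERN = re.compile(r"^[A-Za-z0-9-]+/[A-Za-z0-9._-]+$")


def check_env(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return human-readable problems; an empty list means the config looks usable."""
    problems = [f"{name} is not set" for name in REQUIRED if not env.get(name)]
    uri = env.get("NEO4J_URI")
    if uri and urlparse(uri).scheme not in NEO4J_SCHEMES:
        problems.append(f"NEO4J_URI has an unsupported scheme: {uri}")
    repo = env.get("GITHUB_REPO")
    if repo and not REPO_PATTERN.match(repo):
        problems.append(f"GITHUB_REPO should look like owner/name, got {repo!r}")
    return problems
```

Calling `check_env()` with no argument reads the real process environment; passing a dict makes it easy to test.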

One-command deploy

Install everything — MCPs, prompt, env template — in a single call.

$ mcpizy recipe install neo4j-github-knowledge-graph

✓ Installs both MCP servers
✓ Writes prompt to ~/.mcpizy/prompts/neo4j-github-knowledge-graph.md
✓ Generates .env.example in current directory
✓ Ready to paste into Claude Code

Requires mcpizy CLI v1.1+ — install via npm i -g mcpizy.

Quick install (MCPs only)

$ mcpizy install neo4j && mcpizy install github


Frequently asked questions

What is this workflow?

Knowledge Graph from Code is a data automation that uses Neo4j + GitHub together via the Model Context Protocol. Parse your GitHub repos and build a Neo4j knowledge graph of files, functions, imports, and authors for code intelligence.

How long does setup take?

Setup takes around 45 minutes, after which code intelligence queries return instantly. You install the required MCP servers with `mcpizy install neo4j && mcpizy install github`, connect your accounts, and the workflow is ready to run.

How much time does this workflow save?

Once running, this workflow saves ~15 hours/week on large refactors and de-risks platform-wide changes. Concretely, it answers 'what breaks if I change X?' in a query rather than 2 days of archaeology, and it keeps domain knowledge alive after senior engineers leave, because the graph survives the headcount.

Which MCP servers do I need for this?

You need 2 MCP servers: Neo4j (`mcpizy install neo4j`) and GitHub (`mcpizy install github`). Both can be installed in one command via the MCPizy CLI and are configured in your `.claude.json` or `.cursor/mcp.json`.

Does this work with Claude Code, Cursor, and Windsurf?

Yes. The workflow runs with any MCP-compatible AI agent — Claude Code, Claude Desktop, Cursor, Windsurf, VS Code with Copilot, and custom agents built on the MCP SDK. The MCP servers are identical across clients; only the config file path (`.claude.json` vs `.cursor/mcp.json`) changes.

Start building this workflow

Install the required MCPs from the marketplace and automate this workflow with about 45 minutes of setup.

$ mcpizy install neo4j && mcpizy install github


Free to install. Connect your accounts and this workflow runs itself.