I’m building a “graph builder” script for my Markdown-based digital garden. Another AI helped me design the architecture in pieces; now I want you to produce ONE complete, working Python script from this spec.
Please read everything below (requirements + examples + partial code), then write the script.
Scan all .md files in the project, build a node registry and a graph of connections between them (edges), then produce a Markdown report with the sections described below (missing nodes, orphans, nodes without type, and so on).
The graph must incorporate both:
- inline wikilinks: [[id]] and [[id|label]]
- metadata relationships (e.g. author)
All relationships should be represented as edges with a type.
This script DOES:
- Scan .md files in the repo recursively.
- Parse frontmatter: id, aliases, type, and metadata.
- Extract [[...]] wikilinks from Markdown body.
- Resolve link targets via id and aliases.
- Write a report (reports/graph-report.md).
This script DOES NOT:
- Modify any .md files.
Think of it as: index + resolver + validator.
Each Markdown file has frontmatter like:
---
id: the-odyssey
aliases:
  - odyssey
  - a-odisseia
type: book
metadata:
  title: The Odyssey
  author: homer
  year: -800
---
Another example:
---
id: caderno-do-fim-do-mundo--cleyton-cabral
aliases: []
type: book
metadata:
  title: Caderno do Fim do Mundo
  author: cleyton-cabral
  year: 2025
---
Later I’ll also have nodes of type author, concept, note, etc., but for now just handle whatever type appears (including missing).
Use pathlib and recursive glob from project root:
from pathlib import Path
ROOT = Path(".")
markdown_files = list(ROOT.rglob("*.md"))
The script should treat the whole tree as the “garden” (not just /books).
For each .md file:
- Split the frontmatter from the body at the --- lines.
- Parse the frontmatter with yaml.safe_load.
- Extract:
  - id (required for a valid node; if missing, treat as a node without id and report it),
  - aliases (optional; can be absent, a string, or list),
  - type (optional; we will flag missing type in the report),
  - metadata (dict; we’ll use some of its fields to create edges).
Example in-memory representation (you can model this as a simple dict or a small Node dataclass):
nodes = {
    "the-odyssey": {
        "id": "the-odyssey",
        "file": Path("books/finished/the-odyssey.md"),
        "aliases": ["odyssey", "a-odisseia"],
        "type": "book",
        "metadata": {
            "title": "The Odyssey",
            "author": "homer",
            "year": -800,
        },
    },
    # ...
}
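The frontmatter split and the aliases normalization described above can be sketched as follows. This is a minimal sketch, not a definitive implementation: the helper names `split_frontmatter` and `normalize_aliases` are hypothetical, and it assumes frontmatter only counts when the very first line of the file is a bare `---`.

```python
from __future__ import annotations


def split_frontmatter(text: str) -> tuple[str | None, str]:
    """Return (frontmatter_text, body); frontmatter_text is None if absent."""
    lines = text.splitlines()
    # Assumption: frontmatter must open on the very first line with a bare '---'.
    if not lines or lines[0].strip() != "---":
        return None, text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            frontmatter = "\n".join(lines[1:i])
            body = "\n".join(lines[i + 1:])
            return frontmatter, body
    # An opening '---' without a closing one: treat the whole file as body.
    return None, text


def normalize_aliases(value) -> list[str]:
    """Coerce the 'aliases' field (absent, a string, or a list) to a list of strings."""
    if value is None:
        return []
    if isinstance(value, str):
        return [value]
    return [str(v) for v in value if v is not None]
```

The returned frontmatter string would then be passed to `yaml.safe_load`, and the result checked to be a dict before reading `id`, `aliases`, `type`, and `metadata` from it.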
You need two maps:
id_map = {
    "the-odyssey": nodes["the-odyssey"],  # or a Node instance
    # ...
}
alias_map = {
    "odyssey": "the-odyssey",
    "a-odisseia": "the-odyssey",
    # ...
}
Resolution function:
def resolve(name: str) -> str | None:
    """
    Given a link target like 'a-odisseia' or 'the-odyssey',
    return the canonical node id, or None if not found.
    """
    if name in id_map:
        return name
    if name in alias_map:
        return alias_map[name]
    return None
ID and alias collisions should be detected and reported (e.g. two files claim the same id, or the same alias points to two different ids).
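One possible way to build both maps while collecting collisions is sketched below. The input shape (a list of `(node_id, file_path, aliases)` tuples), the collision record fields, and the "first file wins" policy are all assumptions made for illustration; here `id_map` maps an id to its file rather than to a full node.

```python
from __future__ import annotations


def build_registries(records: list[tuple[str, str, list[str]]]):
    """Build (id_map, alias_map, collisions) from (node_id, file_path, aliases) tuples."""
    id_map: dict[str, str] = {}     # node id -> file that first claimed it
    alias_map: dict[str, str] = {}  # alias -> canonical node id
    collisions: list[dict] = []

    # Pass 1: register ids; the first file to claim an id wins (assumed policy).
    for node_id, file_path, _aliases in records:
        if node_id in id_map:
            collisions.append({"kind": "duplicate-id", "name": node_id,
                               "claimants": [id_map[node_id], file_path]})
            continue
        id_map[node_id] = file_path

    # Pass 2: register aliases, skipping files that lost a duplicate-id collision.
    for node_id, file_path, aliases in records:
        if id_map.get(node_id) != file_path:
            continue
        for alias in aliases:
            if alias == node_id:
                continue  # an alias repeating the node's own id is harmless
            if alias in id_map:
                # The alias would be shadowed by a real id in resolve().
                collisions.append({"kind": "alias-shadows-id", "name": alias,
                                   "claimants": [alias, node_id]})
                continue
            owner = alias_map.get(alias)
            if owner is not None and owner != node_id:
                collisions.append({"kind": "alias-conflict", "name": alias,
                                   "claimants": [owner, node_id]})
                continue
            alias_map[alias] = node_id

    return id_map, alias_map, collisions
```

Keeping the collisions as plain records, instead of raising, lets the report list every conflict in a single run.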
For each file’s body (Markdown content after frontmatter):
- [[something]]
- [[something|Label to display]]
Each found link should be recorded as a raw edge candidate:
{
    "source": "the-bell-jar",  # source node id
    "raw": "a-odisseia",       # raw target name before resolution
    "label": "A Odisseia",     # optional, None if not present
    "kind": "inline",          # distinguish from metadata edges
}
You may assume:
- Link targets are slug-like ids or aliases (e.g. the-odyssey, cleyton-cabral).
Use a regex to find [[...]], then split on | if present.
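The regex-then-split approach can be sketched like this, assuming wikilinks are written as `[[target]]` or `[[target|Label]]` and never nest. The names `WIKILINK_RE` and `parse_links` are illustrative, not part of the spec.

```python
from __future__ import annotations

import re

# Assumption: a wikilink is '[[', some text without square brackets, then ']]'.
WIKILINK_RE = re.compile(r"\[\[([^\[\]]+)\]\]")


def parse_links(source_id: str, body: str) -> list[dict]:
    """Return raw inline edge candidates found in a Markdown body."""
    edges: list[dict] = []
    for match in WIKILINK_RE.finditer(body):
        # partition splits on the first '|' only; sep is '' when there is no label.
        target, sep, label = match.group(1).partition("|")
        target = target.strip()
        if not target:
            continue  # skip degenerate links such as '[[|Label]]'
        edges.append({
            "source": source_id,
            "raw": target,
            "label": label.strip() if sep else None,
            "kind": "inline",
        })
    return edges
```

This sketch does not skip wikilinks that appear inside fenced code blocks; if that matters, the body would need to be filtered first.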
Some relationships come from YAML metadata, not inline links. Start with author and make it easy to extend later.
Frontmatter example:
metadata:
  author: cleyton-cabral
  # or
  authors:
    - cleyton-cabral
    - another-author
Design a mapping layer so you don’t hardcode “author” everywhere:
RELATION_FIELDS = {
    "author": "author",  # field name -> edge type
    "authors": "author",
    # later:
    # "translator": "translator",
    # "inspired_by": "inspired-by",
}
Normalize metadata values to a list:
def ensure_list(value):
    # An empty YAML value (None) yields no edges instead of [None].
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]
Then, for each node:
def extract_metadata_edges(node_id: str, data: dict) -> list[dict]:
    """
    Given a node's frontmatter data, extract edges defined by metadata fields
    like 'author', 'authors', etc.
    """
    edges: list[dict] = []
    metadata = data.get("metadata", {}) or {}
    for field, relation_type in RELATION_FIELDS.items():
        if field not in metadata:
            continue
        values = ensure_list(metadata[field])
        for v in values:
            edges.append({
                "from": node_id,
                "to_raw": v,            # raw target id/alias
                "type": relation_type,  # e.g. 'author'
                "source": "metadata",
            })
    return edges
We will resolve to_raw via resolve() in the next step.
Combine both sources:
- inline link edges (from parse_links_from_file)
- metadata edges (from extract_metadata_edges)
Then resolve:
all_edges: list[dict] = []
for edge in raw_edges:
    # Inline edges use "source"/"raw"; metadata edges use "from"/"to_raw".
    # On metadata edges "source" holds the literal "metadata", so it is only
    # used as the origin node id when "from" is absent.
    source_id = edge["from"] if "from" in edge else edge["source"]
    raw_target = edge["to_raw"] if "to_raw" in edge else edge["raw"]
    target_id = resolve(raw_target)
    edge_record = {
        "from": source_id,
        "type": edge.get("type", "link"),
        "resolved": target_id is not None,
    }
    if target_id is None:
        # unresolved / missing node: keep the raw name for the report
        edge_record["raw"] = raw_target
    else:
        edge_record["to"] = target_id
    all_edges.append(edge_record)
(Feel free to design a cleaner internal structure, but keep the idea: edges include from, either to or raw, and type.)
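As one possible "cleaner internal structure", both raw edge shapes could be normalized into a single small dataclass before resolution. The `Edge` class, its field names, and the two helper functions are hypothetical; the sketch only assumes the two raw dict shapes shown earlier.

```python
from __future__ import annotations

from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Edge:
    source: str                # id of the node the edge starts from
    raw: str                   # target name exactly as written
    type: str                  # "link" for inline links, else the relation type
    origin: str                # "inline" or "metadata"
    target: str | None = None  # canonical id once resolved, else None

    @property
    def resolved(self) -> bool:
        return self.target is not None


def normalize_raw_edge(raw_edge: dict) -> Edge:
    """Map either raw edge shape onto the single Edge structure."""
    if "to_raw" in raw_edge:  # metadata shape: from / to_raw / type
        return Edge(source=raw_edge["from"], raw=str(raw_edge["to_raw"]),
                    type=raw_edge["type"], origin="metadata")
    # inline shape: source / raw / label / kind
    return Edge(source=raw_edge["source"], raw=raw_edge["raw"],
                type="link", origin="inline")


def resolve_edge(edge: Edge, id_map: dict, alias_map: dict) -> Edge:
    """Return a copy of the edge with target set to the canonical id, if any."""
    target = edge.raw if edge.raw in id_map else alias_map.get(edge.raw)
    return replace(edge, target=target)
```

With one structure, the later steps no longer need to guess which keys a given edge carries.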
From all_edges:
Backlinks index (incoming edges per node):
backlinks: dict[str, list[dict]] = {
# node_id -> list of {from, type, source_file}
}
Missing nodes (links pointing to non-existent nodes):
Aggregate by raw name and keep a list of where they were referenced:
missing = {
    "certain-thing": [
        {"from": "book-a", "file": "books/book-a.md", "type": "link"},
        {"from": "note-b", "file": "notes/note-b.md", "type": "author"},
    ],
    # ...
}
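The backlinks and missing-node indexes could be derived in a single pass over the resolved edges, roughly as below. The function name `index_edges` is hypothetical, and the sketch assumes edge records shaped like the ones produced in the resolution step (`from`, `type`, `resolved`, plus `to` or `raw`). It also builds an outgoing map, which is an assumption here and records resolved edges only.

```python
from __future__ import annotations

from collections import defaultdict


def index_edges(all_edges: list[dict], nodes: dict) -> tuple[dict, dict, dict]:
    """Return (backlinks, outgoing_map, missing) built from resolved edge records."""
    backlinks = defaultdict(list)     # target id -> incoming references
    outgoing_map = defaultdict(list)  # source id -> resolved outgoing edges
    missing = defaultdict(list)       # raw name -> places that referenced it

    for edge in all_edges:
        src = edge["from"]
        # The source file is looked up from the node registry when available.
        src_file = str(nodes[src]["file"]) if src in nodes else None
        if edge["resolved"]:
            backlinks[edge["to"]].append(
                {"from": src, "type": edge["type"], "source_file": src_file})
            outgoing_map[src].append({"to": edge["to"], "type": edge["type"]})
        else:
            missing[edge["raw"]].append(
                {"from": src, "file": src_file, "type": edge["type"]})

    # Plain dicts avoid accidentally creating keys on later lookups.
    return dict(backlinks), dict(outgoing_map), dict(missing)
```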
Orphans:
A node is an orphan if it has no incoming and no outgoing edges:
orphans = [node_id for node_id in nodes if node_id not in backlinks and node_id not in outgoing_map]
You can also optionally distinguish nodes with no incoming edges from nodes with no outgoing edges.
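The orphan rule, together with an optional split between nodes lacking incoming edges and nodes lacking outgoing edges, can be sketched as follows. The function name and the returned keys are hypothetical; `outgoing_map` is assumed to map a node id to its list of outgoing edges, mirroring `backlinks`.

```python
from __future__ import annotations


def classify_connectivity(node_ids, backlinks: dict, outgoing_map: dict) -> dict:
    """Split nodes into orphans, nodes with no incoming, and nodes with no outgoing edges."""
    orphans: list[str] = []
    no_incoming: list[str] = []
    no_outgoing: list[str] = []
    for node_id in sorted(node_ids):
        # An empty list counts the same as a missing key.
        has_in = bool(backlinks.get(node_id))
        has_out = bool(outgoing_map.get(node_id))
        if not has_in and not has_out:
            orphans.append(node_id)
        elif not has_in:
            no_incoming.append(node_id)
        elif not has_out:
            no_outgoing.append(node_id)
    return {"orphans": orphans, "no_incoming": no_incoming, "no_outgoing": no_outgoing}
```

Sorting the ids keeps the report stable between runs, which makes diffs of the generated file easier to read.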
Nodes without type:
List any node that has no type field in its frontmatter:
nodes_without_type = [node_id for node_id, node in nodes.items() if not node.get("type")]
Basic summary stats:
Write a Markdown file (e.g. reports/graph-report.md). Create the reports/ directory if needed.
Structure should look roughly like:
```markdown
These links point to nodes that do not exist yet.
```
Use the node id when rendering [[id]]. If you want, you can also include type in parentheses next to each orphan or missing node.
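A rough sketch of the rendering and writing step follows. The section headings, the `_None._` placeholder, and the function names `render_report` and `write_report` are assumptions for illustration only; the layout should follow whatever report structure is finally chosen.

```python
from __future__ import annotations

from pathlib import Path


def render_report(missing: dict, orphans: list[str], nodes_without_type: list[str]) -> str:
    """Render a Markdown report; the headings used here are placeholders."""
    lines = ["# Graph report", ""]

    lines += ["## Missing nodes", "",
              "These links point to nodes that do not exist yet.", ""]
    for raw, refs in sorted(missing.items()):
        sources = ", ".join(f"[[{ref['from']}]] ({ref['type']})" for ref in refs)
        lines.append(f"- [[{raw}]] referenced by {sources}")
    if not missing:
        lines.append("_None._")

    for title, ids in (("Orphans", orphans), ("Nodes without type", nodes_without_type)):
        lines += ["", f"## {title}", ""]
        lines += [f"- [[{node_id}]]" for node_id in sorted(ids)] or ["_None._"]

    return "\n".join(lines) + "\n"


def write_report(text: str, path: Path = Path("reports/graph-report.md")) -> Path:
    """Write the report, creating the parent directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")
    return path
```

Separating rendering from writing keeps the string output easy to check without touching the filesystem.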
Please:
- Detect duplicate ids (two files defining the same id),
- Handle self-links ([[same-id]] inside its own file) as either:
- Use pathlib.Path for filesystem paths.
- Use yaml.safe_load for YAML.
- Organize the code into functions such as:
  - load_nodes()
  - build_registries(nodes)
  - parse_links_from_file(...)
  - extract_metadata_edges(...)
  - build_edges(...)
  - compute_backlinks_and_orphans(...)
  - write_report(...)
  - main()
- Tolerate missing fields (e.g. a missing type).
Please keep all these comments/instructions in the script so that future-me can open this file and understand the pipeline without re-reading this prompt.
Finally, return only the Python code (no extra prose around it).