I’m building a “graph builder” script for my Markdown-based digital garden. Another AI helped me design the architecture in pieces; now I want you to produce ONE complete, working Python script from this spec.
Please read everything below (requirements + examples + partial code), then:
Scan all .md files in the project, build a node registry and a graph of connections between them (edges), then produce a Markdown report with missing nodes, orphan nodes, nodes without type, and summary stats.

The graph must incorporate both:

- inline wikilinks ([[id]] and [[id|label]])
- metadata relationships (e.g. author)

All relationships should be represented as edges with a type.
This script DOES:

- scan .md files in the repo recursively;
- parse frontmatter: id, aliases, type, and metadata;
- extract [[...]] wikilinks from the Markdown body;
- resolve links via id and aliases;
- write a Markdown report (reports/graph-report.md).

This script DOES NOT:

- modify any .md files.

Think of it as: index + resolver + validator.
Each Markdown file has frontmatter like:
```yaml
---
id: the-odyssey
aliases:
  - odyssey
  - a-odisseia
type: book
metadata:
  title: The Odyssey
  author: homer
  year: -800
---
```
Another example:
```yaml
---
id: caderno-do-fim-do-mundo--cleyton-cabral
aliases: []
type: book
metadata:
  title: Caderno do Fim do Mundo
  author: cleyton-cabral
  year: 2025
---
```
Later I’ll also have nodes of type author, concept, note, etc., but for now just handle whatever type appears (including missing).
Use pathlib and recursive glob from project root:
```python
from pathlib import Path

ROOT = Path(".")
markdown_files = list(ROOT.rglob("*.md"))
```
The script should treat the whole tree as the “garden” (not just /books).
For each .md file:

- extract the frontmatter between the `---` lines and parse it with `yaml.safe_load`;
- read `id` (required for a valid node; if missing, treat it as a node without an id and report it);
- read `aliases` (optional; can be absent, a string, or a list);
- read `type` (optional; we will flag missing type in the report);
- read `metadata` (dict; we'll use some of its fields to create edges).

Example in-memory representation (you can model this as a simple dict or a small Node dataclass):
```python
nodes = {
    "the-odyssey": {
        "id": "the-odyssey",
        "file": Path("books/finished/the-odyssey.md"),
        "aliases": ["odyssey", "a-odisseia"],
        "type": "book",
        "metadata": {
            "title": "The Odyssey",
            "author": "homer",
            "year": -800,
        },
    },
    # ...
}
```
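To make that parsing step concrete, here is a minimal sketch of the frontmatter extraction, assuming frontmatter sits between the first two `---` lines; `parse_frontmatter` is a hypothetical helper name, not something fixed by this spec.

```python
import yaml  # PyYAML, as required by the spec (yaml.safe_load)

def parse_frontmatter(text: str) -> tuple[dict, str]:
    """Split a Markdown file into (frontmatter dict, body).

    Assumes the frontmatter sits between the first two '---' lines;
    returns ({}, text) when there is no frontmatter block.
    """
    if not text.startswith("---"):
        return {}, text
    parts = text.split("---", 2)
    if len(parts) < 3:
        # Opening '---' with no closing delimiter: treat as no frontmatter.
        return {}, text
    data = yaml.safe_load(parts[1]) or {}
    return data, parts[2]
```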
You need two maps:
```python
id_map = {
    "the-odyssey": nodes["the-odyssey"],  # or a Node instance
    # ...
}

alias_map = {
    "odyssey": "the-odyssey",
    "a-odisseia": "the-odyssey",
    # ...
}
```
Resolution function:
```python
def resolve(name: str) -> str | None:
    """
    Given a link target like 'a-odisseia' or 'the-odyssey',
    return the canonical node id, or None if not found.
    """
    if name in id_map:
        return name
    if name in alias_map:
        return alias_map[name]
    return None
```
ID and alias collisions should be detected and reported (e.g. two files claim the same id, or the same alias points to two different ids).
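The collision detection described above can be sketched as a registry builder; `build_registries` matches the function name suggested later in this spec, while the input shape (a list of node dicts) and the warning strings are illustrative assumptions.

```python
def build_registries(node_list: list[dict]) -> tuple[dict, dict, list[str]]:
    """Build id_map and alias_map from loaded nodes, collecting collisions."""
    id_map: dict[str, dict] = {}
    alias_map: dict[str, str] = {}
    warnings: list[str] = []
    for node in node_list:
        node_id = node["id"]
        if node_id in id_map:
            warnings.append(f"duplicate id: {node_id!r}")
        else:
            id_map[node_id] = node
        for alias in node.get("aliases") or []:
            existing = alias_map.get(alias)
            if existing is not None and existing != node_id:
                warnings.append(
                    f"alias collision: {alias!r} -> {existing!r} and {node_id!r}"
                )
            else:
                alias_map[alias] = node_id
    return id_map, alias_map, warnings
```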
Inline wikilinks ([[...]]): scan each file's body (the Markdown content after the frontmatter) for links of the form:

- `[[something]]`
- `[[something|Label to display]]`

Each found link should be recorded as a raw edge candidate:
```python
{
    "source": "the-bell-jar",  # source node id
    "raw": "a-odisseia",       # raw target name before resolution
    "label": "A Odisseia",     # optional, None if not present
    "kind": "inline",          # distinguish from metadata edges
}
```
You may assume link targets are simple slugs (e.g. the-odyssey, cleyton-cabral). Use a regex to find `[[...]]`, then split on `|` if present.
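One possible regex for that step (the pattern and the `parse_links` name are assumptions, not fixed by the spec):

```python
import re

# Matches [[target]] and [[target|Label]]; the character classes exclude
# ']' and '|' so multiple links on one line are matched separately.
WIKILINK_RE = re.compile(r"\[\[([^\]|]+)(?:\|([^\]]+))?\]\]")

def parse_links(body: str) -> list[dict]:
    """Return a raw edge candidate for every wikilink found in `body`."""
    links = []
    for match in WIKILINK_RE.finditer(body):
        target, label = match.group(1), match.group(2)
        links.append({
            "raw": target.strip(),
            "label": label.strip() if label else None,
        })
    return links
```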
Metadata relations (e.g. author): some relationships come from YAML metadata, not inline links. Start with author and make it easy to extend later.
Frontmatter example:
```yaml
metadata:
  author: cleyton-cabral
  # or
  authors:
    - cleyton-cabral
    - another-author
```
Design a mapping layer so you don’t hardcode “author” everywhere:
```python
RELATION_FIELDS = {
    "author": "author",   # field name -> edge type
    "authors": "author",
    # later:
    # "translator": "translator",
    # "inspired_by": "inspired-by",
}
```
Normalize metadata values to a list:
```python
def ensure_list(value):
    if value is None:
        return []  # treat an absent/None field as no values
    if isinstance(value, list):
        return value
    return [value]
```
Then, for each node:
```python
def extract_metadata_edges(node_id: str, data: dict) -> list[dict]:
    """
    Given a node's frontmatter data, extract edges defined by metadata fields
    like 'author', 'authors', etc.
    """
    edges: list[dict] = []
    metadata = data.get("metadata", {}) or {}
    for field, relation_type in RELATION_FIELDS.items():
        if field not in metadata:
            continue
        values = ensure_list(metadata[field])
        for v in values:
            edges.append({
                "from": node_id,
                "to_raw": v,            # raw target id/alias
                "type": relation_type,  # e.g. 'author'
                "source": "metadata",
            })
    return edges
```
We will resolve to_raw via resolve() in the next step.
Combine both sources:

- inline edges (from the wikilink scan)
- metadata edges (from extract_metadata_edges)

Then resolve:
```python
all_edges: list[dict] = []
for edge in raw_edges:
    # Inline edges carry 'source'/'raw'; metadata edges carry 'from'/'to_raw'.
    # Use .get so a missing key falls through instead of raising KeyError.
    source_id = edge.get("from") or edge.get("source")
    raw_target = edge.get("to_raw") or edge.get("raw")
    target_id = resolve(raw_target)
    if target_id is None:
        # unresolved / missing node
        all_edges.append({
            "from": source_id,
            "raw": raw_target,
            "type": edge.get("type", "link"),
            "resolved": False,
        })
    else:
        all_edges.append({
            "from": source_id,
            "to": target_id,
            "type": edge.get("type", "link"),
            "resolved": True,
        })
```
(Feel free to design a cleaner internal structure, but keep the idea: edges include from, either to or raw, and type.)
From all_edges:
Backlinks index (incoming edges per node):
```python
backlinks: dict[str, list[dict]] = {
    # node_id -> list of {from, type, source_file}
}
```
Missing nodes (links pointing to non-existent nodes):
Aggregate by raw name and keep a list of where they were referenced:
```python
missing = {
    "certain-thing": [
        {"from": "book-a", "file": "books/book-a.md", "type": "link"},
        {"from": "note-b", "file": "notes/note-b.md", "type": "author"},
    ],
    # ...
}
```
Orphans:
A node is an orphan if it has no incoming and no outgoing edges:
```python
orphans = [
    node_id
    for node_id in nodes
    if node_id not in backlinks and node_id not in outgoing_map
]
```
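Putting the backlinks index, the outgoing map, and the orphan check together, a minimal sketch (assuming the resolved edge records are shaped as described above; the function name matches the one suggested later in this spec):

```python
from collections import defaultdict

def compute_backlinks_and_orphans(nodes: dict, all_edges: list[dict]):
    """Derive backlinks, an outgoing map, and orphans from resolved edges."""
    backlinks: dict[str, list[dict]] = defaultdict(list)
    outgoing: dict[str, list[dict]] = defaultdict(list)
    for edge in all_edges:
        outgoing[edge["from"]].append(edge)
        if edge.get("resolved"):
            backlinks[edge["to"]].append(edge)
    # Orphans: no incoming and no outgoing edges.
    orphans = [n for n in nodes if n not in backlinks and n not in outgoing]
    return dict(backlinks), dict(outgoing), orphans
```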
You can also optionally distinguish nodes with no incoming edges from nodes with no outgoing edges.
Nodes without type:
List any node that has no type field in its frontmatter:
```python
nodes_without_type = [
    node_id for node_id, node in nodes.items() if not node.get("type")
]
```
Basic summary stats: total nodes, total edges, missing nodes, orphans.
Write a Markdown file (e.g. reports/graph-report.md). Create the reports/ directory if needed.
Structure should look roughly like:
```markdown
# Graph Report

## Missing Nodes

These links point to nodes that do not exist yet.

- [[certain-thing]]
  - referenced in:
    - books/book-a.md (as: link)
    - notes/note-b.md (as: author)

---

## Orphan Nodes

Nodes with no connections.

- [[lonely-book]]
- [[random-note]]

---

## Nodes Without Type

- [[identity]]
- [[modernism]]

---

## Summary

- Total nodes: 312
- Total edges: 1240
- Missing nodes: 23
- Orphans: 17
```
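The directory-creation requirement can be handled with `Path.mkdir`; a minimal sketch, assuming the report is assembled as a list of lines (the `write_report` name matches the one suggested later in this spec):

```python
from pathlib import Path

def write_report(lines: list[str],
                 path: Path = Path("reports/graph-report.md")) -> None:
    """Write the report, creating the reports/ directory if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
```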
Use the node id when rendering `[[id]]` links. If you want, you can also include the type in parentheses next to each orphan or missing node.
Please:

- detect and report duplicate ids (two files defining the same id);
- treat self-links ([[same-id]] inside its own file) as either ignored or explicitly flagged, your choice, but be consistent;
- use pathlib.Path for filesystem paths and yaml.safe_load for YAML;
- structure the script around functions like load_nodes(), build_registries(nodes), parse_links_from_file(...), extract_metadata_edges(...), build_edges(...), compute_backlinks_and_orphans(...), write_report(...), and main();
- keep it easy to extend (new relation fields, new node types, nodes without type).

Please keep all these comments/instructions in the script so that future-me can open this file and understand the pipeline without re-reading this prompt.
Finally, return only the Python code (no extra prose around it).