Design notes

Companion to the spec. Informative — why the choices were made and what was considered and set aside.

AgenticMD Design Notes

Companion to SPEC.md. SPEC states what conformance requires; this document states why those choices were made and what was considered and set aside. The reasoning here informs but does not constrain conformance.

Why split SPEC and design rationale

Two kinds of content live in spec projects:

  • Normative. Rules an implementation must enforce. What conformance requires.
  • Informative. Reasoning behind the rules. Why the choices were made. What was considered and rejected.

Mixing them in one document makes both harder to read. An implementer needs the rules cleanly; an evaluator needs the reasoning cleanly. SPEC.md is the first; DESIGN_NOTES.md is the second.

Standard pattern in spec projects — see RFCs (“Considerations” sections), OpenAPI (separate FAQ), language designs (rationale docs).


The node-type vocabulary — design rationale

The question

The vocabulary codifies seven types: domain_brief, architecture_note, decision, guardrail, correction, failure, project_rule. Why these seven? Why not six, or eight? On what principle is closedness defended?

The answer in two parts

1. Empirical sufficiency. The seven are the set observed in production use across the reference corpus (~74 topic files, ~1,200 typed nodes). They were not designed top-down; they were extracted from a working corpus during a conformance audit. The claim being made is empirical sufficiency, not exhaustiveness — sufficient to express the agent-state-interaction behaviors documented to date, nothing more.

2. Principled organization. Empirical sufficiency alone is a weak claim — “it works for our one corpus” is not a defense against the critique “you’d find more if you looked at more corpora.” The principled companion is the dimension along which the seven are non-overlapping. The dimension is how each type interacts with the agent’s existing state.

relation to existing agent statetype
gates further loadingdomain_brief
adds to knowledgearchitecture_note
adds to knowledge + locks rationaledecision
subtracts from action space (per task)guardrail
invalidates prior knowledgecorrection
conditionally activated by symptomsfailure
modifies operating posture across tasksproject_rule

Each type occupies a distinct region of this relation-space. The addition gate (SPEC §4.3) requires an eighth type to occupy a region none of the seven occupies.

This converts “is the vocabulary exhaustive?” from a philosophical question (which it cannot answer) into an operational one (which it can): “show a relation along this dimension that none of the seven occupies, AND a corpus where that relation matters.”

Stress-tests of the dimension

The dimension was stress-tested against three failure modes before being adopted into the spec:

Test 1: Does any type sit at two points along the dimension?

decision was the most likely failure — it both adds knowledge and forecloses options (a chosen path implicitly subtracts the unchosen ones). But the primary relation is add+lock; the subtraction is a downstream consequence, not the relation itself. The dimension tolerates primary relations without requiring every secondary effect to map separately. Holds.

Test 2: Are there relations not covered that aren’t example or open_question?

candidate relationverdict
”future obligation / promise”folds into project_rule — obligation to act is a posture committing to action
”passive observation / status”folds into architecture_note — additive knowledge
”alternative / fork”folds into architecture_note + decision composition
”warning / caveat”folds into guardrail (subtractive on action space) or correction (if it invalidates)
“credit / acknowledgment”social rather than functional; doesn’t have a relation-to-agent-state at all

The two named candidates (example, open_question — see watchlist below) are the strongest gaps. No third was found.

Test 3: Could the dimension itself fragment?

Natural worry: “subtracts from action space” (guardrail) and “modifies posture” (project_rule) both look like operations on what-the-agent-may-do. Are they actually one relation?

They are not. guardrail narrows the action space for a specific task; project_rule modifies the agent’s operating mode across all tasks. The first changes a per-task constraint; the second changes the agent’s frame for evaluating any task. Distinct.

The dimension holds under these tests.

Why this dimension and not another

The dimension is principled but not unique. Other valid organizing dimensions exist:

  • Temporal — was-true / is-true / will-be-true / unresolved
  • Epistemic — certain / probabilistic / contested
  • Modal — descriptive / prescriptive / proscriptive

The agent-state-interaction dimension is selected because it matches what AgenticMD is modeling: how an agent should treat content it retrieves. A spec organizing documentation for human historians might legitimately pick the temporal dimension; a spec for adversarial review might pick the epistemic one. The agent-state-interaction dimension is the one that earns its place in a retrieval-and-treatment spec for software agents.

Other dimensions either don’t carve cleanly here (temporal cross-cuts the types — corrections can be about past or present claims) or model something other than agent behavior (epistemic models source reliability, not agent action).

This is a defense by fit-to-purpose, not by uniqueness. SPEC §4.2 states this explicitly.

Chronology of the seven types

The node types did not come from agent improvisation during spec drafting. They were already in production use across the reference corpus when the extraction agent ran the conformance audit. The audit discovered the existing distribution; it did not propose it.

Usage data from the audit:

typecountsharelikely load-bearing?
architecture_note57548%yes — workhorse
domain_brief17214%yes — convention-required
guardrail13111%yes — deliberate authorial choice each time
decision11310%yes
correction928%yes — used purposefully
failure595%mid — distinct shape, narrower applicability
project_rule434%lowest usage AND flagged in review for split — converging evidence to revisit at v1.0

Six of seven pass the load-bearing smell test. project_rule wobbles and is on the v1.0 review list.


Watchlist (candidates for v1.0)

These types are not in the v0.3 vocabulary but have plausible candidacies. The watchlist is timeboxed to v1.0 — indefinite watchlist is not acceptable. If no second-corpus evidence has emerged by the v1.0 timeline, each candidate will be explicitly resolved: either promoted to formal type (if the dimension shows it is distinct) or formally folded into existing types (with documentation of how the existing types should absorb the use case).

example — the strongest candidate

Relation under the dimension: “adaptable substrate the agent transforms” — a thing the agent can copy and mutate.

This is genuinely absent from the seven. architecture_note adds knowledge; decision constrains future change; guardrail subtracts action. None of those is “here is a thing the agent can copy and mutate.”

Distinct retrieval: triggered by “show me,” “give me a template,” “I want to write one like X.” Different from “how does X work.”

Distinct treatment: copy/adapt, not understand/summarize.

Why not in v0.3: currently lives inside architecture_note bodies. The reference corpus doesn’t show the existing convention breaking down. Watchlist for v1.0; if a second corpus shows examples buried in architecture_note bodies hurting retrieval (briefs return notes when the operator wants templates), promotion is justified.

open_question — secondary candidate

Relation under the dimension: “uncertainty to preserve.”

Treatment is genuinely distinct: the agent must not act as if the matter is settled; it must surface the uncertainty and defer to the user.

Why not in v0.3: the reference corpus shows few of these. Teams tend to decide or stay silent. Could be valuable for in-flight or incomplete corpora. Watchlist for v1.0.

Folds — not candidates

These were considered and folded into existing types:

  • specification (testable MUST/SHOULD contracts) — prohibitions → guardrail, choices → decision, descriptions of contracts → architecture_note with MUST language in the body
  • migration_note (version-conditional behavior) — correction invalidates; migration supplements. Distinction is real but v0.x corpora don’t have migration complexity. v2+ if at all.
  • definition (single-term lookup) — glossary topics with per-term architecture_note sections cover this; the granularity is right
  • background (why a domain exists) — folds into domain_brief; same purpose at a different altitude

§6.3 resolution semantics — why deferred

SPEC §6.3 declines to mandate a resolution mechanism in v0.x. The v1.0 obligation is to promote the non-normative defaults to MUST.

The reasoning for deferral:

  • The v0.x draft window exists for the schema to move.
  • Pinning ambiguous semantics before a second implementation exists is how specs end up with deprecation sections.
  • Every spec that hard-coded resolution semantics in early drafts has had to walk them back when a second implementation surfaced reasonable disagreements.

The non-normative defaults in §6.3 give publishers a starting point without committing the spec to those exact defaults. When a second binding emerges (a database backend, a JSON-LD binding, etc.) and stress-tests the semantics, v1.0 will promote what survived to MUST.


The residual weakness

The (purpose, retrieval, treatment) triples defend why each of the seven is real. The dimension defends why the seven are non-overlapping. Neither derives the seven from first principles.

The spec can show “each type has a defensible triple” and “no two types occupy the same relation along the dimension.” It cannot show “every defensible triple corresponds to one of the seven types.”

This asymmetry is the residual weakness. The honest path forward is empirical-sufficiency with the operational addition gate, not philosophical claims of completeness. A credible derivation would require actual cognitive-science work on documentation-agent behavior — outside the scope of a v1.0 spec.

If such a derivation becomes available (from research, from a second corpus exposing real gaps, from a different organizing dimension that turns out to carve more cleanly), the spec is positioned to absorb it without losing what’s already working.


Why we dropped .amd

v0.2 registered .amd as a canonical extension for AgenticMD documents. v0.4 reverses that decision. The reasoning:

The TSX/JSX analogy was wrong

The strongest v0.2-era argument for .amd borrowed the TSX/JSX pattern: same substrate, different intent, dedicated extension. That analogy is wrong. .tsx deserves its own extension because the syntax actually differs — JSX adds non-JavaScript token sequences that a standard JS parser cannot handle. AgenticMD adds no new syntax. The frontmatter is YAML, the body is CommonMark, the section markers {#id node:type} are valid CommonMark heading-attribute syntax. There is nothing for a parser to do differently. A custom extension here was signaling without substance.

The comparable-format pattern

ProjectExtensionSyntax differs from substrate?
OpenAPI.yaml / .jsonNo — content-identified
JSON Schema.jsonNo — content-identified
Pandoc Markdown.mdStricter rules, same syntax
GitHub Flavored Markdown.mdExtended rules, same syntax
MyST.mdStricter profile, same syntax
MDX.mdxYes — embeds JSX
AsciiDoc.adocYes — different markup
reStructuredText.rstYes — different markup

The pattern: dedicated extension when the syntax actually differs; same extension when only the discipline differs. Pandoc Markdown is in the Pandoc category. MDX is in the MDX category. AgenticMD belongs with Pandoc.

The free-tooling argument is decisive

The whole reason markdown is the chosen substrate is universal tool support. Every renderer (GitHub, GitLab, Bitbucket), IDE (VS Code, JetBrains, Vim), code-review tool, syntax highlighter, spell-checker, formatter, prettier plugin, vale linter, and LLM context window already handles .md. None handle .amd without per-tool configuration.

This is the same trust-preservation argument that justified the markdown substrate in SPEC §1.4: a human reviewer should be able to inspect what an agent wrote without an intermediary tool. A custom extension violates that property — the reviewer’s editor now needs configuration to render the file. We chose markdown to keep the substrate free; inventing a new extension surrenders the win for the signaling benefit, which is small.

Discrimination happens at the corpus level

The mixed-content-directory argument for .amd assumed file-level discrimination was required. It isn’t. Two cleaner mechanisms exist:

  • Corpus-level declaration (SPEC §1.5.1): a directory contains an AgenticMD corpus iff it (or some directory reachable from it) holds a file with node_type: corpus_root. Within scope, all .md files follow the discipline. This mirrors how .editorconfig and tsconfig.json discover their scopes.
  • Content-based identification: a file with node_type: topic_root frontmatter plus typed ## markers has a signature that any tool willing to read 50 lines of YAML can identify. The validator already does this.

In neither case is a custom extension load-bearing.

The reference corpus already told us

The strongest single signal: the reference corpus never migrated to .amd in the first place. AI Studio’s docs/book-ai/ directory is and always has been .md. After v0.2 was published, the reference adopter — the one entity with the most reason to follow the spec exactly — quietly chose not to follow it on the extension question.

v0.4 is not reversing a decision adopters made; it is reversing a decision adopters quietly ignored. The spec is catching up to its own reference corpus.

Why this was hard to see in v0.2

v0.2 was still operating under the “AgenticMD is a file format” framing. v0.3 reframed it as “a retrieval model with a markdown binding,” which already implied that the substrate is incidental and the discipline is the contribution. The extension question wasn’t on the table during the v0.3 reframe, so the v0.2 decision persisted through v0.3 unexamined. v0.4 is the catch-up.

The trajectory is convergent: v0.2 → v0.3 sharpened the framing; v0.4 drops the extension that the new framing already obsoleted. Each correction makes the next one visible. This is the v0.x draft window working as intended.


Substrate independence

SPEC framing positions AgenticMD as a retrieval model with a CommonMark + YAML binding. The substrate is chosen because every renderer already supports it; the same retrieval model could be expressed in other bindings:

  • Database table — typed sections as rows, frontmatter as columns, references as foreign keys
  • JSON-LD — typed sections as nodes, references as IRIs
  • Structured columnar store — same idea, different indexing

None of these have been built. The markdown binding is the one v0.x ships. A second binding is on the v1.0+ list partly because it would stress-test substrate-independence and partly because it would give the resolution-semantics question (§6.3) a second voice to disagree with.

The substrate is incidental, not load-bearing. The contribution is the retrieval model.


See also

  • SPEC.md — the normative specification (what conformance requires)
  • CHANGELOG.md — what changed in each version
  • README.md — project orientation