Corpus in. Canonical data out.
CanonicAI turns unstructured knowledge — books, research papers, domain documents — into canonical, queryable datasets. At scale, with provenance.
“Anyone can write a prompt. The defensible thing is running thousands of multi-step extractions reliably, idempotently, and with lineage — at a cost you control. That’s why factory is the honest metaphor: a production line with QA and inventory control, not a clever prompt.”
The production lines
Book Factory
Whole books deconstructed at chapter-respecting fidelity into tagged summaries, argument models, and factor structures — not naive chunks.
Article Factory
Peer-reviewed research distilled into instruments, constructs, citations, and effect data — feeding a living evidence engine.
Compendium Factory
Reference catalogs of validated measurement scales extracted item-by-item — validated against known ground truth at 95% recall.
Schema Authority
One canonical measurement vocabulary — constructs, items, instruments, effect sizes — defined once, conformed to by every consumer.
The line, running
Glass box, not black box
Every dataset CanonicAI ships traces to its source — file hashes, extraction lineage, model and prompt provenance, idempotent re-derivation. If a number is in the output, you can walk it back to the page it came from.
The engine is the producer and source-of-truth; everything downstream is a consumer. It powers the PeopleAnalyst family: