Corpus in. Canonical data out.

CanonicAI turns unstructured knowledge — books, research papers, domain documents — into canonical, queryable datasets. At scale, with provenance.

“Anyone can write a prompt. The defensible thing is running thousands of multi-step extractions reliably, idempotently, and with lineage — at a cost you control. That’s why factory is the honest metaphor: a production line with QA and inventory control, not a clever prompt.”

— THE OPERATING THESIS

The production lines

Book Factory
Whole books deconstructed at chapter-respecting fidelity into tagged summaries, argument models, and factor structures, not naive chunks.

Article Factory
Peer-reviewed research distilled into instruments, constructs, citations, and effect data, feeding a living evidence engine.

Compendium Factory
Reference catalogs of validated measurement scales extracted item-by-item, validated against known ground truth at 95% recall.

Schema Authority
One canonical measurement vocabulary (constructs, items, instruments, effect sizes), defined once, conformed to by every consumer.

The line, running

8,519

Registered assets

560+

Instruments extracted

120+

Books deconstructed

SHA-256

Provenance, every source

Glass box, not black box

Every dataset CanonicAI ships traces to its source — file hashes, extraction lineage, model and prompt provenance, idempotent re-derivation. If a number is in the output, you can walk it back to the page it came from.
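To make the glass-box idea concrete, here is a minimal sketch of what a provenance record could look like. It is an illustration under stated assumptions, not CanonicAI's actual implementation: the names `ProvenanceRecord`, `make_record`, and `derivation_key` are hypothetical, as is the choice of fields. The only thing it demonstrates is the general technique: hash the source, hash the prompt, and derive a deterministic key so that re-running the same extraction yields the same identity.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of raw bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


@dataclass(frozen=True)
class ProvenanceRecord:
    # Hypothetical fields, chosen for illustration only.
    source_sha256: str   # hash of the source file's bytes
    model: str           # identifier of the model used for extraction
    prompt_sha256: str   # hash of the prompt text
    page: int            # page the extracted value came from

    def derivation_key(self) -> str:
        """Deterministic key: identical inputs always give the same key."""
        payload = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return sha256_hex(payload)


def make_record(source_bytes: bytes, model: str, prompt: str, page: int) -> ProvenanceRecord:
    """Build a provenance record from the raw inputs of one extraction."""
    return ProvenanceRecord(
        source_sha256=sha256_hex(source_bytes),
        model=model,
        prompt_sha256=sha256_hex(prompt.encode("utf-8")),
        page=page,
    )


if __name__ == "__main__":
    # Placeholder inputs; a real run would read the source file from disk.
    record = make_record(b"example source bytes", "example-model", "Extract the scale items.", 12)
    print(record.derivation_key())
```

Because the key depends only on the inputs, a store keyed by `derivation_key()` can skip work it has already done, and any change to the source, model, prompt, or page produces a different key.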

The engine is the producer and source-of-truth; everything downstream is a consumer. It powers the PeopleAnalyst family:

PeopleAnalyst: the destination
Principia: the evidence engine
People Analytics Toolbox