Clarity

Smart research tools that work where the cloud can't.

Clarity is a ThreatControl research project. We're building knowledge tools that help researchers make sense of complex technical material - datasheets, schematics, code, photos, handwritten notes - without sending any of it off the laptop.

Some of the most important technical research happens in environments where cloud-hosted AI is not an option. Sensitivity rules it out. Connectivity rules it out. Provenance and traceability rule it out. Researchers in those settings still face the same problem: a sprawling pile of multi-modal source material and a finite amount of time to understand it.

The problem we're working on

Researchers analysing complex industrial systems - the components inside a piece of equipment, the protocols between them, the firmware that runs them - have to ingest enormous amounts of disparate material. Manuals, schematics, photographs of circuit boards, datasheets, vendor documentation, code, the researcher's own annotations. Most of the work is finding things, cross-referencing them, and noticing what isn't there.

Off-the-shelf retrieval-augmented generation tools handle text PDFs reasonably well and stop there. They don't model how a system fits together. They don't know what they haven't found. They don't tell you which source a claim came from. And they assume an internet connection.

What makes Clarity different

Structural knowledge, not just search - Clarity builds a structural model of the system being investigated: components, interfaces, protocols, firmware, vendors. Queries can follow the structure, not just the keywords.
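As a rough illustration of what "following the structure" can mean, here is a minimal sketch of a typed graph and a structural query. It is not Clarity's actual data model: the class, the relation names ("connects_to", "runs") and the example nodes are all hypothetical.

```python
from collections import deque


class SystemGraph:
    """Typed nodes and typed, directed edges. All names are illustrative."""

    def __init__(self):
        self.kinds = {}   # node name -> kind, e.g. "component" or "firmware"
        self.edges = {}   # node name -> list of (relation, target node)

    def add_node(self, name, kind):
        self.kinds[name] = kind
        self.edges.setdefault(name, [])

    def add_edge(self, src, relation, dst):
        self.edges.setdefault(src, []).append((relation, dst))

    def reachable(self, start, relations):
        """Breadth-first walk from start, following only the given relations."""
        seen, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for relation, nxt in self.edges.get(node, []):
                if relation in relations and nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen - {start}


# Hypothetical system: an MCU wired to a radio, which drives an antenna switch.
g = SystemGraph()
for name, kind in [("mcu", "component"), ("radio", "component"),
                   ("antenna_switch", "component"),
                   ("fw_main", "firmware"), ("fw_radio", "firmware")]:
    g.add_node(name, kind)
g.add_edge("mcu", "connects_to", "radio")
g.add_edge("radio", "connects_to", "antenna_switch")
g.add_edge("mcu", "runs", "fw_main")
g.add_edge("radio", "runs", "fw_radio")

# Structural query: which firmware runs on anything downstream of the MCU?
downstream = g.reachable("mcu", {"connects_to"})
firmware = sorted(dst for node in downstream
                  for rel, dst in g.edges.get(node, []) if rel == "runs")
```

A keyword search for "mcu" would not surface fw_radio. The walk over connects_to edges does, because the answer depends on how the parts are wired together.
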

Gap-aware reasoning - The tool tracks what it expected to find but didn't. "No firmware update history found" or "no application note for this package variant" is often more useful than a confident wrong answer.
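One simple way to picture gap tracking, assuming each kind of entity carries a checklist of artefacts a researcher would expect to see. The checklist contents, function name and example entities below are hypothetical, not taken from Clarity itself.

```python
# Hypothetical expectations: what a complete corpus would hold per entity kind.
EXPECTED = {
    "component": {"datasheet", "application_note"},
    "firmware": {"update_history", "binary_image"},
}


def find_gaps(entities, found):
    """Report expected artefacts that were never located.

    entities: entity name -> kind
    found:    entity name -> set of artefact types actually ingested
    """
    gaps = []
    for name, kind in sorted(entities.items()):
        missing = EXPECTED.get(kind, set()) - found.get(name, set())
        for artefact in sorted(missing):
            gaps.append(f"no {artefact.replace('_', ' ')} found for {name}")
    return gaps


entities = {"U7": "component", "fw_main": "firmware"}
found = {"U7": {"datasheet"}, "fw_main": {"binary_image"}}
gaps = find_gaps(entities, found)
```

The point of the sketch is that a gap is computed from an explicit expectation, so "nothing found" becomes a reportable finding rather than silence.
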

Different-family verification - Every model claim is cross-checked by a different-architecture, different-vendor model. Disagreements that can't be auto-resolved escalate to the researcher rather than being silently overwritten.
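The escalation rule can be sketched as follows, with two plain callables standing in for the two models. The names, the normalisation step and the exact-match comparison are assumptions for illustration; a real verifier would need a far more careful notion of agreement.

```python
from dataclasses import dataclass


@dataclass
class Verdict:
    question: str
    status: str      # "agreed" or "escalated"
    answers: tuple   # (primary answer, verifier answer), kept for the researcher


def normalise(answer):
    """Crude comparison key: lowercase with collapsed whitespace."""
    return " ".join(answer.lower().split())


def cross_check(question, primary, verifier):
    """Ask both models; keep both answers and escalate on disagreement."""
    a, b = primary(question), verifier(question)
    status = "agreed" if normalise(a) == normalise(b) else "escalated"
    return Verdict(question, status, (a, b))


# Stand-ins for two models from different families (hypothetical).
def model_a(question):
    return "SPI at 3.3 V"


def model_b(question):
    return "spi at  3.3 v"


def model_c(question):
    return "I2C at 1.8 V"


same = cross_check("Which bus links U7 and U9?", model_a, model_b)
differ = cross_check("Which bus links U7 and U9?", model_a, model_c)
```

Neither answer is discarded on disagreement: both stay attached to the verdict so the researcher can see exactly what each model said.
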

Provenance for every claim - Every fact cites its source: which document, which page, which extraction method, which confidence tier. Claims that come from model inference rather than ingested material are flagged as such.
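A provenance record along these lines might look like the sketch below. The field names, tier labels and file name are hypothetical, chosen only to mirror the four items listed above plus the inference flag.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Claim:
    text: str
    document: Optional[str]   # source file; None for inferred claims
    page: Optional[int]       # page within the source; None for inferred claims
    method: str               # e.g. "text-extraction", "vision", "inference"
    tier: str                 # confidence tier label (illustrative values)
    inferred: bool = False    # True when the claim comes from model inference

    def citation(self):
        """Render the claim together with where it came from."""
        if self.inferred:
            return f"{self.text} [model inference, tier={self.tier}]"
        return (f"{self.text} [{self.document}, p.{self.page}, "
                f"{self.method}, tier={self.tier}]")


sourced = Claim("U7 supply range is 1.8-3.6 V", "example-datasheet.pdf", 4,
                "text-extraction", "high")
guessed = Claim("U7 is probably the radio front end", None, None,
                "inference", "low", inferred=True)
```

Making the record immutable (frozen=True) is a design choice in this sketch: once a claim is tied to a source, nothing downstream can quietly change either.
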

Multi-modal ingestion - PDFs, schematics, photographs of boards, code, handwritten margin notes. A local vision model reads what text extraction can't.

Verifiable offline operation - Local LLM, local vector store, local graph store. No external API calls, no telemetry. Containment is layered (no network interface, kernel-level egress block, syscall denylist, application-level refusal) so the offline property is auditable, not just claimed.
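To make the innermost layer concrete, here is a minimal sketch of what application-level refusal could look like in a Python process. It is an assumption about the general idea, not Clarity's code: the exception and function names are hypothetical, and it only intercepts socket.connect, so on its own it is easy to bypass. That weakness is the reason for the outer layers.

```python
import socket
from contextlib import contextmanager


class EgressRefused(RuntimeError):
    """Raised instead of opening an outbound connection."""


@contextmanager
def refuse_egress():
    """Within the block, every socket.connect call fails before reaching the OS."""
    original = socket.socket.connect

    def guarded_connect(self, address):
        raise EgressRefused(f"outbound connection refused: {address!r}")

    socket.socket.connect = guarded_connect
    try:
        yield
    finally:
        # Put the original method back so the patch does not leak.
        socket.socket.connect = original


def try_connect(address):
    """Return True if the attempt was refused by the guard."""
    sock = socket.socket()
    try:
        sock.connect(address)
    except EgressRefused:
        return True
    finally:
        sock.close()
    return False


with refuse_egress():
    # 192.0.2.1 is a reserved documentation address (TEST-NET-1).
    refused = try_connect(("192.0.2.1", 80))
```

Because the guard raises before any system call is made, the attempt never touches the network. A refusal like this is also something an audit log can record, which is part of what makes the offline property checkable.
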

Where it applies

The same need shows up in a surprising number of places once you look:

Defence and government research with sensitive corpora that cannot leave the premises
Regulated industries - medical devices, automotive, aerospace - where component-level diligence is part of the compliance story
Insurance and procurement teams assessing complex equipment, vendors, and supply chains
Hardware tear-down and reverse engineering work in security research
Any environment where offline operation, source traceability, or document sensitivity rules out cloud-hosted AI tools

Why we're building it

ThreatControl already builds tools that ingest disparate technical material, reason about how systems are connected, and produce structured findings with traceable evidence. Clarity extends that work into a setting where everything has to run locally and every claim has to be defensible. An earlier version of Clarity, addressing the same problem class at a much narrower scope, was built for a defence prime at a prior startup, so the architectural patterns are well-grounded. We treat sensitivity, provenance, and the limits of model confidence as design constraints rather than afterthoughts.

Status

Clarity is in active research and prototyping. The core architecture - multi-modal ingestion, structural knowledge graph, model router with swap-minimising scheduler, cascading verifier, verifiable runtime isolation - is in working-draft form, with runnable sketches of the agent main loop and the egress-monitor sidecar. We're talking to organisations that recognise the problem - particularly teams running technical investigations in offline or sensitivity-constrained environments - and that would like to be involved early. If that sounds like you, get in touch.