Knowledge engineering meets AI governance

50 years of research on extracting human judgment.
Finally built for production.

FieldRules operationalizes the methods that knowledge engineers developed over five decades: contrastive elicitation, anti-habituation, process reward models, and consequence-naming as a guardrail. The reasoning layer isn't aspirational. It's provably structured.

Request early access
See the research →
The Recognition Problem

Even experts can't tell when they're confabulating.

Post-hoc rationalization is the default. A domain expert can confidently explain why they made a decision, but that explanation is often constructed after the fact; it is not a record of the actual reasoning. Traditional documentation captures confabulation, not reasoning.

FieldRules uses contrastive elicitation: instead of asking “why did you do this?” it asks “what would go wrong if you didn't?” That shift forces the expert to articulate consequences, not intentions. Consequences are harder to confabulate about.
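
A minimal sketch of that prompt shift, in Python. The IF/THEN/BECAUSE rule shape comes from this page; the example rule, field names, and function names are hypothetical, for illustration only.

    # Contrastive elicitation, sketched. The IF/THEN/BECAUSE rule shape is
    # from this page; everything else here is a hypothetical illustration.

    def open_ended_prompt(rule: dict) -> str:
        # The framing that invites storytelling.
        return f"Why does the rule \"{rule['then']}\" exist?"

    def contrastive_prompt(rule: dict) -> str:
        # The framing that forces consequence-naming: ask for the
        # failure state the rule guards against, not the intention.
        return (f"Suppose the rule \"{rule['then']}\" is removed. "
                "What goes wrong, for whom, and when?")

    rule = {
        "if": "a site has no on-site coordinator",
        "then": "assign a remote coordinator within 48 hours",
        "because": "",  # the field the contrastive prompt is meant to fill
    }

    print(contrastive_prompt(rule))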

The reasoning isn't in the answer. It's in the discipline required to answer correctly.
Research Foundations

The methods that make extraction reliable.

01
Contrastive Elicitation
Instead of open-ended explanation, force experts to name consequences. “What fails if this rule doesn't exist?” produces reasoning. “Why does this rule exist?” produces storytelling.
02
Anti-Habituation
Experts stop questioning rules they've internalized. FieldRules re-surfaces implicit logic at decision boundaries: when the rule is created, when it's queried, when it's contested. Each surface triggers re-evaluation (a sketch of the trigger logic follows these foundations).
03
Process Reward Models
Not all reasoning is equal. FieldRules scores the BECAUSE field by specificity, causal depth, counterfactual framing, evidence citation, and consequence naming. A reasoning health score, not just a count of rules (a scoring sketch follows these foundations).
Lightman et al., 2023 — Let's Verify Step by Step
04
Consequence Naming
The first field where pure tautology fails. Experts must articulate what state the rule guards against. Vague reasoning shows up immediately. If you can't name what goes wrong, you don't understand why the rule exists.
Rozenblit & Keil, 2002 — The misunderstood limits of folk science: An illusion of explanatory depth
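
A sketch of the anti-habituation trigger from foundation 02, assuming a simple event hook. The three event names mirror the decision boundaries named above; the hook signature and prompts are hypothetical.

    # Anti-habituation, sketched: re-surface the stored reasoning at the
    # three decision boundaries named above. Hook and prompts are hypothetical.

    RESURFACE_EVENTS = {"created", "queried", "contested"}

    def on_rule_event(rule: dict, event: str) -> None:
        if event not in RESURFACE_EVENTS:
            return
        # Show the expert their own recorded reasoning and force a
        # fresh judgment instead of letting the rule go stale.
        print(f"Rule: {rule['then']}")
        print(f"Recorded reasoning: {rule['because']}")
        print("Does this still hold? (confirm / revise / retire)")

    on_rule_event(
        {"then": "assign a remote coordinator within 48 hours",
         "because": "unsupported sites miss escalations in the first week"},
        event="queried",
    )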
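
And a sketch of a process-reward-style score over a single BECAUSE field, from foundation 03. The five dimensions are the ones listed above; the keyword heuristics and the equal weighting are illustrative assumptions, not the FieldRules scoring model.

    # Reasoning health, sketched: score one BECAUSE field on the five
    # dimensions above. The keyword heuristics are illustrative stand-ins
    # for a real scorer, not the FieldRules model.

    CAUSAL = ("leads to", "causes", "results in", "which means")
    COUNTERFACTUAL = ("without this", "if we don't", "otherwise", "would fail")
    EVIDENCE = ("we observed", "data shows", "measured", "incident")
    CONSEQUENCE = ("fails", "breaks", "outage", "churn", "delay", "loss")

    def reasoning_health(because: str) -> float:
        text = because.lower()
        checks = [
            len(text.split()) >= 12,                   # specificity proxy
            any(m in text for m in CAUSAL),            # causal depth
            any(m in text for m in COUNTERFACTUAL),    # counterfactual framing
            any(m in text for m in EVIDENCE),          # evidence citation
            any(m in text for m in CONSEQUENCE),       # consequence naming
        ]
        return sum(checks) / len(checks)  # 0.0 to 1.0

    print(reasoning_health("Because it's important."))  # scores near zero
    print(reasoning_health(
        "Without this check, unreviewed configs reach production, which "
        "leads to outages; we observed two incidents in Q3."))  # scores near one
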
External Evidence

Independent research points at the same gap we're building for.

FieldRules is pre-pilot. We don't yet have our own published efficacy data — and we won't claim we do. What we have is a growing body of recent external work, from researchers with no commercial interest in FieldRules, arguing that the reasoning trace — not more model capability — is the bottleneck. The BECAUSE field is our name for the artifact those researchers are describing.

01
The missing "inner thought monologue"
Andrej Karpathy (former Director of AI at Tesla, founding member of OpenAI) argues that the bottleneck for AI capability isn't more internet data — it's the absence of expert reasoning traces. He estimates internet text is "0.001% cognition." The BECAUSE field is exactly that data type: the reasoning trajectory of a domain expert, captured at the moment of judgment.
Karpathy, No Priors podcast, March 2026
02
The "odorless proofs" problem
Fields Medalist Terence Tao and Tanya Klowden identify a structural gap in AI-generated output: formally correct results that lack the reasoning "penumbra" — the heuristics, motivation, and causal narrative that make work useful, auditable, and generalizable. They propose a blue-team/red-team framework that matches FieldRules's architecture: human-authored governance constraining AI execution.
Klowden & Tao, arXiv 2603.26524, March 2026
03
Scale doesn't solve miscalibration
An 11-model empirical study tested whether LLMs can produce calibrated probabilistic judgments the way human experts do. Three findings: bigger models help but overconfidence persists at the frontier; chain-of-thought reasoning does not reliably improve judgment; LLMs produce point estimates well but calibrated ranges poorly. Conclusion: the answer is easier than the reasoning; the reasoning has to be human-authored to be calibrated.
Bayesian Elicitation with LLMs: Model Size Helps, Extra Reasoning Doesn't Always, 2026

A note on our own data: we'll publish pilot results once we have them. Until then, we're careful to keep the external validation section about the problem space, not about FieldRules's efficacy. If you want to see our measurement methodology — Reasoning Health Score, anti-pattern detection, divergence alarms — we can walk you through it on a call.

Quality Markers

What separates reasoning from rule-filling.

Specificity
Does the reasoning pin down concrete scenarios and boundary conditions, or could it justify almost any rule? Real reasoning is often specific to boundary conditions.
Causal Mechanism
Does the BECAUSE articulate a mechanism, or just an assertion? “Sites with low coordinator coverage fail because they lack support” vs. “because it's important.”
Counterfactual Framing
What state does the rule prevent? If you remove the rule, what fails? Reasoning means naming the alternative state, not just the action.
Evidence Citation
Is the reasoning grounded in data, observation, or authority? Even anecdotal evidence is better than pure intuition. Experts cite it when they have it.
Stakeholder Impact
Does the BECAUSE explain who is affected and how? “Affects customer churn in weeks 3–5” is stronger than “affects customer success.”
Falsifiability
Could the rule be proven wrong? If it's completely unfalsifiable, it's not reasoning—it's dogma. Real judgment carries risk.
Temporal Context
Does the reasoning account for when it was true? A rule that worked in 2024 may not work now. Experts note drift when they see it.
Exception Awareness
Does the reasoning acknowledge cases where the rule breaks? The strongest judgment is what experts do when a rule fails, and whether they notice that it has.
Confidence Calibration
Does the expert express uncertainty where appropriate? “Usually,” “in our experience,” and “with 90% confidence” are signs of honest reasoning, not weakness.
Non-Tautology
Is the BECAUSE a restatement of the THEN, or is it genuinely explanatory? Tautology means the expert is confabulating, not reasoning. A minimal detection sketch follows this list.
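
A sketch of the Non-Tautology check, assuming token overlap as a stand-in for whatever similarity measure a production implementation would use. The example rule, stopword list, and threshold are hypothetical.

    # Non-tautology, sketched: flag a BECAUSE that merely restates the THEN.
    # Token overlap is an illustrative stand-in for a real similarity measure.

    import re

    STOPWORDS = {"the", "a", "an", "to", "of", "is", "are", "it", "because", "on"}

    def tokens(text: str) -> set[str]:
        return set(re.findall(r"[a-z']+", text.lower())) - STOPWORDS

    def is_tautology(then: str, because: str, threshold: float = 0.8) -> bool:
        reasons = tokens(because)
        if not reasons:
            return True  # empty reasoning explains nothing
        overlap = len(tokens(then) & reasons) / len(reasons)
        return overlap >= threshold  # most of the BECAUSE is already in the THEN

    then = "require two reviewers on schema changes"
    print(is_tautology(then, "because schema changes require two reviewers"))  # True
    print(is_tautology(then, "a single reviewer missed a breaking migration "
                             "that cost us a day of downtime"))                # False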

The reasoning is yours.
Make sure it stays structured.

We're onboarding a small number of research partners and production teams. If your domain carries judgment that scales with AI, let's talk.

No deck. No demo-ware. We start with a conversation.