Knowledge engineering meets AI governance

50 years of research on extracting human judgment.
Finally built for production.

FieldRules operationalizes the methods that knowledge engineers developed over five decades: contrastive elicitation, anti-habituation, process reward models, and consequence-naming as a guardrail. The reasoning layer isn't aspirational. It's provably structured.

Request early access
See the research →
The Recognition Problem

Even experts can't tell when they're confabulating.

Post-hoc rationalization is the default. A domain expert can confidently explain why they made a decision, but that explanation is often constructed after the fact; it is not a record of the actual reasoning. Traditional documentation captures confabulation, not reasoning.

FieldRules uses contrastive elicitation: instead of asking “why did you do this?” it asks “what would go wrong if you didn't?” That shift forces the expert to articulate consequences, not intentions. Consequences are harder to confabulate about.
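
A minimal sketch of that prompt shift, in Python. The IF/THEN/BECAUSE rule shape comes from this page; the example rule, field names, and function names are hypothetical, for illustration only.

    # Contrastive elicitation, sketched. The IF/THEN/BECAUSE rule shape is
    # from this page; everything else here is a hypothetical illustration.

    def open_ended_prompt(rule: dict) -> str:
        # The framing that invites storytelling.
        return f"Why does the rule \"{rule['then']}\" exist?"

    def contrastive_prompt(rule: dict) -> str:
        # The framing that forces consequence-naming: ask for the
        # failure state the rule guards against, not the intention.
        return (f"Suppose the rule \"{rule['then']}\" is removed. "
                "What goes wrong, for whom, and when?")

    rule = {
        "if": "a site has no on-site coordinator",
        "then": "assign a remote coordinator within 48 hours",
        "because": "",  # the field the contrastive prompt is meant to fill
    }

    print(contrastive_prompt(rule))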

The reasoning isn't in the answer. It's in the discipline required to answer correctly.
Research Foundations

The methods that make extraction reliable.

01
Contrastive Elicitation
Instead of open-ended explanation, force experts to name consequences. “What fails if this rule doesn't exist?” produces reasoning. “Why does this rule exist?” produces storytelling.
02
Anti-Habituation
Experts stop questioning rules they've internalized. FieldRules re-surfaces implicit logic at decision boundaries: when the rule is created, when it's queried, when it's contested. Each surface triggers re-evaluation (a sketch of the trigger logic follows these foundations).
03
Process Reward Models
Not all reasoning is equal. FieldRules scores the BECAUSE field by specificity, causal depth, counterfactual framing, evidence citation, and consequence naming. A reasoning health score, not just a count of rules (a scoring sketch follows these foundations).
Lightman et al., 2023 — Let's Verify Step by Step
04
Consequence Naming
The first field where pure tautology fails. Experts must articulate what state the rule guards against. Vague reasoning shows up immediately. If you can't name what goes wrong, you don't understand why the rule exists.
Rozenblit & Keil, 2002 — The misunderstood limits of folk science: An illusion of explanatory depth
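
A sketch of the anti-habituation trigger from foundation 02, assuming a simple event hook. The three event names mirror the decision boundaries named above; the hook signature and prompts are hypothetical.

    # Anti-habituation, sketched: re-surface the stored reasoning at the
    # three decision boundaries named above. Hook and prompts are hypothetical.

    RESURFACE_EVENTS = {"created", "queried", "contested"}

    def on_rule_event(rule: dict, event: str) -> None:
        if event not in RESURFACE_EVENTS:
            return
        # Show the expert their own recorded reasoning and force a
        # fresh judgment instead of letting the rule go stale.
        print(f"Rule: {rule['then']}")
        print(f"Recorded reasoning: {rule['because']}")
        print("Does this still hold? (confirm / revise / retire)")

    on_rule_event(
        {"then": "assign a remote coordinator within 48 hours",
         "because": "unsupported sites miss escalations in the first week"},
        event="queried",
    )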
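
And a sketch of a process-reward-style score over a single BECAUSE field, from foundation 03. The five dimensions are the ones listed above; the keyword heuristics and the equal weighting are illustrative assumptions, not the FieldRules scoring model.

    # Reasoning health, sketched: score one BECAUSE field on the five
    # dimensions above. The keyword heuristics are illustrative stand-ins
    # for a real scorer, not the FieldRules model.

    CAUSAL = ("leads to", "causes", "results in", "which means")
    COUNTERFACTUAL = ("without this", "if we don't", "otherwise", "would fail")
    EVIDENCE = ("we observed", "data shows", "measured", "incident")
    CONSEQUENCE = ("fails", "breaks", "outage", "churn", "delay", "loss")

    def reasoning_health(because: str) -> float:
        text = because.lower()
        checks = [
            len(text.split()) >= 12,                   # specificity proxy
            any(m in text for m in CAUSAL),            # causal depth
            any(m in text for m in COUNTERFACTUAL),    # counterfactual framing
            any(m in text for m in EVIDENCE),          # evidence citation
            any(m in text for m in CONSEQUENCE),       # consequence naming
        ]
        return sum(checks) / len(checks)  # 0.0 to 1.0

    print(reasoning_health("Because it's important."))  # scores near zero
    print(reasoning_health(
        "Without this check, unreviewed configs reach production, which "
        "leads to outages; we observed two incidents in Q3."))  # scores near one
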
External Evidence

Independent research points at the same gap we're building for.

FieldRules is pre-pilot. We don't yet have our own published efficacy data — and we won't claim we do. What we have is a growing body of recent external work, from researchers with no commercial interest in FieldRules, arguing that the reasoning trace — not more model capability — is the bottleneck. The BECAUSE field is our name for the artifact those researchers are describing.

01
The missing "inner thought monologue"
Andrej Karpathy (former Director of AI at Tesla, founding member of OpenAI) argues that the bottleneck for AI capability isn't more internet data — it's the absence of expert reasoning traces. He estimates internet text is "0.001% cognition." The BECAUSE field is exactly that data type: the reasoning trajectory of a domain expert, captured at the moment of judgment.
Karpathy, No Priors podcast, March 2026
02
The "odorless proofs" problem
Fields Medalist Terence Tao and Tanya Klowden identify a structural gap in AI-generated output: formally correct results that lack the reasoning "penumbra" — the heuristics, motivation, and causal narrative that make work useful, auditable, and generalizable. They propose a blue-team/red-team framework that matches FieldRules's architecture: human-authored governance constraining AI execution.
Klowden & Tao, arXiv 2603.26524, March 2026
03
Scale doesn't solve miscalibration
An 11-model empirical study tested whether LLMs can produce calibrated probabilistic judgments the way human experts do. Three findings: bigger models help but overconfidence persists at the frontier; chain-of-thought reasoning does not reliably improve judgment; LLMs produce point estimates well but calibrated ranges poorly. Conclusion: the answer is easier than the reasoning; the reasoning has to be human-authored to be calibrated.
Bayesian Elicitation with LLMs: Model Size Helps, Extra Reasoning Doesn't Always, 2026

A note on our own data: we'll publish pilot results once we have them. Until then, we're careful to keep the external validation section about the problem space, not about FieldRules's efficacy. If you want to see our measurement methodology — Reasoning Health Score, anti-pattern detection, divergence alarms — we can walk you through it on a call.

Quality Markers

What separates reasoning from rule-filling.

Specificity
Does the reasoning pin down concrete scenarios and boundary conditions, or could it justify almost any rule? Real reasoning is often specific to boundary conditions.
Causal Mechanism
Does the BECAUSE articulate a mechanism, or just an assertion? “Sites with low coordinator coverage fail because they lack support” vs. “because it's important.”
Counterfactual Framing
What state does the rule prevent? If you remove the rule, what fails? Reasoning means naming the alternative state, not just the action.
Evidence Citation
Is the reasoning grounded in data, observation, or authority? Even anecdotal evidence is better than pure intuition. Experts cite it when they have it.
Stakeholder Impact
Does the BECAUSE explain who is affected and how? “Affects customer churn in weeks 3–5” is stronger than “affects customer success.”
Falsifiability
Could the rule be proven wrong? If it's completely unfalsifiable, it's not reasoning—it's dogma. Real judgment carries risk.
Temporal Context
Does the reasoning account for when it was true? A rule that worked in 2024 may not work now. Experts note drift when they see it.
Exception Awareness
Does the reasoning acknowledge cases where the rule breaks? The strongest judgment is what experts do when a rule fails, and whether they notice that it has.
Confidence Calibration
Does the expert express uncertainty where appropriate? “Usually,” “in our experience,” and “with 90% confidence” are signs of honest reasoning, not weakness.
Non-Tautology
Is the BECAUSE a restatement of the THEN, or is it genuinely explanatory? Tautology means the expert is confabulating, not reasoning. A minimal detection sketch follows this list.
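
A sketch of the Non-Tautology check, assuming token overlap as a stand-in for whatever similarity measure a production implementation would use. The example rule, stopword list, and threshold are hypothetical.

    # Non-tautology, sketched: flag a BECAUSE that merely restates the THEN.
    # Token overlap is an illustrative stand-in for a real similarity measure.

    import re

    STOPWORDS = {"the", "a", "an", "to", "of", "is", "are", "it", "because", "on"}

    def tokens(text: str) -> set[str]:
        return set(re.findall(r"[a-z']+", text.lower())) - STOPWORDS

    def is_tautology(then: str, because: str, threshold: float = 0.8) -> bool:
        reasons = tokens(because)
        if not reasons:
            return True  # empty reasoning explains nothing
        overlap = len(tokens(then) & reasons) / len(reasons)
        return overlap >= threshold  # most of the BECAUSE is already in the THEN

    then = "require two reviewers on schema changes"
    print(is_tautology(then, "because schema changes require two reviewers"))  # True
    print(is_tautology(then, "a single reviewer missed a breaking migration "
                             "that cost us a day of downtime"))                # False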

The reasoning is yours.
Make sure it stays structured.

We're onboarding a small number of research partners and production teams. If your domain carries judgment that scales with AI, let's talk.

No deck. No demo-ware. We start with a conversation.