
Guide Labs Open-Sources LLM That Traces Every Output Back to Training Data

Guide Labs open-sources 8B-parameter model with built-in traceability to training data

The Ledger Signal | Brief

Why This Matters

Finance leaders can now audit AI decision-making with native interpretability, addressing governance and explainability requirements that have blocked enterprise AI adoption.


Guide Labs, a San Francisco startup, released an open-source large language model on Monday designed to solve one of artificial intelligence's thorniest problems: understanding why AI systems produce the outputs they do.

The company's 8-billion-parameter model, Steerling-8B, uses a novel architecture that allows every token it generates to be traced directly back to its origins in the training data. For finance leaders wrestling with AI governance and explainability requirements, the technology represents a fundamentally different approach to the black-box problem that has plagued enterprise AI adoption.

"If I have a trillion ways to encode gender, and I encode it in 1 billion of the 1 trillion things that I have, you have to make sure you find all those 1 billion things that I've encoded, and then you have to be able to reliably turn that on, turn them off," CEO Julius Adebayo told TechCrunch. "You can do it with current models, but it's very fragile."

The challenge Adebayo describes is familiar to any CFO who has tried to audit an AI system's decision-making. Whether it's xAI's repeated attempts to fine-tune Grok's political leanings or ChatGPT's tendency toward sycophancy, probing neural networks with billions of parameters to understand specific behaviors has proven extraordinarily difficult. Current interpretability methods, according to Adebayo's research, often aren't reliable.

Guide Labs' approach, developed by Adebayo and chief science officer Aya Abdelsalam Ismail, flips the traditional model. Instead of trying to reverse-engineer a trained model's decision-making—what Adebayo calls "neuroscience on a model"—the company engineers interpretability into the architecture from the ground up. Developers insert what they call a "concept layer" that buckets data into traceable categories during training.
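To make the idea concrete, here is a minimal sketch of what a "concept layer" could look like: a learned projection that maps a hidden state onto a small set of named, human-readable concept buckets. Guide Labs has not published Steerling-8B's actual layer design, so every name, shape, and threshold below is an illustrative assumption, not the company's implementation.

```python
import numpy as np

# Hypothetical concept buckets; in a real system these would come from
# the annotated training data, not a hard-coded list.
CONCEPTS = ["citation", "humor", "gender", "finance"]

rng = np.random.default_rng(0)
hidden_dim = 16

# In a trained model this projection would be learned; random here for the sketch.
concept_proj = rng.standard_normal((hidden_dim, len(CONCEPTS)))

def concept_activations(hidden_state, threshold=0.5):
    """Project a hidden state onto concept buckets; return concepts above threshold."""
    scores = hidden_state @ concept_proj           # shape: (num_concepts,)
    probs = 1.0 / (1.0 + np.exp(-scores))          # squash scores into [0, 1]
    return {c: float(p) for c, p in zip(CONCEPTS, probs) if p > threshold}

hidden_state = rng.standard_normal(hidden_dim)
print(concept_activations(hidden_state))
```

The key design point is that the concept dimensions are named up front, so "turning a concept off" becomes an operation on an explicit, labeled axis rather than a search through billions of entangled parameters.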

The method requires more upfront data annotation, but Guide Labs used other AI models to assist with the labeling process, making Steerling-8B its largest proof of concept to date. The result is a model where tracing can be as simple as identifying reference materials for cited facts or as complex as mapping how the model encodes abstract concepts like humor or gender.
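The simple end of that tracing spectrum can be sketched as a provenance lookup: if training examples are tagged with concept buckets at annotation time, a generated token's active concepts can be mapped back to the tagged examples. The document IDs and tags below are invented for illustration; this is not Guide Labs' implementation.

```python
from collections import defaultdict

# Concept tags assigned to training examples during annotation (invented data).
training_tags = {
    "doc-001": {"citation", "finance"},
    "doc-002": {"humor"},
    "doc-003": {"finance", "gender"},
}

# Invert the tags once into a concept -> documents index for fast lookup.
index = defaultdict(set)
for doc_id, tags in training_tags.items():
    for tag in tags:
        index[tag].add(doc_id)

def trace(active_concepts):
    """Return training documents sharing any concept with a generated token."""
    docs = set()
    for concept in active_concepts:
        docs |= index[concept]
    return sorted(docs)

print(trace({"finance"}))  # → ['doc-001', 'doc-003']
```

The extra annotation cost the article mentions lives in building `training_tags`; once that exists, the lookup itself is cheap, which is what makes token-level tracing plausible at inference time.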

Adebayo's work on this problem began during his PhD at MIT, where he co-authored a widely cited 2020 paper demonstrating that existing methods for understanding deep learning models were unreliable. That research ultimately led to the development of the new training architecture Guide Labs is now commercializing.

The open-source release positions Guide Labs in a growing field of AI interpretability startups, but with a distinct technical approach. Rather than building tools to analyze existing models, the company is attempting to make interpretability a native feature of the model itself—what Adebayo describes as "one of the holy grail questions" in AI development.

For finance organizations, the implications are straightforward: an AI system that can explain its reasoning at the token level could satisfy regulatory requirements and internal controls that current black-box models struggle to meet. The question is whether the additional training complexity and computational overhead prove worthwhile in production environments where speed and cost matter as much as explainability.

Originally Reported By

TechCrunch (techcrunch.com)

Why We Covered This

Finance teams require explainable AI systems for regulatory compliance and audit trails; this model's native traceability to training data directly addresses the black-box problem that has prevented CFO adoption of AI for financial decision-making.

Key Takeaways

- "If I have a trillion ways to encode gender, and I encode it in 1 billion of the 1 trillion things that I have, you have to make sure you find all those 1 billion things that I've encoded, and then you have to be able to reliably turn that on, turn them off." (CEO Julius Adebayo)
- Instead of trying to reverse-engineer a trained model's decision-making—what Adebayo calls "neuroscience on a model"—the company engineers interpretability into the architecture from the ground up.
- Rather than building tools to analyze existing models, the company is attempting to make interpretability a native feature of the model itself—what Adebayo describes as "one of the holy grail questions" in AI development.
Companies: Guide Labs, xAI, TechCrunch
People: Julius Adebayo (CEO), Aya Abdelsalam Ismail (Chief Science Officer)
Key Dates: Release Date: 2026-02-23
Affected Workflows: Audit, Reporting
WRITTEN BY

Sam Adler

Finance and technology correspondent covering the intersection of AI and corporate finance.
