Guide Labs Open-Sources LLM That Traces Every Output Back to Training Data
Guide Labs, a San Francisco startup, released an open-source large language model on Monday designed to solve one of artificial intelligence's thorniest problems: understanding why AI systems produce the outputs they do.
The company's 8-billion-parameter model, Steerling-8B, uses a novel architecture that allows every token it generates to be traced directly back to its origins in the training data. For finance leaders wrestling with AI governance and explainability requirements, the technology represents a fundamentally different approach to the black-box problem that has plagued enterprise AI adoption.
"If I have a trillion ways to encode gender, and I encode it in 1 billion of the 1 trillion things that I have, you have to make sure you find all those 1 billion things that I've encoded, and then you have to be able to reliably turn that on, turn them off," CEO Julius Adebayo told TechCrunch. "You can do it with current models, but it's very fragile."
The challenge Adebayo describes is familiar to any CFO who has tried to audit an AI system's decision-making. Whether it's xAI's repeated attempts to fine-tune Grok's political leanings or ChatGPT's tendency toward sycophancy, tracing specific behaviors through neural networks with billions of parameters has proven extraordinarily difficult. Current interpretability methods, according to Adebayo's research, often aren't reliable.
Guide Labs' approach, developed by Adebayo and chief science officer Aya Abdelsalam Ismail, flips the traditional model. Instead of trying to reverse-engineer a trained model's decision-making—what Adebayo calls "neuroscience on a model"—the company engineers interpretability into the architecture from the ground up. Developers insert what they call a "concept layer" that buckets data into traceable categories during training.
The method requires more upfront data annotation, but Guide Labs used other AI models to assist with the labeling process, making Steerling-8B its largest proof of concept to date. The result is a model where tracing can be as simple as identifying reference materials for cited facts or as complex as mapping how the model encodes abstract concepts like humor or gender.
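To make the concept-layer idea concrete, the sketch below loosely imagines it as a learned projection from each token's hidden state onto a handful of named concept buckets, so every generated token carries a traceable concept tag. Everything here is an illustrative assumption for readers: the concept names, the shapes, and the `tag_tokens` helper are invented for this sketch and are not Guide Labs' actual architecture or code.

```python
import numpy as np

# Hypothetical "concept layer" sketch: project hidden states onto a small,
# human-labeled set of concept buckets. In a real model, concept_weights
# would be learned during training against annotated data; here it is random.
CONCEPTS = ["finance", "humor", "citation", "other"]

rng = np.random.default_rng(0)
hidden_dim = 16
concept_weights = rng.normal(size=(hidden_dim, len(CONCEPTS)))

def tag_tokens(hidden_states):
    """Tag each token's hidden state with its dominant concept bucket."""
    logits = hidden_states @ concept_weights                     # (tokens, concepts)
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return [(CONCEPTS[i], float(row[i]))
            for i, row in zip(probs.argmax(axis=1), probs)]

# Three stand-in token hidden states; each gets a (concept, confidence) tag.
tokens = rng.normal(size=(3, hidden_dim))
for concept, score in tag_tokens(tokens):
    print(concept, round(score, 2))
```

The design point the sketch illustrates is that attribution is a byproduct of the forward pass, not a post-hoc probe: because each token is scored against named buckets at generation time, the tags can later be mapped back to the annotated training examples behind each bucket.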
Adebayo's work on this problem began during his PhD at MIT, where he co-authored a widely cited 2020 paper demonstrating that existing methods for understanding deep learning models were unreliable. That research ultimately led to the development of the new training architecture Guide Labs is now commercializing.
The open-source release positions Guide Labs in a growing field of AI interpretability startups, but with a distinct technical approach. Rather than building tools to analyze existing models, the company is attempting to make interpretability a native feature of the model itself—what Adebayo describes as "one of the holy grail questions" in AI development.
For finance organizations, the implications are straightforward: an AI system that can explain its reasoning at the token level could satisfy regulatory requirements and internal controls that current black-box models struggle to meet. The question is whether the additional training complexity and computational overhead prove worthwhile in production environments where speed and cost matter as much as explainability.