
Google’s Gemini 3.1 Pro Takes Top Spot in AI Benchmark for Professional Tasks

Gemini 3.1 Pro tops APEX-Agents benchmark, raising questions about AI vendor lock-in for finance teams

The Ledger Signal | Analysis

Feb 20, 2026


Why This Matters

Finance leaders face a dilemma as AI capabilities advance faster than implementation timelines, potentially requiring strategy revisions mid-fiscal year.

Google released Gemini 3.1 Pro on Thursday, marking another escalation in the competition among tech giants to build AI models capable of handling complex professional work—the kind that finance departments are increasingly eyeing for automation.

The new model, currently available as a preview ahead of a broader release, represents what Google describes as a significant advancement over Gemini 3, which launched just three months ago in November 2025. For CFOs watching the AI arms race, the release signals how quickly the technology underpinning potential finance automation is evolving—and how difficult it may be to make long-term vendor commitments when capabilities are shifting this rapidly.

According to independent benchmarks shared by Google on Thursday, Gemini 3.1 Pro outperformed its predecessor on tests including one called Humanity's Last Exam. More notably for finance leaders, the model topped APEX-Agents, a benchmarking system designed specifically to measure how well AI models handle real professional tasks rather than academic exercises.

"Gemini 3.1 Pro is now at the top of the APEX-Agents leaderboard," said Brendan Foody, CEO of AI startup Mercor, which developed the benchmark. Foody added that the results demonstrate "how quickly agents are improving at real knowledge work"—the category that includes financial analysis, forecasting, and reporting tasks that finance departments currently staff with analysts.

The release comes as major AI providers including OpenAI and Anthropic have recently launched competing models, all targeting what the industry calls "agentic work"—AI systems that can complete multi-step reasoning tasks with minimal human intervention. This is the capability that would theoretically allow an AI to, say, reconcile accounts across multiple systems or build a financial model from scratch, rather than simply answering questions about existing data.

For finance leaders evaluating AI investments, the rapid succession of model releases presents a practical dilemma. Gemini 3 was considered highly capable when it launched in November. Three months later, Google describes its successor as a significant advancement. The pace suggests that any AI implementation strategy built around today's capabilities may need revision before the fiscal year ends.

The preview release of Gemini 3.1 Pro allows organizations to test the model before committing to production deployments, though Google has not specified a timeline for general availability beyond "soon."

Originally reported by TechCrunch (techcrunch.com)

Key Takeaways

Gemini 3.1 Pro now tops the APEX-Agents leaderboard, according to Brendan Foody, CEO of Mercor, which developed the benchmark.

Foody said the results show "how quickly agents are improving at real knowledge work," the category that includes financial analysis, forecasting, and reporting tasks that finance departments currently staff with analysts.