If you’ve been asking ChatGPT or Claude to summarize a company’s quarterly results, there’s a good chance the numbers you received were wrong. A study testing the major AI models on CAC 40 financial data found that when operating on internal knowledge alone, these systems deliver near-zero accuracy — with ChatGPT returning not a single correct figure across 39 companies. The findings are a wake-up call for investors, finance professionals, and the companies whose information AI systems routinely misrepresent.
The research, conducted by Charles-Henri d’Auvigny at Réputation Age, placed AI models in the position of a retail investor trying to quickly retrieve key financial indicators: revenue, net income, operating income, and net debt. Three increasingly sophisticated query methods were tested across ChatGPT (OpenAI), Gemini (Google), and Claude (Anthropic), producing results that vary dramatically depending on how the AI is used.
The Three-Tier Accuracy Gap
The most basic approach — asking the AI a direct, single question requesting a full results table — mirrors how most casual users interact with these tools. The outcome is nearly worthless for financial purposes. ChatGPT produced zero correct answers. Claude reached 7% accuracy. Gemini performed best of the three at 13%, still a deeply unreliable rate for any investment decision. The AI models are drawing on static, internal training data with no real-time verification, no access to primary sources, and a structural tendency to favor English-language, US-market publications over French corporate disclosures.
A second method — using an automated agent to break the request into individual company-by-company, indicator-by-indicator questions — improved the structural quality of responses slightly, but not the underlying accuracy. Without source verification, no model exceeded 15% correctness. More granular questioning doesn’t compensate for wrong or outdated training data.
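The study does not publish its agent's implementation, but the decomposition it describes is mechanically simple. The sketch below illustrates it under stated assumptions: the company names are placeholders, and `ask_model` is a hypothetical stand-in for a real model API call.

```python
from itertools import product

companies = ["ExampleCo", "SampleGroup"]  # placeholders for CAC 40 names
indicators = ["revenue", "net income", "operating income", "net debt"]

def ask_model(question: str) -> str:
    # Hypothetical stand-in for a real model call. The answers still come
    # from static training data, which is why granularity alone does not
    # improve accuracy.
    return f"[model answer to: {question}]"

# One narrow question per (company, indicator) pair, as in the study's
# second method.
answers = {
    (company, indicator): ask_model(
        f"What was {company}'s {indicator} in its latest fiscal year?"
    )
    for company, indicator in product(companies, indicators)
}

print(len(answers))  # 2 companies x 4 indicators = 8 queries
```

The loop produces cleaner, more uniform responses, which is exactly the improvement the study observed: better structure, same wrong numbers.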
The third method is where the picture changes entirely. When the AI agent is configured to retrieve data directly from official company publications, extract the relevant figures, and assemble them rather than recall them, Claude reaches up to 80% accuracy. The difference isn’t model intelligence — it’s source access. Verified inputs produce reliable outputs; memory-based retrieval does not.
The accuracy gap between a plain AI query and a properly configured retrieval agent is the difference between 0–13% and 80%. For financial data, this isn’t a minor performance variation — it’s the difference between useful analysis and dangerous misinformation.
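The architectural difference behind that gap can be reduced to a small sketch: recall answers from whatever the model memorized, or extract them from a fetched official document. Everything here is a deliberately simplified placeholder, including the press-release text, the figures, and the extraction logic.

```python
import re
from typing import Optional

# Hypothetical press-release text standing in for an official publication
# fetched from a company's investor relations page.
OFFICIAL_RELEASE = """
FY 2023 Results
Revenue: EUR 42.3 billion
Net income: EUR 3.1 billion
"""

def recall_from_memory(company: str, indicator: str) -> str:
    # Stand-in for a model answering from static training data:
    # fluent and confident, but stale and unverifiable.
    return "EUR 40.0 billion (training-data recall, unverified)"

def retrieve_from_source(document: str, indicator: str) -> Optional[str]:
    # Extract the figure directly from the fetched publication
    # rather than recalling it from memory.
    pattern = rf"{re.escape(indicator)}:\s*(EUR [\d.]+ \w+)"
    match = re.search(pattern, document, flags=re.IGNORECASE)
    return match.group(1) if match else None

print(recall_from_memory("ExampleCo", "Revenue"))
print(retrieve_from_source(OFFICIAL_RELEASE, "Revenue"))
```

The second function can be wrong only if the source document is wrong; the first can be wrong for any reason at all, which is the study's core point about verified inputs.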
Why AI Models Get Financial Data Wrong
Two structural issues compound the problem. First, AI models are trained on datasets that skew heavily toward English-language and US-market content, which means French corporate disclosures — even from major CAC 40 companies — are systematically underrepresented. Second, and more critically, these models don’t treat company investor relations pages as primary, authoritative sources. They may retrieve outdated annual reports, extrapolate from older results, or blend data from multiple fiscal periods without flagging the inconsistency.
Any financial figure produced by a standard AI query — without source retrieval — should be independently verified before acting on it. The confidence of the AI’s tone has no relationship to the accuracy of its numbers.
A New Strategic Problem for Publicly Listed Companies
Beyond the investor-facing risk, the study identifies a structural challenge for companies themselves. AI models are increasingly becoming the first point of contact between public companies and information-seekers — replacing, for a growing share of users, both search engines and direct financial media. Being accurately represented in an AI-generated summary is becoming as strategically important as ranking well on Google.
The problem is that current AI systems don’t reliably source from official company channels. Publishing accurate information is no longer sufficient if that information isn’t structured or positioned in a way that AI retrieval systems can find, parse, and prioritize. The study’s authors introduce the concept of Generative Reputation Optimization (GRO) — a discipline specifically concerned with making corporate information accessible and correctly weighted by AI models, analogous to what SEO became for search engines.
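GRO has no established technical specification, but one plausible tactic, assumed here for illustration, is publishing headline figures as machine-readable structured data alongside the prose results page, so a retrieval agent can parse them unambiguously. The field names below loosely follow schema.org conventions; the company, ISIN, figures, and URL are all placeholders.

```python
import json

# Hypothetical JSON-LD-style markup for an investor relations page.
# Not an official GRO or schema.org recipe; a sketch of the idea that
# figures should be parseable without scraping prose.
results_markup = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "ExampleCo SA",
    "identifier": "FR0000000000",  # placeholder ISIN
    "financialData": {
        "fiscalYear": "2023",
        "currency": "EUR",
        "revenue": 42_300_000_000,
        "netIncome": 3_100_000_000,
        "source": "https://example.com/investors/fy2023-results.pdf",
    },
}

print(json.dumps(results_markup, indent=2))
```

The design intent mirrors SEO's structured-snippet era: a retrieval agent that finds an explicit `revenue` field with a fiscal year and source URL has far less room to blend periods or extrapolate than one parsing a narrative press release.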
| Query Method | How It Works | Best Accuracy Observed |
|---|---|---|
| Direct interrogation | Single question, internal knowledge only | 0% (ChatGPT) / 7% (Claude) / 13% (Gemini) |
| Sequential agent queries | Company-by-company, no source access | <15% across all models |
| Retrieval agent | Fetches data directly from official publications | Up to 80% (Claude) |
The Takeaway: Configuration Matters More Than Model Choice
The study’s most important finding isn’t about which AI model is best. It’s about what architecture you use to ask the question. A well-configured Claude with retrieval access substantially outperforms any model operating on memory alone — including itself. For finance professionals, this means the practical question isn’t “which AI should I trust?” but rather “is my AI actually reading the source documents, or making things up from training data?”
GRO may still be an emerging concept, but the underlying pressure it addresses is real and growing. As AI-mediated information access accelerates, companies that don’t actively manage how their data is represented in AI outputs risk being invisible, misquoted, or replaced in search results by aggregators that are better indexed for machine retrieval — regardless of whether their underlying data is more accurate.
AI models are already acting as financial information intermediaries for a growing population of retail investors. The gap between what they confidently say and what is actually true — when not connected to live sources — is not a minor UX issue. It is a systemic accuracy problem that neither users nor companies have fully reckoned with.