01 — Overview
What we measure
Large Language Models hallucinate on brand data. They cite wrong prices, outdated governance, incorrect ownership structures, and fabricated certifications. For luxury brands, these errors have direct commercial consequences — in pricing, in recommendation systems, in agentic commerce transactions.
The 2A Agency audit measures how accurately each of the four major public LLMs represents a brand's certified data. The output is a single integrity score out of 100, a set of certified data points, and a documented list of hallucinations per LLM per field.
Every score in this registry is the result of a real audit. No score is estimated or fabricated. Every hallucination is verified against a primary source before being documented.
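The per-LLM, per-field hallucination list can be modelled as a grouping of individual drift records. This is a minimal sketch that assumes each drift is stored with the classification attributes the audit uses (field, LLM, severity, pattern). The `Drift` class, its attribute names and `by_llm_and_field` are hypothetical and do not describe the registry's actual schema.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Drift:
    """One discrepancy between an LLM response and certified data (hypothetical schema)."""
    field: str               # question field, e.g. "pricing" or "governance"
    llm: str                 # "chatgpt", "gemini", "perplexity" or "grok"
    severity: str            # "critical", "high", "medium" or "low"
    pattern: Optional[int]   # documented pattern 1-12, None if unclassified
    llm_claim: str           # what the LLM said
    certified_value: str     # the verified value


def by_llm_and_field(drifts: list[Drift]) -> dict[str, dict[str, list[Drift]]]:
    """Build the per-LLM, per-field hallucination list from classified drifts."""
    grouped: defaultdict = defaultdict(lambda: defaultdict(list))
    for d in drifts:
        grouped[d.llm][d.field].append(d)
    # Convert nested defaultdicts to plain dicts for the caller.
    return {llm: dict(fields) for llm, fields in grouped.items()}


drifts = [
    Drift("governance", "gemini", "critical", 12, "invented CEO name", "certified CEO name"),
    Drift("pricing", "gemini", "medium", 2, "EUR 3,200", "EUR 2,600"),
    Drift("pricing", "chatgpt", "medium", 5, "EUR 2,950", "EUR 2,500"),
]
grouped = by_llm_and_field(drifts)
print(sorted(grouped["gemini"]))  # ['governance', 'pricing']
```

The drift values above are invented placeholders, not audit results.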
02 — Questions Framework
6 standardized questions
Each brand is audited with the same six questions, posed simultaneously and independently to all four LLMs. The questions are designed to surface the most commercially sensitive hallucination vectors.
1
Pricing
Flagship product price
The exact price, in EUR, CHF or USD, of the brand's most iconic product. Designed to detect price inflation, USD/EUR confusion, and mix-ups between secondary-market and retail prices.
2
Governance
Current DA (creative director) and CEO
Who currently leads the brand creatively and operationally. This is the field with the highest hallucination rate: LLMs regularly cite a DA or CEO who left 12–24 months earlier.
3
Ownership
Group structure
Exact ownership: which group, since when, and what percentage is held. Detects LVMH/Kering confusion, Arnault personal holdings mistaken for LVMH group ownership (or the reverse), and recent acquisitions that are ignored.
4
Distribution
E-commerce and retail
Does the brand sell online? Where? LLMs regularly invent e-boutiques for brands that are deliberately offline-first, or miss significant distribution channels.
5
Manufacturing
Origin — country and city
Where products are made, and how precisely. Detects generic "Made in Italy" answers when the correct answer is a specific city or region.
6
Strategy
Recent positioning or certification
Current creative direction, RSE (corporate social responsibility) commitments, or a key recent development. Exposes outdated strategic narratives and ignored certifications.
03 — LLMs Tested
Four models, one standard
All four major publicly available LLMs are tested on every brand. Each receives the same questions with no additional context or system prompt. Responses are captured and cross-verified independently.
ChatGPT
OpenAI · GPT-4o
Primary weakness: USD/EUR confusion · governance outdated 12–18 months
Gemini
Google · Gemini Pro
Primary weakness: systematic price inflation · invents CEOs to fill gaps (Pattern 12)
Perplexity
Perplexity AI · Pro
Primary weakness: distribution channel errors · misses recent appointments
Grok
xAI · Grok
Primary weakness: price inflation from secondary-market data · strongest on sourcing
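Because every model receives the same questions with no system prompt or extra context, the querying step is a fan-out of 24 independent requests (6 questions × 4 LLMs). The sketch below uses `asyncio.gather` from the Python standard library. `ask` is a hypothetical stand-in for each provider's API client (no real SDK calls are shown), and the question wording is a placeholder, not the audit's actual phrasing.

```python
import asyncio

LLMS = ["chatgpt", "gemini", "perplexity", "grok"]

# Placeholder wording: the audit's exact phrasing is not published here.
QUESTIONS = {
    "pricing": "What is the retail price of {brand}'s flagship product?",
    "governance": "Who are the current creative director and CEO of {brand}?",
    "ownership": "Which group owns {brand}, since when, and with what stake?",
    "distribution": "Does {brand} sell online, and through which channels?",
    "manufacturing": "In which country and city are {brand} products made?",
    "strategy": "What is {brand}'s current positioning or latest certification?",
}


async def ask(llm: str, question: str) -> str:
    """Hypothetical stand-in for a provider API call.

    A real client would send `question` as the only user message, with no
    system prompt and no conversation history.
    """
    await asyncio.sleep(0)  # placeholder for network I/O
    return f"[{llm} answer to: {question}]"


async def run_audit(brand: str) -> dict[tuple[str, str], str]:
    """Send every question to every LLM concurrently, keyed by (llm, field)."""
    pairs = [(llm, field) for llm in LLMS for field in QUESTIONS]
    answers = await asyncio.gather(
        *(ask(llm, QUESTIONS[field].format(brand=brand)) for llm, field in pairs)
    )
    # gather preserves input order, so answers line up with pairs.
    return dict(zip(pairs, answers))


responses = asyncio.run(run_audit("Example Brand"))
print(len(responses))  # 24
```

Keeping each request independent (no shared session) mirrors the rule that no model sees another's answer or any added context.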
04 — Scoring System
Integrity score out of 100
Each brand receives a single integrity score reflecting how accurately the four LLMs collectively represent its certified data. The score is the simple average of the four per-LLM scores, and each per-LLM score is reduced according to the severity of the drifts identified in that model's responses.
| Severity | Score impact (points) | Definition | Example |
|---|---|---|---|
| Critical | −10 to −15 | Factually wrong on key data | Founder cited as alive (deceased) |
| High | −6 to −9 | Significant error affecting brand perception | DA cited 2 years after departure |
| Medium | −3 to −5 | Partial or outdated information | Price ±30% of certified range |
| Low | −1 to −2 | Minor imprecision or omission | City missing, country correct |
A brand scoring 90+ indicates exceptional LLM representation with only minor drifts. A score below 75 indicates systemic errors that could lead to commercially harmful misinformation at scale.
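Taken together with step 4 of the full process (per-LLM scores averaged across the four models and rounded to the nearest integer), the scoring rules can be sketched as follows. This is a hedged illustration, not 2A Agency's code. It assumes that each drift carries an auditor-chosen deduction within its severity's range, that a per-LLM score starts at 100 and cannot fall below 0, and that ties round up. Python's built-in `round` sends 92.5 to 92 (round-half-to-even), so the sketch uses `decimal` instead. All function names are hypothetical.

```python
from decimal import Decimal, ROUND_HALF_UP

# Deduction range (points) per severity, from the severity table.
SEVERITY_RANGES = {
    "critical": (10, 15),
    "high": (6, 9),
    "medium": (3, 5),
    "low": (1, 2),
}


def llm_score(drifts: list[tuple[str, int]]) -> int:
    """Score one LLM from (severity, deduction) pairs: start at 100, floor at 0."""
    total = 0
    for severity, deduction in drifts:
        low, high = SEVERITY_RANGES[severity]
        if not low <= deduction <= high:
            raise ValueError(f"{severity} deduction must be {low}-{high}, got {deduction}")
        total += deduction
    return max(0, 100 - total)


def brand_score(llm_scores: dict[str, int]) -> int:
    """Average the per-LLM scores and round to the nearest integer, ties up."""
    mean = Decimal(sum(llm_scores.values())) / Decimal(len(llm_scores))
    return int(mean.quantize(Decimal("1"), rounding=ROUND_HALF_UP))


def band(score: int) -> str:
    """Label a brand score; only the 90+ and below-75 bands are defined."""
    if score >= 90:
        return "exceptional"
    if score < 75:
        return "systemic errors"
    return "unlabelled"  # 75-89: the methodology gives no label


scores = {
    "chatgpt": llm_score([("high", 6)]),                   # 94
    "gemini": llm_score([("medium", 4), ("high", 6)]),     # 90
    "perplexity": llm_score([("medium", 5), ("low", 2)]),  # 93
    "grok": llm_score([("medium", 5), ("low", 2)]),        # 93
}
print(brand_score(scores), band(brand_score(scores)))  # 93 exceptional
```

The drift lists are invented. Here the average is exactly 92.5, which is the case where the choice of rounding rule changes the published score.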
05 — Verification
Primary sources only
Every data point in the 2A Agency registry is cross-verified against a primary source before being certified. We do not accept secondary sources, general press coverage, or LLM-generated claims as verification.
Primary sources used include: official brand websites and e-commerce platforms, group annual reports (LVMH, Kering, Richemont), official press releases and regulatory filings, entries in the French trade and companies register (RCS), authenticated trade press (WWD, Business of Fashion, Vogue Business), and direct contact with brand communications teams when necessary.
Each certified data point in a node JSON includes the source type. When a source cannot be verified, the field is marked as unverified and excluded from the scoring calculation.
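The exclusion rule could look like the sketch below, which assumes the source type is stored as one of the primary-source categories listed above. The enum values, the `DataPoint` class and the `scorable` helper are hypothetical, because the node JSON schema for source types is not documented here.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class SourceType(Enum):
    """Primary-source categories named in the methodology (labels hypothetical)."""
    BRAND_WEBSITE = "brand_website"  # official site or e-commerce platform
    GROUP_ANNUAL_REPORT = "group_annual_report"
    PRESS_RELEASE_OR_FILING = "press_release_or_filing"
    RCS_REGISTRY = "rcs_registry"
    TRADE_PRESS = "trade_press"  # WWD, Business of Fashion, Vogue Business
    BRAND_COMMUNICATIONS = "brand_communications"


@dataclass
class DataPoint:
    field_name: str
    value: str
    source_type: Optional[SourceType] = None  # None: no primary source verified

    @property
    def verified(self) -> bool:
        return self.source_type is not None


def scorable(points: list[DataPoint]) -> list[DataPoint]:
    """Drop unverified points so they never enter the score calculation."""
    return [p for p in points if p.verified]


points = [
    DataPoint("manufacturing_origin", "Florence, Italy", SourceType.BRAND_WEBSITE),
    DataPoint("flagship_price", "EUR 2,500"),  # no source could be verified
]
print([p.field_name for p in scorable(points)])  # ['manufacturing_origin']
```

The values are invented. Filtering happens before scoring, so an unverifiable field can neither raise nor lower a brand's integrity score.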
06 — Hallucination Patterns
12 documented patterns
After 100 brands and 233 documented hallucinations, 12 recurring patterns have been identified. These patterns are consistent across LLMs and sessions — they represent structural failure modes in how LLMs encode and retrieve brand data.
Pattern 01
DA/CEO outdated
Citing leadership 12–24 months after departure
Pattern 02
Systematic price inflation
Gemini: +7–33% on luxury goods
Pattern 03
No public price → invention
DRC +400%, Greubel Forsey −75%
Pattern 04
Distribution channel errors
Inventing e-boutiques for offline brands
Pattern 05
USD/EUR confusion
ChatGPT: systematic currency mix-up
Pattern 06
Recent appointments ignored
DA/CEO appointed <18 months ago absent in 3/4 LLMs
Pattern 07
Secondary market vs official
DRC, Graff, F.P. Journe pricing errors
Pattern 08
Recent certifications denied
EPV, B Corp 2024–2025 ignored
Pattern 09
CEO appointed <18 months ago ignored
Krug, Boucheron, Ferragamo, Patou
Pattern 10
Founder death ignored
Roberto Cavalli 4/4 LLMs · Armani 2/4
Pattern 11
DA appointed <12 months absent
Versace, Fendi, Givenchy, Balenciaga
Pattern 12
Hallucinated CEO (gap-filling)
Gemini invents "Ennio Fontana" (Cavalli)
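Several of these patterns (02, 03, 05, 07) come down to comparing a quoted price with the certified one. The following minimal sketch assumes certified prices are stored per currency. It reports the signed percentage deviation and flags a likely currency mix-up when the figure an LLM quotes in one currency matches the certified price in another. The function names and the 1% matching tolerance are illustrative assumptions, not part of the published methodology.

```python
from typing import Optional


def price_deviation_pct(quoted: float, certified: float) -> float:
    """Signed deviation of a quoted price from the certified price, in percent."""
    return (quoted - certified) / certified * 100


def currency_mixup(quoted: float, quoted_currency: str,
                   certified: dict[str, float],
                   tolerance: float = 0.01) -> Optional[str]:
    """Return the currency whose certified price matches the quoted figure when
    that is not the currency the LLM used (e.g. a USD price quoted in EUR)."""
    for currency, price in certified.items():
        if currency != quoted_currency and abs(quoted - price) <= tolerance * price:
            return currency
    return None


certified = {"EUR": 2500.0, "USD": 2950.0}  # invented certified prices
print(round(price_deviation_pct(3325.0, certified["EUR"]), 1))  # 33.0
print(currency_mixup(2950.0, "EUR", certified))  # USD
```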
06b — EU ECGT Compliance
ECGT compliance assessment
Every brand in the 2A Agency registry is assessed against Directive (EU) 2024/825 on Empowering Consumers for the Green Transition (ECGT), whose provisions apply from 27 September 2026. The assessment evaluates whether LLM hallucinations about a brand create specific legal exposure under the directive.
The ECGT prohibits three categories of environmental claims in B2C communication: generic unsubstantiated claims ("sustainable", "eco-responsible"), carbon neutrality claims based on offsets, and future performance promises without verifiable plans. When LLMs attribute such claims to a brand — accurately or not — they create exposure that requires documented, certified correction.
✓
ECGT Compliant
No ECGT exposure detected
No LLM hallucinations on sustainability claims. Certified data covers manufacturing origin, certifications and RSE commitments accurately. No generic claims fabricated by LLMs.
⚠
ECGT At Risk
Moderate exposure — correction recommended
LLMs produce partially incorrect sustainability data — manufacturing origin confused, certifications simplified or outdated. Legal exposure under ECGT if uncorrected before September 2026.
✗
ECGT Critical
High exposure — immediate action required
LLMs fabricate certifications the brand does not hold (Bio, Organic, Carbon Neutral). Every hallucinated certification constitutes a potential ECGT violation attributable to the brand in the consumer journey.
—
ECGT Pending
Insufficient certified data
The node lacks sufficient certified RSE and sustainability data to assess ECGT exposure. A full audit including sustainability claims is required before the September 2026 deadline.
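The four statuses can be read as a simple decision rule. This sketch assumes the audit records three facts per brand: whether certified RSE data is sufficient, how many certifications the LLMs fabricated, and how many sustainability fields drifted. These input names and the enum values are hypothetical, and the real assessment involves auditor judgment that a function cannot capture.

```python
from enum import Enum


class EcgtStatus(Enum):
    COMPLIANT = "compliant"
    AT_RISK = "at_risk"
    CRITICAL = "critical"
    PENDING = "pending"


def ecgt_status(has_sufficient_rse_data: bool,
                fabricated_certifications: int,
                sustainability_drifts: int) -> EcgtStatus:
    """Map audit findings to an ECGT status, checking the most severe case first."""
    if not has_sufficient_rse_data:
        return EcgtStatus.PENDING
    if fabricated_certifications > 0:
        # e.g. an LLM attributes "Organic" or "Carbon Neutral" the brand does not hold
        return EcgtStatus.CRITICAL
    if sustainability_drifts > 0:
        # origin confused, certifications simplified or outdated
        return EcgtStatus.AT_RISK
    return EcgtStatus.COMPLIANT


print(ecgt_status(True, 1, 0))  # EcgtStatus.CRITICAL
```

Checking for insufficient data first is a choice made for this sketch. The methodology does not say which status applies when a fabricated certification is found for a brand whose RSE data is incomplete.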
| Status | Brands | ECGT exposure | Recommended action |
|---|---|---|---|
| Compliant | 83 | None detected | Monitor via SENTINEL every 48h |
| At Risk | 14 | Moderate: certification or origin drift | Certify sustainability data before September 2026 |
| Critical | 3 | High: fabricated certifications in LLM responses | Immediate correction + legal documentation required |
The ECGT compliance field is embedded in every certified node JSON at ecgt_compliance.status and accessible via the public API at 2aagency.com/api/nodes/index.json.
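A client could tally statuses from the public index. This sketch assumes, without confirmation from this page, that index.json decodes to a list of node objects, each carrying an `ecgt_compliance.status` string. To run offline, it parses an inline sample instead of fetching the URL. The sample nodes and status strings are invented.

```python
import json
from collections import Counter

# Invented sample in the assumed shape; real data would come from
# 2aagency.com/api/nodes/index.json.
SAMPLE_INDEX = """
[
  {"brand": "Brand A", "ecgt_compliance": {"status": "compliant"}},
  {"brand": "Brand B", "ecgt_compliance": {"status": "at_risk"}},
  {"brand": "Brand C"}
]
"""


def ecgt_counts(index_json: str) -> Counter:
    """Count nodes per ecgt_compliance.status; nodes without it count as 'missing'."""
    nodes = json.loads(index_json)
    return Counter(
        node.get("ecgt_compliance", {}).get("status", "missing") for node in nodes
    )


print(ecgt_counts(SAMPLE_INDEX))
```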
07 — Full Process
From question to certified node
A complete brand audit follows seven steps, from initial LLM querying to publication of the certified node JSON and registry page.
1
Simultaneous LLM queries
All six questions are put to all four LLMs simultaneously, with identical phrasing and no additional context. Responses are captured in full.
2
Primary source verification
Each factual claim in the 24 responses (6 questions × 4 LLMs) is cross-checked against primary sources: brand websites, group filings, official press releases, and registries.
3
Drift identification and classification
Every discrepancy between LLM response and verified data is classified by field, LLM, severity (critical/high/medium/low), and pattern category.
4
Integrity score calculation
A score is calculated for each LLM from its severity-weighted drifts. The final brand score is the average across the four LLMs, rounded to the nearest integer.
5
Certified node JSON generation
A structured JSON node is generated containing certified_data, hallucination_warnings, llm_scores, and metadata. Published to /node/ on the registry.
6
Registry publication
The report page and registry entry are published at 2aagency.com, the node is added to brand-data.ts for MCP serving, and the Hugging Face dataset is updated.
7
Periodic re-audit
Brands are re-audited at 90-day intervals or immediately following a major governance or pricing change. Scores and nodes are versioned and dated.
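Steps 5 and 7 meet in the node's versioned, dated metadata. This sketch uses the standard `json` and `datetime` modules. Only the four top-level keys (certified_data, hallucination_warnings, llm_scores, metadata) come from the methodology. Everything nested inside them and the helper names `build_node` and `reaudit_due` are hypothetical.

```python
import json
from datetime import date, timedelta

REAUDIT_INTERVAL = timedelta(days=90)


def build_node(brand: str, certified_data: dict, warnings: list,
               llm_scores: dict, integrity_score: int,
               audit_date: date, version: int) -> str:
    """Serialize a node with the four documented top-level sections."""
    node = {
        "certified_data": certified_data,
        "hallucination_warnings": warnings,
        "llm_scores": llm_scores,
        "metadata": {  # nested keys are hypothetical
            "brand": brand,
            "integrity_score": integrity_score,
            "audit_date": audit_date.isoformat(),
            "version": version,
        },
    }
    return json.dumps(node, ensure_ascii=False, indent=2)


def reaudit_due(node_json: str, today: date, major_change: bool = False) -> bool:
    """Due after 90 days, or immediately after a major governance or pricing change."""
    audited = date.fromisoformat(json.loads(node_json)["metadata"]["audit_date"])
    return major_change or today - audited >= REAUDIT_INTERVAL


node_json = build_node(
    brand="Example Brand",
    certified_data={"manufacturing": {"value": "Florence, Italy", "source_type": "brand_website"}},
    warnings=[{"llm": "gemini", "field": "pricing", "severity": "medium", "pattern": 2}],
    llm_scores={"chatgpt": 94, "gemini": 90, "perplexity": 93, "grok": 93},
    integrity_score=93,
    audit_date=date(2026, 1, 10),
    version=1,
)
print(reaudit_due(node_json, date(2026, 3, 1)))        # False (50 days)
print(reaudit_due(node_json, date(2026, 4, 10)))       # True (90 days)
print(reaudit_due(node_json, date(2026, 3, 1), True))  # True (major change)
```

All node values are invented. Storing the audit date as an ISO 8601 string keeps it round-trippable through `date.fromisoformat`.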
Alexandre Quillet
Founder — 2A Agency · alexandre@2aagency.com
Version 1.0
Published April 2026 · 2aagency.com/methodology