Methodology — 2A Agency

01 — Overview

What we measure

Large Language Models hallucinate on brand data. They cite wrong prices, outdated governance, incorrect ownership structures, and fabricated certifications. For luxury brands, these errors have direct commercial consequences — in pricing, in recommendation systems, in agentic commerce transactions.

The 2A Agency audit measures how accurately each of the four major public LLMs represents a brand's certified data. The output is a single integrity score out of 100, a set of certified data points, and a documented list of hallucinations per LLM per field.

Every score in this registry is the result of a real audit. No score is estimated or fabricated. Every hallucination is verified against a primary source before being documented.

02 — Questions Framework

6 standardized questions

Each brand is audited with the same six questions, asked simultaneously and independently to all four LLMs. The questions are designed to surface the most commercially sensitive hallucination vectors.

1

Pricing

Flagship product price

Exact price in EUR/CHF/USD of the most iconic product. Designed to detect price inflation, USD/EUR confusion, and secondary market vs retail mix-ups.

2

Governance

Current creative director (DA) and CEO

Who currently leads the brand creatively and operationally. This is the field with the most hallucinations: LLMs regularly cite a creative director or CEO who left 12–24 months earlier.

3

Ownership

Group structure

Exact ownership — which group, since when, percentage held. Detects LVMH/Kering confusion, Arnault personal vs LVMH group errors, and recent acquisitions ignored.

4

Distribution

E-commerce and retail

Does the brand sell online? Where? LLMs regularly invent e-boutiques for brands that are deliberately offline-first, or miss significant distribution channels.

5

Manufacturing

Origin — country and city

Where products are made, at what level of precision. Detects generic "Made in Italy" answers when the correct answer is a specific city or region.

6

Strategy

Recent positioning or certification

Current creative direction, CSR (RSE) commitments, or a key recent development. Exposes outdated strategic narratives and ignored certifications.

03 — LLMs Tested

Four models, one standard

All four major publicly available LLMs are tested on every brand. Each receives the same questions with no additional context or system prompt. Responses are captured and cross-verified independently.

ChatGPT

OpenAI · GPT-4o

Primary weakness: USD/EUR confusion · governance outdated 12–18 months

Gemini

Google · Gemini Pro

Primary weakness: systematic price inflation · invents CEOs to fill gaps (Pattern 12)

Perplexity

Perplexity AI · Pro

Primary weakness: distribution channel errors · misses recent appointments

Grok

xAI · Grok

Primary weakness: price over-inflation on secondary market data · strongest on sourcing
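
The pairing of six questions with four LLMs can be sketched as a simple query plan. This is a minimal illustration, not the agency's tooling: the field identifiers, the `build_query_plan` helper, and the placeholder prompt templates are all assumptions, and the actual question wording is not given in this document.

```python
from itertools import product

# Field names mirror the six question categories in section 02 (identifiers are assumed).
QUESTION_FIELDS = ["pricing", "governance", "ownership",
                   "distribution", "manufacturing", "strategy"]

# Display names of the four LLMs listed in section 03.
LLMS = ["ChatGPT", "Gemini", "Perplexity", "Grok"]


def build_query_plan(brand: str, prompts: dict[str, str]) -> list[dict]:
    """Return one query record per (LLM, question) pair.

    `prompts` maps each field to a question template. Every LLM receives
    the identical text, with no system prompt or extra context.
    """
    missing = [f for f in QUESTION_FIELDS if f not in prompts]
    if missing:
        raise ValueError(f"missing prompt for: {missing}")
    return [
        {
            "llm": llm,
            "field": field,
            "prompt": prompts[field].format(brand=brand),
            "system_prompt": None,  # no added context, per the methodology
        }
        for llm, field in product(LLMS, QUESTION_FIELDS)
    ]


# Placeholder templates for illustration only; not the registry's phrasing.
example_prompts = {f: f"[{f} question about {{brand}}]" for f in QUESTION_FIELDS}
plan = build_query_plan("ExampleBrand", example_prompts)
print(len(plan))  # 24
```

Building the full plan before sending anything makes it easy to check that each LLM gets the same 6 prompts, which is what keeps the 24 responses comparable.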

04 — Scoring System

Integrity score out of 100

Each brand receives a single integrity score reflecting how accurately the four LLMs collectively represent its certified data. Each per-LLM score is reduced according to the severity of the drifts identified for that LLM, and the brand score is the average of the four per-LLM scores.

Severity	Score Impact	Definition	Example
Critical	−10 to −15	Factually wrong on key data	Founder cited as alive (deceased)
High	−6 to −9	Significant error affecting brand perception	DA cited 2 years after departure
Medium	−3 to −5	Partial or outdated information	Price ±30% of certified range
Low	−1 to −2	Minor imprecision or omission	City missing — country correct

A score of 90 or above indicates exceptional LLM representation with only minor drifts. A score below 75 indicates systemic errors that could spread commercially harmful misinformation at scale.
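
The arithmetic can be sketched as follows. The penalty ranges come from the severity table, but several details are assumptions made for illustration: that each LLM starts at 100, that the auditor picks an exact penalty within the range, that scores are floored at 0, and that halves round up. The function names and the example drifts are hypothetical.

```python
# Penalty ranges per severity, from the table above (expressed as positive points).
PENALTY_RANGES = {
    "critical": (10, 15),
    "high": (6, 9),
    "medium": (3, 5),
    "low": (1, 2),
}


def llm_score(drifts: list[tuple[str, int]]) -> int:
    """Score one LLM: start at 100 (assumed), subtract each penalty, floor at 0."""
    score = 100
    for severity, penalty in drifts:
        low, high = PENALTY_RANGES[severity]
        if not low <= penalty <= high:
            raise ValueError(f"{severity} penalty must be {low}-{high}, got {penalty}")
        score -= penalty
    return max(score, 0)


def brand_score(per_llm: dict[str, int]) -> int:
    """Average the per-LLM scores and round half up to the nearest integer."""
    mean = sum(per_llm.values()) / len(per_llm)
    return int(mean + 0.5)


# Hypothetical drifts for one brand, for illustration only.
scores = {
    "ChatGPT": llm_score([("high", 7), ("medium", 4)]),      # 89
    "Gemini": llm_score([("critical", 12), ("medium", 5)]),  # 83
    "Perplexity": llm_score([("low", 2)]),                   # 98
    "Grok": llm_score([]),                                   # 100
}
print(brand_score(scores))  # 93 (mean is 92.5)
```

Python's built-in `round` rounds halves to the nearest even integer, so `round(92.5)` gives 92. The sketch uses explicit half-up rounding instead; which convention the registry applies is not stated here.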

05 — Verification

Primary sources only

Every data point in the 2A Agency registry is cross-verified against a primary source before being certified. We do not accept secondary sources, general press coverage, or LLM-generated claims as verification.

Primary sources used include: official brand websites and e-commerce platforms, group annual reports (LVMH, Kering, Richemont), official press releases and regulatory filings, RCS registry entries, authenticated trade press (WWD, Business of Fashion, Vogue Business), and direct contact with brand communications teams when necessary.

Each certified data point in a node JSON includes the source type. When a source cannot be verified, the field is marked as unverified and excluded from the scoring calculation.
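
A minimal sketch of that rule is shown below, assuming a data point with no source type counts as unverified. The `DataPoint` class, its attribute names, and the `"annual_report"` label are hypothetical; the node schema itself is not reproduced in this document.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DataPoint:
    field: str
    value: str
    # Hypothetical label such as "annual_report"; None when no primary source was found.
    source_type: Optional[str]

    @property
    def status(self) -> str:
        return "certified" if self.source_type else "unverified"


def scorable(points: list[DataPoint]) -> list[DataPoint]:
    """Keep only certified points; unverified fields are left out of scoring."""
    return [p for p in points if p.status == "certified"]


# Placeholder values for illustration only.
points = [
    DataPoint("ownership", "[certified value]", "annual_report"),
    DataPoint("pricing", "[unconfirmed value]", None),
]
print([p.field for p in scorable(points)])  # ['ownership']
```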

06 — Hallucination Patterns

12 documented patterns

After 100 brands and 233 documented hallucinations, 12 recurring patterns have been identified. These patterns are consistent across LLMs and sessions — they represent structural failure modes in how LLMs encode and retrieve brand data.

Pattern 01

DA/CEO outdated

Citing leadership 12–24 months after departure

Pattern 02

Systematic price inflation

Gemini: +7–33% on luxury goods

Pattern 03

No public price → invention

DRC +400%, Greubel Forsey −75%

Pattern 04

Distribution channel errors

Inventing e-boutiques for offline brands

Pattern 05

USD/EUR confusion

ChatGPT: systematic currency mix-up

Pattern 06

Recent appointments ignored

DA or CEO appointed within the last 18 months is absent from 3 of 4 LLMs

Pattern 07

Secondary market vs official

DRC, Graff, F.P. Journe pricing errors

Pattern 08

Recent certifications denied

EPV, B Corp 2024–2025 ignored

Pattern 09

CEO <18 months ignored

Krug, Boucheron, Ferragamo, Patou

Pattern 10

Founder death ignored

Roberto Cavalli 4/4 LLMs · Armani 2/4

Pattern 11

DA appointed <12 months absent

Versace, Fendi, Givenchy, Balenciaga

Pattern 12

Hallucinated CEO (gap-filling)

Gemini invents "Ennio Fontana" (Cavalli)

06b — EU ECGT Compliance

ECGT compliance assessment

Every brand in the 2A Agency registry is assessed against Directive (EU) 2024/825 on Empowering Consumers for the Green Transition (ECGT), which applies from 27 September 2026. The assessment evaluates whether LLM hallucinations about a brand create specific legal exposure under the directive.

The ECGT prohibits three categories of environmental claims in B2C communication: generic unsubstantiated claims ("sustainable", "eco-responsible"), carbon neutrality claims based on offsets, and future performance promises without verifiable plans. When LLMs attribute such claims to a brand — accurately or not — they create exposure that requires documented, certified correction.

✓

ECGT Compliant

No ECGT exposure detected

No LLM hallucinations on sustainability claims. Certified data covers manufacturing origin, certifications, and CSR (RSE) commitments accurately. No generic claims fabricated by LLMs.

⚠

ECGT At Risk

Moderate exposure — correction recommended

LLMs produce partially incorrect sustainability data — manufacturing origin confused, certifications simplified or outdated. Legal exposure under ECGT if uncorrected before September 2026.

✗

ECGT Critical

High exposure — immediate action required

LLMs fabricate certifications the brand does not hold (Bio, Organic, Carbon Neutral). Every hallucinated certification constitutes a potential ECGT violation attributable to the brand in the consumer journey.

—

ECGT Pending

Insufficient certified data

The node lacks sufficient certified CSR (RSE) and sustainability data to assess ECGT exposure. A full audit including sustainability claims is required before the September 2026 deadline.

Status	Count	ECGT Exposure	Recommended Action
Compliant	83	None detected	Monitor via SENTINEL every 48h
At Risk	14	Moderate — certification or origin drift	Certify sustainability data before Sept 2026
Critical	3	High — fabricated certifications in LLM responses	Immediate correction + legal documentation required

The ECGT compliance field is embedded in every certified node JSON at ecgt_compliance.status and accessible via the public API at 2aagency.com/api/nodes/index.json.
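
The four statuses can be read as a small decision rule. The sketch below is one possible reading, not the registry's implementation: the lowercase status strings, the function names, and the order of the checks (insufficient data first, then fabricated certifications, then partial errors) are assumptions. Only the `ecgt_compliance.status` path comes from this page.

```python
def ecgt_status(has_sustainability_data: bool,
                fabricated_certifications: int,
                partial_errors: int) -> str:
    """Map audit findings to an ECGT status (status strings are assumed)."""
    if not has_sustainability_data:
        return "pending"    # not enough certified data to assess
    if fabricated_certifications > 0:
        return "critical"   # LLMs invent certifications the brand does not hold
    if partial_errors > 0:
        return "at_risk"    # origin confused, certifications simplified or outdated
    return "compliant"


def read_status(node: dict) -> str:
    """Read ecgt_compliance.status from a node dict, defaulting to 'pending'."""
    return node.get("ecgt_compliance", {}).get("status", "pending")


# Hypothetical node fragment, for illustration only.
node = {"ecgt_compliance": {"status": ecgt_status(True, 0, 2)}}
print(read_status(node))  # at_risk
```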

07 — Full Process

From question to certified node

A complete brand audit follows seven steps, from initial LLM querying to publication of the certified node JSON and registry page.

1

Simultaneous LLM queries

All 6 questions are asked to all 4 LLMs simultaneously, with identical phrasing and no additional context. Responses are captured in full.

2

Primary source verification

Each factual claim across the 24 responses (6 questions × 4 LLMs) is cross-checked against primary sources: brand websites, group filings, official press releases, and registries.

3

Drift identification and classification

Every discrepancy between LLM response and verified data is classified by field, LLM, severity (critical/high/medium/low), and pattern category.

4

Integrity score calculation

A score is calculated for each LLM from its severity-weighted drifts. The final brand score is the average across the 4 LLMs, rounded to the nearest integer.

5

Certified node JSON generation

A structured JSON node is generated containing certified_data, hallucination_warnings, llm_scores, and metadata. It is published to /node/ on the registry.

6

Registry publication

The report page and registry entry are published at 2aagency.com. The node is added to brand-data.ts for MCP serving, and the Hugging Face dataset is updated.

7

Periodic re-audit

Brands are re-audited at 90-day intervals or immediately following a major governance or pricing change. Scores and nodes are versioned and dated.
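
Steps 5 and 7 can be sketched together: assembling a node with the four top-level keys named above and stamping it with the next scheduled re-audit date. The four top-level keys and the 90-day interval come from this page. Everything inside `metadata`, the `build_node` helper, and the example values are hypothetical.

```python
import json
from datetime import date, timedelta

REAUDIT_INTERVAL = timedelta(days=90)  # scheduled re-audit interval from step 7


def build_node(brand: str, certified_data: dict, hallucination_warnings: list,
               llm_scores: dict, audited_on: date) -> dict:
    """Assemble a node dict; the metadata sub-fields are assumed, not documented."""
    return {
        "certified_data": certified_data,
        "hallucination_warnings": hallucination_warnings,
        "llm_scores": llm_scores,
        "metadata": {
            "brand": brand,
            "audited_on": audited_on.isoformat(),
            "next_audit_due": (audited_on + REAUDIT_INTERVAL).isoformat(),
        },
    }


# Placeholder content for illustration only.
node = build_node(
    brand="ExampleBrand",
    certified_data={"manufacturing": {"value": "[certified value]",
                                      "source_type": "[source type]"}},
    hallucination_warnings=[],
    llm_scores={"ChatGPT": 0, "Gemini": 0, "Perplexity": 0, "Grok": 0},
    audited_on=date(2025, 1, 1),
)
print(node["metadata"]["next_audit_due"])  # 2025-04-01
print(json.dumps(node, indent=2))
```

Storing the audit date inside the node keeps each published version dated; an unscheduled re-audit after a governance or pricing change would simply produce a new node with a new `audited_on`.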
