Why Your AI Visibility Score Is Probably Wrong

GEOforge Research · AI Visibility

We asked ChatGPT the same buying questions roughly 49 times each. For 43% of prompts, the brand appeared in some answers but not others.

By Paris Childress · GEOforge · 7 min read

65,478 ChatGPT answers analysed
~49× repeats per prompt
43% of prompts gave volatile results
12% named the brand every time

If your AI visibility score came from running a prompt once, you didn't measure your visibility. You measured a single roll of the dice.

What an AI visibility score actually measures

An AI visibility score estimates how often an AI model like ChatGPT mentions your brand when people ask buying questions in your category. It's the GEO equivalent of a ranking: a number that tells you whether you're showing up where buyers are now making decisions.

The catch is that the number assumes the answer is stable. Ask the question, read the answer, record whether you're in it. But LLM answers aren't fixed. The same prompt, asked again a minute later, can return a completely different set of brands. So the question isn't just "am I in the answer?" It's "in how many of the answers?"

We ran 65,478 ChatGPT answers to find out

Across 21 brands and 569 real buyer prompts, we ran each prompt roughly 49 times and recorded whether the brand appeared in each individual answer. That gave us 1,302 prompt-instances and a simple test: when you ask the same thing 49 times, how consistent is the result?

Not very.

What happens when you ask the same prompt ~49 times

Consistency across ~49 identical runs
Outcome	Share of prompts
Brand never appeared	45%
Brand appeared in some runs but not others	43%
Brand appeared in every run	12%

Only 12% of prompts produced a reliable, always-on mention. Among the prompts where the brand showed up at all, 78% were inconsistent. And for 290 of the volatile prompts, the brand appeared in fewer than one run in four: present just often enough to turn up in a lucky screenshot, absent at least three times out of four.
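
The three-way split above can be sketched in a few lines of Python. This is an illustration, not our production pipeline: the function names, the data layout (one list of true/false results per prompt-instance) and the toy data are all assumptions made up for the example.

```python
from collections import Counter

def classify(runs):
    """Label one prompt-instance from its list of per-run booleans."""
    hits = sum(runs)
    if hits == 0:
        return "never"
    if hits == len(runs):
        return "always"
    return "sometimes"

def outcome_shares(results):
    """Map each outcome label to its share of prompt-instances."""
    counts = Counter(classify(runs) for runs in results.values())
    total = len(results)
    return {label: counts[label] / total for label in ("never", "sometimes", "always")}

# Toy data, invented for illustration: 4 prompt-instances, 4 runs each.
toy = {
    "prompt-a": [False, False, False, False],
    "prompt-b": [True, False, False, True],
    "prompt-c": [True, True, True, True],
    "prompt-d": [False, True, False, False],
}
shares = outcome_shares(toy)
print(shares)  # {'never': 0.25, 'sometimes': 0.5, 'always': 0.25}
```

The 78% figure is the "sometimes" share divided by everything that appeared at all: 43 / (43 + 12) ≈ 0.78.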

Why does ChatGPT give different answers to the same question?

Because large language models are probabilistic, not deterministic. Each answer is sampled, not retrieved from a fixed table, so there's built-in run-to-run variation in which entities get named. Layer live web retrieval on top, where the sources pulled can shift between requests, and a single answer becomes a snapshot of one moment, not a stable fact about your brand.

This is normal model behaviour. The problem isn't that ChatGPT is inconsistent. The problem is measuring it as if it weren't.
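
To make the dice-roll idea concrete, here is a small simulation. It rests on a simplifying assumption that is ours, not something measured: each run mentions the brand independently with a fixed probability `p`. Real answers also shift with retrieval, so treat this as a toy model of sampling, not a model of ChatGPT.

```python
import random

def simulate_runs(p, n_runs, rng):
    """Simulate n_runs answers; each mentions the brand with probability p."""
    return [rng.random() < p for _ in range(n_runs)]

rng = random.Random(0)  # fixed seed so the sketch is repeatable
runs = simulate_runs(p=0.25, n_runs=49, rng=rng)  # p=0.25 is an assumed value

mention_rate = sum(runs) / len(runs)
print(f"first run mentioned the brand: {runs[0]}")
print(f"mention rate over {len(runs)} runs: {mention_rate:.2f}")
```

A single-snapshot tool only ever sees something like `runs[0]`. The mention rate across all runs is the quantity that actually describes the brand.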

What this means for your visibility score

A single-snapshot score is a sample size of one. If you happened to catch one of the runs where you appeared, your tool reports you as "visible" and you feel good. If you caught one of the runs where you didn't, it reports zero and you panic. Both readings can come from the exact same prompt on the exact same day.

In 43% of 1,302 repeated prompt tests, ChatGPT named the brand in some runs but not in others. A single-snapshot AI visibility score measures noise, not visibility.

That's the trap with most AI visibility numbers floating around right now. They're real measurements of an unstable signal, reported with false precision.

So how many times should you measure?

Enough that the number stops moving. A single run tells you almost nothing; a few dozen runs per prompt converge on a stable mention rate you can actually track over time. Our own measurement layer runs 35 to 50 repeats per prompt and has logged more than 85,000 ChatGPT calls to keep the numbers statistically usable. The goal isn't a bigger number. It's a number that means the same thing tomorrow as it does today.
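
One way to see the number "stop moving" is to put an uncertainty range around the mention rate and watch it narrow as runs accumulate. The sketch below uses the Wilson score interval, a standard choice for proportions. It is our pick for illustration, not a description of the GEOforge measurement layer, and it assumes runs are independent. The hit counts are invented.

```python
import math

def wilson_interval(hits, runs, z=1.96):
    """Approximate 95% Wilson score interval for a mention rate (z=1.96)."""
    if runs <= 0:
        raise ValueError("runs must be positive")
    p_hat = hits / runs
    denom = 1 + z**2 / runs
    centre = (p_hat + z**2 / (2 * runs)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / runs + z**2 / (4 * runs**2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, centre - half), min(1.0, centre + half)

# Same observed rate (25%), increasing number of runs (invented counts).
for hits, runs in [(1, 4), (3, 12), (12, 48)]:
    low, high = wilson_interval(hits, runs)
    print(f"{hits}/{runs} runs: plausible rate {low:.2f} to {high:.2f}")
```

The observed rate is identical in all three cases. What changes is how wide the plausible range around it is, and that width is what tells you whether a move between two measurements is real.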

How to get an AI visibility number you can trust

Repeat every prompt many times, not once.
Report a range or a mention rate, not a single yes/no.
Re-measure on a fixed schedule so you're comparing like with like.
Separate "never appears" from "sometimes appears." They're different problems with different fixes.
Never celebrate, or mourn, a single screenshot.
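
As a rough sketch of what the checklist looks like in a report, the snippet below records hits and runs per prompt and prints a mention rate plus a status, instead of a yes/no. The class name, fields and the example prompt are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PromptReport:
    prompt: str
    hits: int   # runs in which the brand was named
    runs: int   # total repeated runs for this prompt

    @property
    def mention_rate(self):
        return self.hits / self.runs

    @property
    def status(self):
        # Keep "never" and "sometimes" apart: they need different fixes.
        if self.hits == 0:
            return "never appears"
        if self.hits == self.runs:
            return "always appears"
        return "sometimes appears"

    def line(self):
        return (f"{self.prompt}: {self.hits}/{self.runs} runs "
                f"({self.mention_rate:.0%}), {self.status}")

# Hypothetical prompt and counts, for illustration only.
report = PromptReport(prompt="best crm for startups", hits=11, runs=49)
print(report.line())
# best crm for startups: 11/49 runs (22%), sometimes appears
```

Keeping `runs` in the record makes it easy to check that two measurement dates used the same repeat count before comparing them.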

Knowing your real score is step one. But a trustworthy number is still just a number. Tracking your visibility is table stakes. Changing it is the work that actually moves a market.

Measurement you can trust. Execution that moves it.

That's the line GEOforge is built on. See where your brand really stands across thousands of AI answers, then close the gap.

Book a GEO visibility audit →

Sources & method. All figures verified against the GEOforge measurement database on 16 June 2026. Corpus: 65,478 ChatGPT answer-runs across 21 tracked brands and 569 categorised buyer prompts, measured April–June 2026. Volatility computed across the ~49 repeated runs per prompt-instance (1,302 instances). Figures are ChatGPT-specific; cross-engine results (Perplexity, Google AI Overviews) may differ.
