A single AI visibility audit has a margin of error of around ±13 points. GEOforge's is ±1.5. Here's the math — and why it matters more than you might think.
Ask an AI tool the same question twice and you'll often get two different answers. Your brand might be named the first time and missing the second. AI answers are generated fresh every time — they are not a fixed lookup. Which means a measurement of "how visible is my brand in AI search?" is only as good as how many times you asked.
Ask once, and you've got an anecdote. Ask enough times, and you've got a metric you can put in a board deck. Most AI visibility tools on the market ask each question once — or a small handful of times. GEOforge asks each question 50 times, across 30 different buyer questions, for every brand, every week. Here's why that gap matters — with the actual numbers.
Every measurement built on a finite number of answers has some wiggle room. The honest way to express that is a margin of error: a ± figure around the number. "12% ± 1 point" means the real answer is very likely (with 95% confidence) between 11% and 13%. Tight. You can act on it. "12% ± 9 points" means the real answer is somewhere between 3% and 21%. That's not a measurement. It's a guess with a number attached.
The whole game is getting that margin of error small enough to trust. And the only lever is depth — how many answers you collect.
Here is the margin of error on a visibility rate, by how many times each question is asked (averaged across the brands we measure):
| Times each question is asked | Total answers collected | Margin of error | Verdict |
|---|---|---|---|
| Once (typical quick audit) | ~30 | ± 13 points | An anecdote |
| 5 times | ~135 | ± 5.7 points | Still very rough |
| 10 times | ~270 | ± 3.9 points | Getting usable |
| 30 times | ~810 | ± 2.1 points | Solid |
| 50 times (GEOforge default) | ~1,500 | ± 1.5 points | Trackable week-to-week |
Read the top and bottom rows together. A one-pass audit can be off by as much as 13 points in either direction. Ours can be off by 1.5. That's not a small refinement. It's the difference between "I think we're around 12%" and "we're 12%, and we'll know if that moves to 14% next month."
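To see why depth is the lever, here is a minimal sketch of how a margin of error shrinks as answers accumulate. It uses the textbook normal approximation and a hypothetical 10% visibility rate (`ASSUMED_RATE` is an assumption for illustration, not a GEOforge figure), so it will not reproduce the table exactly: the table is averaged across real brands, and this simple approximation is known to be unreliable for small samples and rare events.

```python
import math

Z_95 = 1.96  # z-score for a 95% confidence level


def margin_of_error(rate: float, n_answers: int, z: float = Z_95) -> float:
    """Normal-approximation margin of error for a yes/no rate."""
    return z * math.sqrt(rate * (1.0 - rate) / n_answers)


# Hypothetical visibility rate, chosen only for illustration.
ASSUMED_RATE = 0.10

# Answer totals taken from the table above.
for n in (30, 135, 270, 810, 1500):
    points = 100 * margin_of_error(ASSUMED_RATE, n)
    print(f"{n:>5} answers -> +/- {points:.1f} points")
```

The pattern to notice is the square root: collecting four times as many answers only halves the margin of error, which is why a handful of extra runs barely helps.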
Take a brand that genuinely shows up in 5% of AI answers.
With a shallow audit: 5% ± 9 points
The true value is somewhere between 0% and 14%. You cannot tell whether this brand is invisible or a category leader.
With deep sampling: 5% ± 1.1 points
The true value is between 3.9% and 6.1%. A number you can report, benchmark, and track week over week.
The only difference is how hard we looked — and that difference decides whether the number is usable at all.
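The two margins in this example can be checked with a short sketch. It assumes the Wilson score interval at 95% confidence, and it assumes the shallow figure corresponds to about 30 answers and the deep one to about 1,500 (the first and last rows of the table). Those sample sizes are our reading of the example, and `wilson_interval` is an illustrative helper, not GEOforge's code.

```python
import math


def wilson_interval(rate: float, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval (low, high) for an observed rate over n trials."""
    denom = 1.0 + z * z / n
    center = (rate + z * z / (2 * n)) / denom
    half = z * math.sqrt(rate * (1.0 - rate) / n + z * z / (4 * n * n)) / denom
    return max(0.0, center - half), min(1.0, center + half)


TRUE_RATE = 0.05  # the brand from the example, visible in 5% of answers

for n in (30, 1500):  # assumed sample sizes: one pass vs. 50 passes
    low, high = wilson_interval(TRUE_RATE, n)
    half_width = 100 * (high - low) / 2
    print(f"n={n:>4}: {100 * low:.1f}% to {100 * high:.1f}% "
          f"(about +/- {half_width:.1f} points)")
```

Under these assumptions the half-widths come out at roughly 9 points and 1.1 points. The Wilson range is not symmetric around 5%, so its endpoints differ slightly from a plain "5% ± 9" reading.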
Imagine estimating how often a coin lands heads. Flip it 10 times and you might get 7 heads and conclude 70%, which is obviously just luck. Flip it 1,000 times and you'll almost always land within about 3 points of the true 50%.
A brand appearing in an AI answer is like a weighted coin that rarely comes up heads. And the rarer the event, the more flips you need to pin it down relative to its size. Most brands appear in AI answers only a few percent of the time, which is precisely the situation where shallow sampling fails hardest and depth matters most.
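A rough way to put numbers on "the rarer the event, the more flips you need" is to ask how many answers it takes to know a rate to within a fixed fraction of itself. The sketch below uses the normal approximation; `answers_needed` and the 20% relative-precision target are hypothetical choices for illustration.

```python
import math


def answers_needed(rate: float, relative_precision: float, z: float = 1.96) -> int:
    """Answers required so the 95% margin of error is at most
    relative_precision * rate, under the normal approximation.

    Solves z * sqrt(rate * (1 - rate) / n) <= relative_precision * rate for n.
    """
    n = z * z * (1.0 - rate) / (rate * relative_precision ** 2)
    return math.ceil(n)


# How many answers to pin each rate down to within 20% of its own value?
for rate in (0.50, 0.05, 0.02):
    print(f"rate {rate:.0%}: about {answers_needed(rate, 0.20):,} answers")
```

A fair coin needs fewer than a hundred flips to hit that target. A brand that appears 5% of the time needs on the order of 1,800 answers, and a 2% brand needs several thousand.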
For every brand, every measurement cycle, on each AI source (ChatGPT, Google AI Overviews, Google AI Mode), we ask 30 buyer questions 50 times each, for roughly 1,500 answers behind every number.
To date, this methodology covers over 76,800 AI answers across 12 brands and 61 weekly cycles — and it grows every week.
| | Typical AI visibility check | GEOforge |
|---|---|---|
| Questions asked | 1, or a handful | 30 buyer questions |
| Times each is asked | Once | 50 times |
| Answers behind a number | A few dozen | ~1,500 per brand per source |
| Margin of error | ±9–13 points | ±1.5 points |
| Uncertainty shown? | No — single number | Yes — 95% confidence range |
| Track week-to-week movement? | No — noise swamps the signal | Yes |
| Competitor landscape | Whatever one answer happened to name | Full set, with how often each appears |
We're not claiming a fancier formula. We're claiming we did the work — we asked enough times that the number means something. That is the entire difference between a screenshot of one ChatGPT answer and a metric you can build a strategy on.
We report ranges, not false precision, because AI answers never fully settle. There's no number of runs that captures every possible answer — so we estimate carefully and show the uncertainty rather than pretend it away.
When a brand is genuinely near-invisible (showing up in a handful of answers), no amount of measuring makes it look better — and we say so. Our depth makes us more honest, not less.
Every figure is reproducible from the underlying data. Confidence ranges on the visibility rate use the Wilson method — the standard for rare yes/no rates. Ranges on the share-of-voice score use a standard large-sample approximation, valid because we have ~1,500 data points behind each one. We can show our working. A single screenshot can't.
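For reference, the Wilson score interval for an observed rate $\hat{p}$ over $n$ answers is usually written as follows, with $z \approx 1.96$ for a 95% range. The symbols are illustrative notation, not taken from a GEOforge report:

$$
\frac{\hat{p} + \dfrac{z^2}{2n}}{1 + \dfrac{z^2}{n}}
\;\pm\;
\frac{z}{1 + \dfrac{z^2}{n}}
\sqrt{\frac{\hat{p}\,(1 - \hat{p})}{n} + \frac{z^2}{4n^2}}
$$

A standard large-sample range for an average-type score typically takes the form below, where $\bar{x}$ is the score and $s$ is its sample standard deviation. The exact approximation applied to share of voice is not spelled out here, so treat this as one common choice rather than the formula actually used:

$$
\bar{x} \;\pm\; z\,\frac{s}{\sqrt{n}}
$$

As $n$ grows, the $z^2/n$ terms in the Wilson formula vanish and it approaches the simpler $\hat{p} \pm z\sqrt{\hat{p}(1-\hat{p})/n}$, which is why the two approaches agree closely at around 1,500 answers.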
Get a free GEO analysis — 1,500 answers per source, reported with a real confidence range. Not a screenshot. A measurement.