AI Visibility Software

Measuring brand presence across LLM outputs

A baseline methodology for tracking how often and how accurately your brand is mentioned by the major answer engines.

Bottom line

Track prompt families, not keywords. Sample five runs per prompt before drawing a conclusion. Instrument for week-over-week drift, not for a snapshot. Absolute presence numbers matter less than the direction of change for a defined prompt cluster.

Most brands, when they first run an AI visibility audit, discover something uncomfortable: their three biggest competitors appear in more answers than they do, not because those competitors are more relevant, but because they are better described across the training and retrieval sources the models draw on.

How do I structure prompts for tracking?

Pick fifty prompts a real buyer would type into ChatGPT this week. Don’t paraphrase; use the buyer’s exact phrasing. Then cluster by intent (evaluation, pricing, integration, objection) and treat each cluster as a separate tracked unit.
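One way to represent these tracked units is a small schema like the sketch below. The class and field names are illustrative assumptions, not a prescribed format; the point is that the cluster, not the individual phrasing, is the object you track.

```python
from dataclasses import dataclass, field

# Illustrative schema for a tracked prompt set; names are assumptions,
# not a standard. Each family groups real buyer phrasings by intent.
@dataclass
class PromptFamily:
    intent: str                               # e.g. "evaluation", "pricing"
    prompts: list[str] = field(default_factory=list)

families = [
    PromptFamily("evaluation", [
        "best CRM for a 12-person startup",
        "compare CRMs for early-stage",
    ]),
    PromptFamily("pricing", [
        "how much does HubSpot cost for a small team",
    ]),
]

# Reporting happens per family: presence rates for individual phrasings
# roll up into one number per intent cluster.
```

A flat list of fifty prompts without the intent grouping would force you to report fifty separate, noisy numbers; the family is what makes the rate stable enough to act on.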

How many runs are enough before drawing a conclusion?

Answer engines are probabilistic. A single response isn’t a signal; five responses, clustered by outcome, are. Any dashboard that reports “your brand appears 30% of the time” from a single query is reporting noise.
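A minimal sketch of the sampling loop. `ask_engine` is a placeholder for whatever API call you actually make, and the mention check here is deliberately naive; the point is only that the presence rate is computed over several runs, never one.

```python
import itertools

def mentions_brand(answer: str, brand: str) -> bool:
    # Naive substring check; a real pipeline would also catch aliases,
    # misspellings, and product-name variants.
    return brand.lower() in answer.lower()

def presence_rate(ask_engine, prompt: str, brand: str, runs: int = 5) -> float:
    # ask_engine(prompt) -> str stands in for your actual API call.
    answers = [ask_engine(prompt) for _ in range(runs)]
    hits = sum(mentions_brand(a, brand) for a in answers)
    return hits / runs

# Simulated engine for demonstration only: alternates two canned answers.
fake_answers = itertools.cycle([
    "Try HubSpot or Pipedrive.",
    "Salesforce is the default choice.",
])
rate = presence_rate(lambda p: next(fake_answers), "best CRM for a small team",
                     "Pipedrive", runs=4)
```

With the simulated engine above, the brand appears in half the runs, so the rate lands at 0.5, which a single-query dashboard would have reported as either 0% or 100%.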

Why does drift matter more than a snapshot?

Track week-over-week change on the same prompt family. Absolute presence numbers matter less than direction: a brand going from 18% to 34% over a quarter is the kind of movement that correlates with pipeline.
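Drift is just the difference between the same family’s presence rate in consecutive weeks. A sketch, with an invented quarter of weekly rates for illustration:

```python
def weekly_drift(history: list[float]) -> list[float]:
    # Week-over-week change in presence rate for one prompt family.
    # history is ordered oldest-first, one rate per week.
    return [round(later - earlier, 2)
            for earlier, later in zip(history, history[1:])]

# Hypothetical family moving from 18% to 34% over a quarter (12 weeks):
quarter = [0.18, 0.21, 0.20, 0.24, 0.27, 0.26,
           0.29, 0.31, 0.30, 0.32, 0.33, 0.34]
deltas = weekly_drift(quarter)
net = sum(deltas)   # net movement over the quarter
```

Individual weeks wobble (some deltas are negative), which is exactly why the snapshot misleads: only the net movement across the family tells you whether the trendline is real.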

FAQ

What is a prompt family?

A prompt family is a cluster of variant queries a real buyer would type for the same intent. “Best CRM for a 12-person startup”, “compare CRMs for early-stage”, and “is HubSpot or Pipedrive better for a small team” are three variants of the same prompt family. Tracking the family, not the individual phrasing, produces a citation rate the analyst can act on.

How many runs do I need per prompt?

Five runs minimum, more for high-entropy prompts (open-ended or low-consensus topics). Answer engines are probabilistic; a single response is one sample from a distribution. Single-run dashboards report noise.
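One way to reason about the run budget, under our framing rather than anything the article prescribes: treat each run as a Bernoulli draw and watch how the standard error of the observed rate shrinks as runs increase. High-entropy prompts need more runs precisely because the margin at five is wide.

```python
import math

def presence_stderr(rate: float, runs: int) -> float:
    # Standard error of a binomial proportion: sqrt(p * (1 - p) / n).
    return math.sqrt(rate * (1 - rate) / runs)

# At an observed 30% presence rate, five runs leave a wide margin...
wide = presence_stderr(0.3, 5)
# ...while twenty runs tighten it roughly in half for the same prompt.
tight = presence_stderr(0.3, 20)
```

Doubling the run count halves the error only when quadrupled, which is why the budget goes further when concentrated on the contested, high-entropy families rather than spread evenly.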

Should I track absolute presence or drift?

Both, but drift is the headline. Absolute share-of-voice is a snapshot; week-over-week change on the same prompt family is a trendline. A brand going from 18% to 34% citation share over a quarter is the kind of movement that correlates with pipeline.

When should I stop tracking a prompt?

When it no longer reflects how a real buyer phrases the question, when your share of voice has saturated near 100% across all engines for several weeks, or when it duplicates another prompt already in the set. Retire those and reinvest the run budget in prompts where the answer is still contested.
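The three retirement criteria above can be expressed as a simple filter. The saturation threshold and window here are illustrative choices, not values from the article:

```python
def should_retire(weekly_shares: list[float],
                  stale_phrasing: bool,
                  duplicates_existing: bool,
                  saturation: float = 0.95,
                  weeks: int = 4) -> bool:
    # Retire a prompt when its phrasing no longer matches real buyers,
    # when share of voice has saturated for several consecutive weeks,
    # or when it duplicates a prompt already in the set.
    saturated = (len(weekly_shares) >= weeks and
                 all(share >= saturation for share in weekly_shares[-weeks:]))
    return stale_phrasing or saturated or duplicates_existing
```

Running the set through a filter like this each quarter frees run budget for the prompts where the answer is still contested.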

Reviewed by

Maya Shapiro

Founder & lead analyst · 15 years in digital marketing


Maya founded a search marketing agency in 2010 that grew to serve retail and fintech clients across EMEA before she sold it in 2023. After fifteen years across SEO, paid search, and analytics, she now spends her days running brand-visibility experiments across ChatGPT, Claude, Gemini, Perplexity, and Copilot. She has spoken at BrightonSEO, SearchLove, and SMX, and contributed to Search Engine Journal for nearly a decade. Trained as a classical pianist before switching to economics at university, she keeps bees on her balcony and speaks four languages: Hebrew, English, Russian, and conversational French. Methodology and affiliate disclosure are documented at /methodology.