I did a pretty detailed reply on LinkedIn so I’ll copy paste it here for full context:
From: Rohit Singh
Daniel Cheung, a few problems here: the dataset is not shared, and neither is anything about the approach. Only results are shared to support the claim.
I am not saying the claim by Dan Petrovic is incorrect. But if a claim is made, it should be independently verified.
I am not saying it has to be verified by me; anyone can do it.
A few questions to ask:
1) How were the 7,060 queries selected? If queries were hand-picked or concentrated in specific domains (e.g., technical, news, e-commerce, etc.), the findings may not generalize to all search types.
2) How were “grounding words” matched to original page content? Whether exact string matching, fuzzy matching, or semantic similarity was used significantly affects measurement accuracy.
3) Were confounding variables controlled (page authority, freshness, structure)? The “density beats length” conclusion assumes content length is the primary variable, but other factors like domain trust or formatting could drive the results.
4) Why no confidence intervals or significance tests for the “~2,000 word budget” claim? The data shows substantial variance but no statistical testing validates whether this represents a true fixed budget or random variation.
From: Dan Petrovic
1. Several clients: health, travel, finance, marketing, sports, B2B, marketplace, gambling… perhaps a few industries I forgot. First I define primary entities and then expand them into an arbitrary number of prompts. Each prompt is mined via a Google Search-enabled grounding tool API call, and all metadata is collected and saved (fanouts, grounded chunks, grounding URLs, confidence scores, etc.).
2. I observe the actual grounding snippets supplied to the model as context before it synthesizes its answers. No fuzzy matching: the segments are exact, with some minor goofs. They map cleanly to page source text because it’s extractive, not abstractive, summarization.
3. No.
4. ~2,000 is a median. The p95 is 2,798; it goes up to ~5,000, and there is one sample with ~30,000, but I think that’s a bug in my pipeline.
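For anyone who wants to run the kind of check question 4 asks for on their own data, here is a minimal sketch. It is not the pipeline described above. It assumes you already have one grounded word count per sample in a plain list, and the function names (`percentile`, `bootstrap_median_ci`, `summarize`) are made up for illustration. It reports the median, a nearest-rank p95, the maximum, and a percentile-bootstrap confidence interval for the median.

```python
import math
import random
import statistics


def percentile(values, q):
    """Nearest-rank percentile of a non-empty list, q between 0 and 100."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered) / 100))
    return ordered[rank - 1]


def bootstrap_median_ci(values, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile-bootstrap confidence interval for the median."""
    rng = random.Random(seed)  # fixed seed so the result is repeatable
    n = len(values)
    # Resample with replacement and record the median of each resample.
    medians = sorted(
        statistics.median(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    lower = medians[int(alpha / 2 * n_resamples)]
    upper = medians[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def summarize(word_counts):
    """Summary statistics for a list of grounded word counts (hypothetical input)."""
    lower, upper = bootstrap_median_ci(word_counts)
    return {
        "median": statistics.median(word_counts),
        "median_ci_95": (lower, upper),
        "p95": percentile(word_counts, 95),
        "max": max(word_counts),
    }
```

Calling `summarize(word_counts)` on your own list returns the four figures in one dictionary. Note that the interval only describes how uncertain the median estimate is; it does not by itself show whether a fixed budget exists. The median and p95 barely move when a single extreme value (such as one ~30,000 sample) is added or dropped, so they can be reported before a suspected pipeline bug is resolved.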
I can’t share the data in public for two reasons:
1. client data
2. (can’t tell the 2nd reason or I’d be revealing it)
If you’re interested in peer-review analysis I’ll share with you directly.
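On point 2 of the reply, the exact-match check can be reproduced in a few lines if you have the snippets and the page text. This is a rough sketch under assumptions, not the method used above: `snippets` is a list of grounding snippet strings, `page_text` is the extracted text of the source page, and whitespace is collapsed before comparing so that line breaks in the page do not block a match. The function names are hypothetical.

```python
import re


def normalize(text):
    """Collapse runs of whitespace into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def exact_match_report(snippets, page_text):
    """Split snippets into those found verbatim in the page and those not found."""
    page = normalize(page_text)
    found, missing = [], []
    for snippet in snippets:
        cleaned = normalize(snippet)
        # An empty snippet would match any page, so count it as missing.
        if cleaned and cleaned in page:
            found.append(snippet)
        else:
            missing.append(snippet)
    return {
        "found": found,
        "missing": missing,
        # Words in verbatim-matched snippets (overlapping snippets count twice).
        "matched_words": sum(len(normalize(s).split()) for s in found),
        "page_words": len(page.split()),
    }
```

Snippets that land in `missing` are the ones worth inspecting by hand, since they may be the "minor goofs" or a sign that the page text was extracted differently from what the grounding tool saw.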