Search engine news, watched from a height.

Standards still matter when search is mostly inferring

Embeddings and AI overviews don't replace the boring layer underneath. They depend on it more than anyone wants to admit.

Every few months a new piece comes out arguing that classical SEO is over. The model will read your page. The model will figure out what it's about. Canonical tags, sitemaps, hreflang, schema, all of it. Quaint. The future is just writing things and letting inference handle the rest.

This is half right, which is more dangerous than fully wrong.

The half that's right: a competent language model genuinely can guess most of what your page is, even if you've left the standards layer a smoking ruin. It can recover from a missing meta description. It can tolerate a sitemap that hasn't been touched since 2019. It will cheerfully infer a primary topic from your title and your first paragraph and move on.

The half that isn't: search systems are not one model reading one page. They are pipelines, and the pipeline still runs on the boring layer.

Inference rides on top of crawl

Before any model touches your content, a crawler decides what to fetch, how often, and what to prioritize. Sitemaps, internal links, canonical signals, response codes, robots directives — all of that lives upstream of the part everyone wants to talk about. If the crawler picks the wrong canonical, the model gets the wrong page. If the crawler can't find a section of the site, the model never sees it. No amount of downstream inference fixes a starvation problem upstream.
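
To make that upstream step concrete, here's a minimal sketch of selection before any model is involved: read the sitemap, check the response code, honor the canonical. The URLs are placeholders and the requests/beautifulsoup4 calls are illustrative, not anyone's production crawler.

```python
"""Minimal sketch of crawl-side selection (hypothetical URLs).

Requires: requests, beautifulsoup4.
"""
import xml.etree.ElementTree as ET

import requests
from bs4 import BeautifulSoup

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(sitemap_url: str) -> list[str]:
    """Return the <loc> URLs a sitemap declares: the crawler's starting to-do list."""
    resp = requests.get(sitemap_url, timeout=10)
    resp.raise_for_status()
    root = ET.fromstring(resp.content)
    return [loc.text.strip() for loc in root.findall(".//sm:loc", SITEMAP_NS) if loc.text]


def select_document(url: str) -> str | None:
    """Return the URL a simple crawler would actually index for this page.

    A non-200 response drops the page; a rel=canonical pointing elsewhere swaps
    the page. Either way, the decision happens before any model reads a word.
    """
    resp = requests.get(url, timeout=10)
    if resp.status_code != 200:
        return None
    canonical = BeautifulSoup(resp.text, "html.parser").find("link", rel="canonical")
    return canonical.get("href") if canonical and canonical.get("href") else url


if __name__ == "__main__":
    for url in urls_from_sitemap("https://example.com/sitemap.xml"):
        print(url, "->", select_document(url))
```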

This is the part that gets lost when the conversation collapses to "Google is using LLMs now." Yes. And those LLMs are reading inputs that were selected, deduplicated, and ranked by systems that mostly haven't changed shape in fifteen years.

The standards are a coordination protocol

Every standards-layer piece does the same job, whether it's schema markup, OpenGraph tags, hreflang, or robots directives. It's a way for two strangers (a publisher and a search system) to agree on what something is without negotiating in plain English. It's faster, cheaper, and unambiguous. Inference is the fallback for when that agreement fails, not the replacement for it.

You can model this as a cost question. A correct schema declaration costs the publisher near zero and saves the search system the cost of guessing. The standards layer exists because the alternative — every system inferring everything from scratch — is wasteful at internet scale, even now that the models are cheap.
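
Here's roughly what that near-zero cost looks like: a few lines that emit a JSON-LD declaration for a hypothetical product page. The product details are made up; the schema.org types and properties are real.

```python
import json

# A hypothetical product page declaring what it is, rather than making the
# search system infer it from prose. One dict, one script tag in the <head>.
product_jsonld = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Ridgeline 65L Pack",        # illustrative product, not a real listing
    "sku": "RL-65-GRN",
    "offers": {
        "@type": "Offer",
        "price": "249.00",
        "priceCurrency": "USD",
        "availability": "https://schema.org/InStock",
    },
}

script_tag = (
    '<script type="application/ld+json">'
    + json.dumps(product_jsonld)
    + "</script>"
)
print(script_tag)  # paste into the page head; nothing downstream has to guess
```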

What this means in practice

Don't let "AI overviews are eating the SERP" become an excuse to stop maintaining the layer beneath them. The overviews read the same pages everything else reads, fetched by the same crawler and ranked by the same retrieval systems. Your canonical tag still routes traffic. Your sitemap still tells the bot which URLs to revisit. Your structured data still tells aggregators what the thing on the page actually is.
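
If you want a concrete habit, a spot check like the sketch below covers most of the boring layer in one pass: can the bot fetch the page, does it return 200, where does the canonical point, is there structured data on it. The URL and user agent are placeholders, and this is a quick audit sketch, not a substitute for proper crawl tooling.

```python
"""Publisher-side spot check of the standards layer (hypothetical URL).

Requires: requests, beautifulsoup4.
"""
from urllib.parse import urlsplit
from urllib.robotparser import RobotFileParser

import requests
from bs4 import BeautifulSoup

URL = "https://example.com/guides/packing-list"


def audit(url: str) -> dict[str, object]:
    parts = urlsplit(url)
    robots = RobotFileParser()
    robots.set_url(f"{parts.scheme}://{parts.netloc}/robots.txt")
    robots.read()

    resp = requests.get(url, timeout=10)
    soup = BeautifulSoup(resp.text, "html.parser")
    canonical = soup.find("link", rel="canonical")

    return {
        "crawlable": robots.can_fetch("Googlebot", url),  # robots still gates the fetch
        "status": resp.status_code,                        # non-200s starve the pipeline
        "canonical": canonical.get("href") if canonical else None,  # routes the traffic
        "has_jsonld": bool(soup.find("script", type="application/ld+json")),  # declares what it is
    }


if __name__ == "__main__":
    print(audit(URL))
```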

The work hasn't gotten less important. It's gotten less visible.