Google’s John Mueller explained the difference between clustering and canonicalization within Google Search. He said, “Clustering is basically taking the pages that we think are the same. And then canonicalization is, from those pages, which one is the best one.” John said this at the 3:03 mark of the interview.
This came up in the excellent Search Off The Record interview of Allan Scott from the Google Search team, who works specifically on duplication within Google Search. Martin Splitt and John Mueller from Google interviewed Allan.
Allan explained at the beginning of the video, “when people think canonicalization, they kind of imagine this one black box that does all the magic things together. And it’s very difficult to address requests from people that are like, ‘Well, why is canonicalization wrong?’ And so I tend to push people to think of it as, well, canonicalization is one step. It’s I have a bunch of URLs, and I want to know which of them is the canonical, but there are other steps that are as, if not more important here, like the first one being clustering.”
Allan continued to explain, “Usually, when people come to us and complain about canonicalization, the immediate thing we say is, ‘Oh, that’s a clustering problem, because these two pages shouldn’t be in the same cluster, let alone cases of canonical selection.’ Like, if you want to bring a canonicalization problem to me, what that is, is these two pages are in the same cluster, but they’re not actually, like we picked the wrong one. The most dire case being a hijacking, we see those and we act really fast because those are just disasters.”
That’s when John Mueller stepped in to summarize the answer as, “Clustering is basically taking the pages that we think are the same. And then canonicalization is, from those pages, which one is the best one? Is that about right?” To which Allan responded, “Exactly. Yes.”
Then Allan gave this example, “So, for instance, rel="canonical" is a bit of a magic factor that crosses both those lines. rel="canonical" will actually first try to put two pages in the same cluster. It may or may not succeed, but if two pages are in the same cluster and there is a rel="canonical" between them, then it’s also a canonical selection signal.”
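To make the two steps concrete, here is a toy Python sketch of the distinction Allan and John describe. It is not how Google does it: the `Page` class, the `cluster` and `pick_canonical` functions, the "identical content" duplicate rule, and the vote-then-shortest-URL tie-break are all assumptions made up for illustration. The only ideas taken from the interview are that pages are grouped first, that one URL is then chosen per group, and that rel="canonical" can feed into both steps.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class Page:
    url: str
    content: str                          # stand-in for whatever a duplicate detector compares
    rel_canonical: Optional[str] = None   # URL named in the page's rel="canonical", if any


def cluster(pages):
    """Step 1 (toy): group pages we consider 'the same'.

    Assumed rule: pages share a cluster if their content is identical,
    or if one points at the other with rel="canonical".
    """
    parent = {p.url: p.url for p in pages}

    def find(u):
        # Follow parent links to the representative, compressing the path.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    def union(a, b):
        parent[find(a)] = find(b)

    first_url_for_content = {}
    for p in pages:
        if p.content in first_url_for_content:
            union(p.url, first_url_for_content[p.content])
        else:
            first_url_for_content[p.content] = p.url

    for p in pages:
        # In this toy the hint always merges when the target is a known page;
        # per the interview, in reality it "may or may not succeed".
        if p.rel_canonical in parent:
            union(p.url, p.rel_canonical)

    groups = defaultdict(list)
    for p in pages:
        groups[find(p.url)].append(p)
    return list(groups.values())


def pick_canonical(group):
    """Step 2 (toy): within one cluster, choose a single URL.

    Assumed rule: most rel="canonical" votes from cluster members wins,
    then the shorter URL, then alphabetical order.
    """
    urls = {p.url for p in group}
    votes = {u: 0 for u in urls}
    for p in group:
        if p.rel_canonical in urls:
            votes[p.rel_canonical] += 1
    return min(urls, key=lambda u: (-votes[u], len(u), u))
```

In this sketch, a complaint that two unrelated pages were merged would be a bug in `cluster`, while a complaint that the wrong URL was chosen among genuine duplicates would be a bug in `pick_canonical`, which mirrors the split Allan draws between a clustering problem and a canonicalization problem.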
This started pretty much at the beginning of the video, if you want to listen to it.
Forum discussion at X.