Everyone in the SEO community is talking about how John Mueller's website is no longer indexed in the Google Search results. It seems like it was deindexed and removed from Google completely. The question is why? John Mueller is one of the most recognized Google Search spokespeople on the internet, so for his site, johnmu.com, with a ton of geeky topics on it, to not be showing up in the Google Search results – SEOs wonder…
If you do a site command for [site:johnmu.com] – no results come up:
Of course, the first thing every SEO does is check johnmu.com/robots.txt – so have fun going through that. Then you look for other meta tags that might prevent it from being indexed. Here is a rough sketch of those first checks, below.
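If you want to run those checks from a script rather than a browser, here is one hedged way to do it in Python (requests and beautifulsoup4 are third-party packages; the URLs are simply the ones mentioned above):

```python
import requests
from bs4 import BeautifulSoup

SITE = "https://johnmu.com"

# Step 1: pull the robots.txt and eyeball the top of it for surprises.
robots = requests.get(f"{SITE}/robots.txt", timeout=10)
print(robots.text[:500])  # print just the start of the file

# Step 2: check the homepage for a meta robots tag that could block indexing.
page = requests.get(SITE, timeout=10)
soup = BeautifulSoup(page.text, "html.parser")
for tag in soup.find_all("meta", attrs={"name": "robots"}):
    print("meta robots:", tag.get("content"))
```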
We don't have access to John's Google Search Console to see if there was a manual action, like so many other sites received last week, but I doubt his site was hit by one…
I spotted this via:
.@JohnMu FYI, your website got completely deindexed in Google. It seems Google went crazy 😱
H/T @seb_skowron & @ziptiedev pic.twitter.com/RGq6GodPsG
— Tomek Rudzki (@TomekRudzki) March 13, 2024
Here is how John is responding to the reaction to this news so far:
@googlesearchc help me
— John (@JohnMu) March 13, 2024
How to double the traffic to your site in these 5 easy steps!
— John (@JohnMu) March 13, 2024
I definitely don't forget to EAT, lol 😊
— John (@JohnMu) March 13, 2024
Tell me what it finds, please!
— John (@JohnMu) March 13, 2024
It's raining on Googlebot. What does it mean?
— John (@JohnMu) March 13, 2024
You might be on to something – look what I found: pic.twitter.com/Q6Ej0A1O7Z
— John (@JohnMu) March 13, 2024
Here is Fabrice Canel from the Bing team – how funny:
😊
— John (@JohnMu) March 13, 2024
What do you all think is going on? It seems John is having fun with this one…
Be nice please.
Forum discussion at X.
Update: John posted more details on LinkedIn; he wrote:
My website's robots.txt file was making the rounds. It's awkward – isn't a robots.txt file a bit like a website's underwear? I would have put on a clean file if I had known.
But, what's up with the file? And why is your site deindexed?
Someone suggested it might be because of the links to Google+. It's possible. And back to the robots.txt… it's fine – I mean, it's how I want it, and crawlers can deal with it. Or, they should be able to, if they follow RFC9309.
The comment on top – that's of course for you, and a way of catching a hard-to-spot mistake: a double UTF BOM. Certain text file types have a special starting character. Having one is fine, you usually don't need it. Most systems (browsers, editors) hide it. For robots.txt, if you have a directive on top and you have an accidental *second* BOM, then that will be seen as part of the directive, and the directive won't be processed. Having a comment on top means that in the worst case, the comment will be ignored. That's fine. You could also just have a blank line on top. Or make a clean robots.txt file. Anyway, this is a post, not a cop.
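To make the double-BOM problem concrete, here is a small Python sketch (the file contents are made up for illustration) showing how a second BOM survives decoding and swallows the first directive:

```python
# A UTF-8 BOM is the byte sequence EF BB BF at the start of a file.
BOM = b"\xef\xbb\xbf"

good = BOM + b"user-agent: *\ndisallow: /private/\n"
bad = BOM + BOM + b"user-agent: *\ndisallow: /private/\n"  # accidental second BOM

for raw in (good, bad):
    # "utf-8-sig" strips exactly one leading BOM, much like a typical parser.
    first_line = raw.decode("utf-8-sig").splitlines()[0]
    # With a double BOM, the first line starts with an invisible U+FEFF,
    # so it no longer matches "user-agent" and the directive is dropped.
    print(repr(first_line), "-> parsed as directive:", first_line.startswith("user-agent"))
```

A comment on the first line is cheap insurance: if anything gets eaten by a stray BOM, it's only the comment.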
"disallow: /robots.txt" – does this make robots spin in circles? Does this deindex your site? No. My robots.txt file just has a lot of stuff in it, and it's cleaner if it doesn't get indexed with its content. This purely blocks the robots.txt file from being crawled for indexing purposes. I could also use the x-robots-tag HTTP header with noindex, but this way I have it in the robots.txt file too.
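If you want to see whether a site uses the header variant instead, a quick sketch (requests is a third-party package, and the header may simply be absent):

```python
import requests

# A server can send "X-Robots-Tag: noindex" on robots.txt to keep the file
# itself out of the index without affecting how crawlers use its rules.
resp = requests.get("https://johnmu.com/robots.txt", timeout=10)
print(resp.headers.get("X-Robots-Tag", "no X-Robots-Tag header set"))
```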
The size. JOHN. WHAT'S UP WITH THE SIZE OF THIS FILE? I am purposely refraining from making any jokes, don't think them in your head. This is LinkedIn, we're here for srs bzns, folks. NO JOKES. The size comes from tests of the various robots.txt testing tools that my team & I have worked on. The RFC says a crawler should parse at least 500 kibibytes (bonus likes to the first person who explains what kind of snack that is). You have to stop somewhere, and you could make pages that are infinitely long (and I have, and many people have, some even on purpose). In practice, what happens is that the system that checks the robots.txt file (the parser) will make a cut somewhere. I added a "disallow: /" on top of that section, so hopefully that gets picked up as a blanket disallow. It's possible that the parser will cut off in an awkward place, like a line that has "allow: /cheeseisbest" and it stops right at the "/", which would put the parser at an impasse (and, trivia! the allow rule will override if you have both "allow: /" and "disallow: /"). This seems very unlikely, though. In practice, parsers that have to go through it will send me lightning bolts with their robot eyes. And stop crawling, if they're polite. There are a lot of crawlers that are rude or that put on masks when they crawl, but that's a topic for another day.
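To get a feel for what a size-capped parser sees, here is a rough Python sketch using the standard library's urllib.robotparser (a simplified parser, not fully RFC 9309 compliant; "robots.txt" here is a hypothetical local copy of the file):

```python
from urllib.robotparser import RobotFileParser

LIMIT = 500 * 1024  # 500 KiB, the minimum RFC 9309 asks parsers to handle

with open("robots.txt", "rb") as f:
    raw = f.read()
print(f"file size: {len(raw)} bytes, parsed portion: {min(len(raw), LIMIT)} bytes")

# A naive byte cut can land mid-line, e.g. turning "allow: /cheeseisbest"
# into "allow: /" – exactly the awkward truncation described above.
truncated = raw[:LIMIT].decode("utf-8", errors="replace")

rp = RobotFileParser()
rp.parse(truncated.splitlines())
print(rp.can_fetch("TestBot", "https://example.com/some/page"))
```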
There you have it – some robots.txt quirks – now leave my robots.txt alone 😊
And, what's your favorite web quirk?
John implied the site should come back quickly; he wrote, "I used the Search Console tool to try something out. I'd make a quick recovery if I hit the right button :-)." So the pages are in the index, but they are hidden, like when you use the URL removal tool.