Matt Cutts on How Google Handles 404 & 410 Status Codes

If you're into the finer technical details of how Googlebot crawls the Web and interacts with different HTTP status codes, you'll probably be interested in the new webmaster help video on how Google handles 404 and 410 status codes. While both tell the client that a page is unavailable, Matt Cutts talks about the nuances of each and how Googlebot treats them slightly differently.

For those who are less technically savvy, Cutts first explains the difference between a 404 and a 410, since most webmasters are far more familiar with the 404 status code.

"So 404 vs. 410 refers to an HTTP status code, so whenever the browser or Googlebot asks for a page, the Web server sends back a status code – 200 might mean everything went totally fine, 404 means page not found, 410 typically means gone, as in the page is not found and we do not expect it to come back," Cutts said. "So 410 has a little more of a connotation that the page is permanently gone."

That said, does Googlebot behave any differently when it encounters a 410?

"The short answer is that we do sometimes treat 404s and 410s a little bit differently, but for the most part you shouldn't worry about it," Cutts said. "If a page is gone and you think it's temporary, go ahead and use a 404. If the page is gone and you know no other page that should substitute for it, you don't have anywhere else that you should point to, and you know that that page is going to be gone and never come back, then go ahead and serve a 410."

On the positive side, Googlebot does have some redundancies built in for when a webmaster or IT department makes a mistake in how they serve status codes.

"It turns out webmasters shoot themselves in the foot pretty often – pages go missing, people misconfigure sites, sites go down, people block Googlebot by accident, people block regular users by accident – so if you look at the entire Web, the crawl team has to design to be robust against that," Cutts said. "So with 404s, along with I think 401s and maybe 403s, if we see a page and we get a 404, we are gonna protect that page for 24 hours in the crawling system, so we sort of wait and we say maybe that was a transient 404, maybe it really wasn't intended to be a page not found."

"If we see a 410, then the site crawling system says, OK we assume the webmaster knows what they're doing because they went off the beaten path to deliberately say this page is gone," he said. "So they immediately convert that 410 to an error, rather than protecting it for 24 hours."

So even if you serve a 410 status code on a page that isn't really gone permanently, you haven't killed that page off for good. Googlebot will return to check whether the page should be restored to the index.
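
To make the crawl-side behavior concrete, here is a minimal Python sketch of the logic Cutts describes. It is not Google's implementation: the function name `classify_fetch`, the state labels and the way the 24-hour window is tracked are all assumptions made for illustration, and Cutts himself was unsure whether 401s and 403s get the same protection as 404s.

```python
from datetime import datetime, timedelta
from typing import Optional

# The 24 hours of protection Cutts mentions for 404s.
GRACE_PERIOD = timedelta(hours=24)
# Cutts: "along with I think 401s and maybe 403s" (so treat this set as a guess).
PROTECTED_CODES = {401, 403, 404}


def classify_fetch(status_code: int,
                   first_failure_at: Optional[datetime],
                   now: datetime) -> str:
    """Return a hypothetical crawl state for one fetch of a known page.

    first_failure_at is when the page first started failing, or None
    if this is the first failed fetch.
    """
    if status_code == 200:
        return "ok"
    if status_code == 410:
        # A deliberate "gone" signal: converted to an error right away.
        return "error"
    if status_code in PROTECTED_CODES:
        started = first_failure_at or now
        if now - started < GRACE_PERIOD:
            # Possibly a transient failure, so keep protecting the page.
            return "protected"
        return "error"
    return "unhandled"


now = datetime(2020, 1, 1, 12, 0)  # arbitrary timestamp for the demo
print(classify_fetch(404, None, now))                       # protected
print(classify_fetch(404, now - timedelta(hours=25), now))  # error
print(classify_fetch(410, None, now))                       # error
```

In this sketch an "error" state is not final. As Cutts notes, the crawler still goes back later to recheck whether the page has returned.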

"Now don't take this too much the wrong way, we'll still go back and recheck and make sure those pages are really gone, or maybe the pages have come back alive again," Cutts said. "And I wouldn't rely on the assumption that that behavior will always be exactly the same."

"In general, sometimes webmasters get a little too caught up in the tiny little details and so if the page is gone, it's fine to serve a 404, if you know it's gone for real it's fine to serve a 410," he said. "But we'll design our crawling system to try and be robust so that if your site goes down, or if you get hacked, or whatever that we try to make sure that we can still find the good content whenever it's available."

So this is one of those tiny details that webmasters probably shouldn't be overly concerned about. The two status codes are treated nearly identically, but if in doubt, the more common 404 is probably the best way to go.
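
That rule of thumb can be sketched in a few lines of Python. This is only an illustration of Cutts' advice, not an official recommendation: the function `choose_status` and its parameters are hypothetical, and answering with a 301 redirect when a substitute page exists is an assumption, since Cutts only implies that you should point elsewhere when another page can take the old one's place.

```python
from http import HTTPStatus
from typing import Optional


def choose_status(page_exists: bool,
                  replacement_url: Optional[str] = None,
                  gone_for_good: bool = False) -> HTTPStatus:
    """Pick a status code for a request, following the advice in the video."""
    if page_exists:
        return HTTPStatus.OK
    if replacement_url is not None:
        # Assumption: a permanent redirect is how you "point to" a substitute.
        return HTTPStatus.MOVED_PERMANENTLY
    if gone_for_good:
        # Only when you know the page is never coming back.
        return HTTPStatus.GONE
    # The safe default when in doubt.
    return HTTPStatus.NOT_FOUND


print(int(choose_status(page_exists=False)))                      # 404
print(int(choose_status(page_exists=False, gone_for_good=True)))  # 410
```

Note that 404 is the fallback here, which matches the "if in doubt" guidance: a 410 is returned only when the caller explicitly says the page is gone for good.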
