Google's broken date recognition

I don’t know exactly when it happened (probably an effect of the “May update”; Michael Grey spotted the date problem during April), but Google clearly has some problems with how it currently decides when a page was published.

Trick Google

Two weeks ago, Simon Sundén pointed out in this article on his Swedish blog that it was easy to trick Google into showing any date you wanted in search result pages. Simon suggested that Google was giving extra weight to dates in titles and main headings. But Google’s problems appear to be even more widespread.

Google’s algorithm is currently making some really poor guesses as to the publication dates of certain articles. Hans Kullin has today spotted that Google is replacing correct dates in its search results for old articles from Swedish newspaper Aftonbladet with incorrect dates based on the day it happens to re-index the page.

Aftonbladet example

Let’s take this Aftonbladet article from March 2008 – Bojkotta inte Kina-OS!.

Screenshot of Aftonbladet

You can see from the date in the above picture that Aftonbladet clearly state that the article was published on the 20th of March 2008.

Screenshot of Google SERP

When we search for that article, Google is telling us that it was published on the 27th of May 2010 (yesterday at the time of writing this).

Screenshot of source code

Why though? Well, the first date that Google reaches when indexing the HTML of that article is indeed the 27th of May (as you can see in the above image). The date the article was actually published comes further down in the code. In addition, today’s date is repeated a second time towards the bottom of the page.

The most reliable date?

Screenshot of the trigger date in the Aftonbladet menu

Aftonbladet are showing today’s date at the very top of their left-hand navigation (and by the side of their search box in the page footer). Google’s current, broken way of establishing when an article was published sees this date and decides that it is the most reliable date on the page.
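I obviously can’t see inside Google’s algorithm, but the behaviour looks a lot like a “first date found in the HTML wins” rule. Here is a minimal Python sketch of that kind of heuristic. Everything in it is an assumption made for illustration: the markup is a made-up, heavily simplified stand-in for the Aftonbladet page, the ISO date format is chosen for convenience, and the function names are hypothetical.

```python
import re
from datetime import date
from typing import List, Optional

# Hypothetical, simplified markup (not Aftonbladet's real HTML):
# the current date in the navigation, the real publication date
# further down, and the current date again in the footer.
PAGE = """
<div id="nav">2010-05-27</div>
<h1>Bojkotta inte Kina-OS!</h1>
<p class="published">2008-03-20</p>
<div id="footer">2010-05-27</div>
"""

# Assumed date format for this sketch: YYYY-MM-DD.
DATE_RE = re.compile(r"(\d{4})-(\d{2})-(\d{2})")


def dates_in_order(html: str) -> List[date]:
    """Return every date in the markup, in the order it appears."""
    return [date(int(y), int(m), int(d)) for y, m, d in DATE_RE.findall(html)]


def first_date(html: str) -> Optional[date]:
    """Naive heuristic: treat the first date in the markup as the publish date."""
    found = dates_in_order(html)
    return found[0] if found else None


if __name__ == "__main__":
    print(first_date(PAGE))  # 2010-05-27, the navigation date
```

With markup laid out like this, the heuristic returns the navigation date, and the real publication date (2008-03-20) only turns up as the second match. Any rule that trusts document order will be thrown off by a site that prints the current date above the article.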

Exploiting the problem

Hopefully Google will fix this. Until they do, given the weight that recently published content carries in the results, we’re going to see a lot of people exploiting this problem with Google’s date calculation algorithm in order to push their old content back up the search result pages.