Caching: good or evil?
2008/10/30 filed under /web
Caching is the art of storing a file so it can be retrieved easily later, or at least, that's how I see it. The benefits for the user are clear: whatever you cache is a lot faster to request the next time.
Search engines offer you cached versions of webpages as well, and according to CNET, Google has offered this since 1997 (and the others probably introduced it around the same time).
To me, search engines take the wrong approach, and courts all around the globe seem to find it a difficult question as well. When is someone infringing a copyright? After Google lost lawsuits in Belgium over indexing news articles and in Germany over image search "caching", today a US court ruled in favor of Yahoo! and Microsoft's caching.
Besides the fact that Gordon Ray Parker probably has nothing better to do than sue, and besides the fact that he probably knew full well how to opt out, this ruling makes me very unhappy.
Why is it so bad when the big search engines are allowed to cache your content? It's quite simple: they're making money with your content. The whole idea behind caching webpages sounds to me like an attempt to keep the visitor on the search engine's page a little longer, which means more chances that they'll click on an ad (= cash).
The fact that they have an opt-out mechanism, based on a draft that expired in 1997, doesn't make it any more logical to me that it's allowed to "steal" data for profit.
Let's scale it down. Every webmaster/blogger has at some point noticed that his/her text was taken completely out of context and dumped on some other blog, surrounded by nothing but AdSense ads. That's truly annoying. The operator of that site hopes to gain some traffic from SEO and hopes that people will click on some links (which they'll do, for the copied text makes no sense).
According to the ruling, that is all OK now, unless you specifically tell the crawlers that they can't do that. So now the burden is on the shoulders of all the webmasters in the world: configure your robots.txt correctly. Not that it helps a lot, for the annoying blogs mentioned above probably ignore it to begin with, but OK. Why should everybody with a website have to explicitly tell all potential crawlers to keep their filthy claws off his or her property? Why can't there be a standard where you set up a document that describes what crawlers are allowed to grab, cache and use as they see fit, with everything else off limits by default? Because it would destroy Google, Yahoo! and Microsoft (and other search engines, of course). But who really cares?
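To make that burden a bit more concrete, here is a minimal sketch of how a well-behaved crawler is supposed to consult robots.txt before grabbing a page. It uses Python's standard `urllib.robotparser`; the robots.txt content, the crawler names `ExampleBot` and `OtherBot`, and the example.com URLs are all made up for illustration. Note that this only models crawlers that choose to obey the file, which is exactly the weakness complained about above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one crawler completely and keep
# every other crawler out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines; no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    checks = [
        ("ExampleBot", "http://example.com/post.html"),
        ("OtherBot", "http://example.com/post.html"),
        ("OtherBot", "http://example.com/private/draft.html"),
    ]
    for agent, url in checks:
        verdict = "allowed" if is_allowed(ROBOTS_TXT, agent, url) else "blocked"
        print(f"{agent} -> {url}: {verdict}")
```

Everything that is not explicitly disallowed counts as allowed, so the default really is opt-out: a site without a robots.txt has, as far as the crawlers are concerned, said yes to everything.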
Again, $(big companies that claim to be non-evil)++ vs $(rest of the world)--.
Sad, very sad. Maybe I should start scraping Google's results, but oh no, they don't allow that, nor does Yahoo!, nor does MSN.


