Caching is the art of storing a certain file for easy retrieval later,
or at least, that's how I see it. The benefits for the user are clear:
whatever you cache is a lot faster to retrieve the next time you
request it.
Search engines offer you cached versions of webpages as well, and
according to CNET, Google has offered this since 1997 (and the others
probably introduced it around the same time).
To me, search engines take the wrong approach and courts all around the
globe seem to find it a difficult question as well. When is someone
infringing a copyright? After Google lost lawsuits in Belgium over indexing
news articles and in Germany over image search "caching", a US court today
ruled in favor of Yahoo! and Microsoft's caching.
Besides the fact that Gordon Ray Parker probably has nothing better to do than
sue, and besides the fact that he probably knew full well how to opt out,
this lawsuit makes me very unhappy.
Why is it so bad when the big search engines are allowed to cache your
content? It's quite simple: they're making money with your content.
To me, the whole idea behind caching webpages sounds like an attempt to
keep the visitor on the search engine's page a little longer, which means
more chances that he's going to click on an ad (= cash).
The fact that they have an opt-out mechanism, based on a draft that
expired in 1997, doesn't make it any more logical to me that they're
allowed to "steal" data for profit.
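
For reference, here is a rough sketch of how that opt-out works from the crawler's side, using the robots.txt parser that ships with Python. The robots.txt content, the crawler name ExampleBot and the example.com URLs are made up for illustration, and a real crawler may interpret the rules differently. The point is the default: if you say nothing, everything is allowed.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: shut out one crawler completely and keep
# everybody else out of /private/ only.
ROBOTS_TXT = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://example.com/post.html"
    print(may_fetch(ROBOTS_TXT, "ExampleBot", page))  # explicitly shut out
    print(may_fetch(ROBOTS_TXT, "OtherBot", page))    # not mentioned, so allowed
    print(may_fetch("", "OtherBot", page))            # empty file: allowed
```

Note that robots.txt only asks crawlers to stay away; nothing in it forces a crawler to listen.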
Let's scale it down. Every webmaster/blogger has at some point noticed
that his/her text was taken completely out of context and dumped on
some other blog, surrounded by nothing but AdSense ads. That's
truly annoying. The operator of that site hopes to gain some traffic
from SEO and hopes for people to click on some links (which they'll
do, for the copied text makes no sense).
According to the ruling, that is all ok now, unless you specifically
tell the crawlers that they can't do that. So now the burden is on the
shoulders of all the webmasters in the world. Configure your robots.txt
correctly. Not that it helps a lot, for the annoying blogs mentioned
above probably ignore it to begin with, but ok. Why should everybody
with a website explicitly tell all potential crawlers to keep their
filthy claws off their property? Why can't there be a standard where
you specifically set up a document that describes what crawlers
are allowed to grab/cache and do whatever with as they see fit?
Because it'll destroy Google, Yahoo! and Microsoft (and other
search engines, of course). But who really cares?
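
To make the opt-in idea a bit more concrete, here is a minimal sketch of what such a check could look like. Everything in it is hypothetical: no such standard exists, and the policy layout, the action names "index" and "cache", and the crawler name ExampleBot are invented for the example. The only thing it is meant to show is the reversed default: whatever isn't explicitly granted is denied.

```python
from urllib.parse import urlsplit

# Hypothetical opt-in policy published by a site owner. Each crawler is
# granted a list of path prefixes per action; anything missing is denied.
OPT_IN_POLICY = {
    "ExampleBot": {
        "index": ["/blog/"],  # may index blog posts
        "cache": [],          # may not serve cached copies of anything
    },
}


def is_permitted(policy: dict, user_agent: str, action: str, url: str) -> bool:
    """Default deny: True only if the policy explicitly grants the action."""
    path = urlsplit(url).path or "/"
    allowed_prefixes = policy.get(user_agent, {}).get(action, [])
    return any(path.startswith(prefix) for prefix in allowed_prefixes)


if __name__ == "__main__":
    post = "http://example.com/blog/post.html"
    print(is_permitted(OPT_IN_POLICY, "ExampleBot", "index", post))
    print(is_permitted(OPT_IN_POLICY, "ExampleBot", "cache", post))
    print(is_permitted(OPT_IN_POLICY, "OtherBot", "index", post))
```

With a scheme like this, a crawler that has never heard of your site gets nothing, which is exactly the opposite of what robots.txt gives it.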
Again, $(big companies that claim to be non-evil)++ vs $(rest of the world)--.
Sad, very sad. Maybe I should start scraping Google's results, but oh no,
they don't allow it, nor does Yahoo!, nor does MSN.