Web crawlers and UCW Ajax pages

By admin (2008-05-11 18:41)

I now exist on the web! I can say so because I regularly get hits from Web crawlers. After a while, though, I was surprised by how many hits they generated in a short time.

A bit of log analysis showed mostly entries like this:

66.249.67.212 "GET /dynamic/ajax?_a=_LjxGq... HTTP/1.1" 200 179 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

These entries can only come from http://ucw.opopop.net, since I don't use Ajax yet on http://lisp.opopop.net.

I suspect that the bot is a bit too smart and happily clicks on everything in the hope of ending up at a "real page". Given the nature of the UCW demos, that won't necessarily happen, so the bot enters an endless loop.

There are no entries like that for other crawlers.
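
For readers who want to repeat this kind of check, here is a minimal Python sketch of the log analysis, not the exact commands I ran. It assumes log lines shaped like the entry quoted above; the function name count_dynamic_hits, the second sample line and its "ExampleBrowser" agent are made up for illustration.

```python
import re
from collections import Counter

# Matches the abbreviated log line quoted in the post:
#   <ip> "<method> <path> <protocol>" <status> <bytes> "<referer>" "<user-agent>"
# Full Apache "combined" lines also carry identity, user and timestamp fields
# between the IP and the request; the lazy `.*?` skips over them.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) .*?"(?P<method>[A-Z]+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)


def count_dynamic_hits(lines, prefix="/dynamic/"):
    """Count requests whose path starts with `prefix`, grouped by user agent."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match and match.group("path").startswith(prefix):
            hits[match.group("agent")] += 1
    return hits


# The first line is the entry from the post; the second is a hypothetical
# request for a static file, added only to show that it is not counted.
sample = [
    '66.249.67.212 "GET /dynamic/ajax?_a=_LjxGq... HTTP/1.1" 200 179 "-" '
    '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
    '192.0.2.7 "GET /static/style.css HTTP/1.1" 200 512 "-" "ExampleBrowser/1.0"',
]

for agent, count in count_dynamic_hits(sample).most_common():
    print(count, agent)
```

Feeding it a real access log (for example, the lines of an open file) gives one count per user agent, which makes it easy to see whether a single crawler is responsible for most of the /dynamic/ traffic.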

To avoid filling my logs with nonsense, I added the following 'robots.txt' file at the root of the static part of the Web site:

User-agent: *
Disallow: /dynamic/
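
As a quick sanity check, the rules can be fed to the robots.txt parser in Python's standard library. This is only a sketch: it shows how a well-behaved crawler should read the file, not what Googlebot actually does. The helper name allowed and the two example paths are assumptions for illustration.

```python
from urllib.robotparser import RobotFileParser

# The same two rules as in the robots.txt file above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /dynamic/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def allowed(path, agent="Googlebot"):
    """Return True if `agent` may fetch `path` under the rules above."""
    return parser.can_fetch(agent, path)


# Hypothetical paths: one Ajax request under /dynamic/, one static page.
print(allowed("/dynamic/ajax?_a=example"))
print(allowed("/static/index.html"))
```

The Ajax path is refused and the static page is allowed, because the single Disallow rule applies to every user agent and is matched as a path prefix.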

And of course, a request for that file must not be passed on to UCW, so my Apache2 site configuration is now:

<IfModule mod_lisp2.c>
    LispServer 127.0.0.1 3001 "UCW"
    SetHandler lisp-handler
    <LocationMatch ".*/static/.*\.(css|jpg|gif|png|js|ico|html)$">
       SetHandler none
    </LocationMatch>
    <LocationMatch "/robots.txt$">
       SetHandler none
    </LocationMatch>
</IfModule>
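
To see which requests the two LocationMatch patterns keep away from the Lisp handler, they can be tried against a few paths in Python. This is a rough sketch: Apache uses PCRE rather than Python's re module, but patterns this simple should behave the same in both. The function name bypasses_lisp_handler and the sample paths are made up for illustration.

```python
import re

# The two patterns copied verbatim from the LocationMatch sections above.
STATIC_PATTERNS = [
    re.compile(r".*/static/.*\.(css|jpg|gif|png|js|ico|html)$"),
    re.compile(r"/robots.txt$"),
]


def bypasses_lisp_handler(url_path):
    """Return True if either pattern matches, i.e. Apache serves the file itself."""
    # LocationMatch patterns are unanchored, so search() is the closest analogue.
    return any(pattern.search(url_path) for pattern in STATIC_PATTERNS)


# Hypothetical request paths, without query strings.
for path in ["/robots.txt", "/static/css/site.css", "/dynamic/ajax", "/static/page.php"]:
    print(path, bypasses_lisp_handler(path))
```

Only the first two paths match, so only those would be served as plain files. Note that the unescaped dot in the second pattern matches any character; writing it as \. would be slightly stricter.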
