Welcome to aegis hosting

Search Engine bots using up bandwidth

To index your site, a search engine uses an automated crawler, commonly called a 'bot'. A bot periodically follows every link on your site looking for relevant content. Usually that's a good thing, but some bots are a little overzealous.

For instance, we recently had a user whose site bandwidth was being eaten up at a rate of 4 GB a week by the bot of yandex.ru, a Russian search engine. Yandex is larger than Google in Russia, but its bot was eating bandwidth crawling a site that would be of little interest to Russian users.

Here's how we block it.

First we create a robots.txt file following the instructions at http://www.robotstxt.org/robotstxt.html. There's lots of info about bots at robotstxt.org that is useful and usually a robots.txt is enough.
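
As a rough illustration, a robots.txt that asks only the Yandex bot to stay away from the whole site could look like the lines below. This is a sketch following the robotstxt.org conventions, not necessarily the exact file we used, and it assumes the bot identifies itself with the name 'Yandex'.

User-agent: Yandex
Disallow: /

The file goes in the web root (public_html), so that it is served as /robots.txt.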

However, in this case the Yandex search bot ignored robots.txt.

Next we add the following to the .htaccess file in the public_html web root:

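# Turn mod_rewrite on
Options +FollowSymLinks
RewriteEngine on
RewriteBase /

# Block bots we don't care about
# [NC] = case-insensitive match; [OR] belongs only between conditions, never on the last one
RewriteCond %{HTTP_USER_AGENT} Yandex [NC]
# "-" means no substitution; [F] returns 403 Forbidden
RewriteRule .* - [F]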

This works by looking at the 'user agent' of each incoming request and returning a 403 Forbidden response (the [F] flag) if it contains 'Yandex', ignoring case (the [NC] flag). It relies on Apache's mod_rewrite module being enabled.
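
To block more than one bot, the conditions can be chained with [OR]. The lines below are a minimal sketch: 'ExampleBot' is a made-up name standing in for whatever second user agent you want to block, and the lines would replace the single RewriteCond/RewriteRule pair rather than be added alongside it.

# Block several bots: every condition except the last carries [OR]
RewriteCond %{HTTP_USER_AGENT} Yandex [NC,OR]
# ExampleBot is a hypothetical user agent, used for illustration only
RewriteCond %{HTTP_USER_AGENT} ExampleBot [NC]
RewriteRule .* - [F]

Leaving [OR] off the final condition matters, because a trailing [OR] can cause the rule to apply to every visitor rather than just the listed bots.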


