In order for a search engine to find your site, it uses something commonly called a 'bot'. A bot is an automated program that periodically crawls every link on your site looking for relevant info. Usually that's a good thing, but some bots are a little overzealous.
For instance, recently we had a user whose site bandwidth was being eaten up at the rate of 4GB a week by a Russian search engine bot from yandex.ru. Yandex.ru is larger than Google in Russia but their search engine was eating bandwidth trawling through a site that would be of little interest to Russians.
Here's how we block it.
First we create a robots.txt file following the instructions at http://www.robotstxt.org/robotstxt.html. There's lots of info about bots at robotstxt.org that is useful and usually a robots.txt is enough.
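As a sketch, a minimal robots.txt that asks Yandex's crawler to stay away from the whole site looks like this ("Yandex" is the user-agent token Yandex's crawlers identify themselves with; the file goes in the web root):

```
User-agent: Yandex
Disallow: /
```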
But the Yandex search bot ignores robots.txt.
Next we add the following to .htaccess in the public_html web root:
# mod_rewrite on
Options +FollowSymLinks
RewriteEngine on
RewriteBase /
# Block bots we don't care about
RewriteCond %{HTTP_USER_AGENT} Yandex [NC]
RewriteRule .* - [F,L]
This works by matching the 'user agent' header of the incoming bot against 'Yandex' (case-insensitively, thanks to the [NC] flag) and returning a 403 Forbidden response ([F]) instead of the page. It relies on Apache's mod_rewrite module being enabled.
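To make the matching behaviour concrete, here is a small illustrative sketch (in Python, purely for demonstration) of the check mod_rewrite performs: an unanchored, case-insensitive pattern match against the User-Agent string. The example User-Agent strings are illustrative.

```python
import re

def is_blocked(user_agent: str) -> bool:
    # Mirrors: RewriteCond %{HTTP_USER_AGENT} Yandex [NC]
    # [NC] means "no case", i.e. a case-insensitive, unanchored match.
    return re.search(r"Yandex", user_agent, re.IGNORECASE) is not None

# Yandex's crawler announces itself in its User-Agent, so it matches:
print(is_blocked("Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)"))  # True
# An ordinary browser does not:
print(is_blocked("Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/120.0"))  # False
```

Any request whose User-Agent matches is handed the 403 by the RewriteRule; everything else is served normally.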