Block Bad Bots and Spiders Using .htaccess

When I set up this site, I discovered that unscrupulous people run bots to harvest email addresses, as well as my content. Yes, I have found exact copies of my stuff on other sites. The method described here is by no means foolproof, but it will stop a good number of these bots while still letting the good bots, such as Googlebot, spider your site.

What I have been using lately is WordPress, which is what this site uses. For a typical WordPress installation, I install it in a folder other than the website's root folder. By the way, this will work for any site with an .htaccess file. For this situation, the .htaccess file is usually in the website's main folder. As always, make a backup of your .htaccess file before you make any changes. You have been warned.

I like to download my .htaccess file to my computer and edit it there. Here is a portion of a typical .htaccess file:

RewriteEngine on

# -FrontPage-

IndexIgnore .htaccess */.??* *~ *# */HEADER* */README* */_vti*

<Limit GET POST>
order deny,allow
deny from all
allow from all

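We are interested in the part of the file with the <Limit GET POST> block. This is the area that we are going to modify. First of all, the line order deny,allow needs to be changed to order allow,deny. With that ordering, Apache checks the allow rules first and the deny rules second, so a request that matches a deny rule is refused even though allow from all also matches it. Next, we need to change the line deny from all to deny from env=bad_bot.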
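
Put together, the two changes described above would leave the block looking roughly like the sketch below. Treat it as an illustration rather than a drop-in file. The closing </Limit> line is not part of the excerpt shown earlier and is assumed to follow in the real file. The SetEnvIfNoCase line is also an assumption, added only to show one common way the bad_bot variable can get set, and "ExampleBadBot" is a made-up user agent name, not a real bot.

```apache
# Assumption: tag requests whose User-Agent contains this (hypothetical) name.
# SetEnvIfNoCase matches case-insensitively and sets the bad_bot variable.
SetEnvIfNoCase User-Agent "ExampleBadBot" bad_bot

<Limit GET POST>
# Allow rules are evaluated first, then Deny rules; a matching Deny wins
order allow,deny
# Refuse any request that was tagged with the bad_bot variable
deny from env=bad_bot
# Everyone else is still allowed in
allow from all
</Limit>
```

Until something sets bad_bot, the deny line matches nothing, so the block behaves exactly like allow from all.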