In this article I’m going to talk about the impact that uncontrolled search engine crawlers and other automated robots can have on your account’s resource usage.
Why do bots need to be controlled?
A very common way that information spreads on the Internet is through automated robots that crawl the Internet to find and index new content. This is great if you want your website content to be found on one of the large search engines such as Google or Bing, but these robots are still automated visitors to your website, and they behave very differently from a human visitor.
A human coming to your website will more than likely take some time to read the page they’ve landed on. If they then decide to click on another link on your website, it’s probably because it piqued their interest.
A bot coming to your website is typically on a mission to find everything on your website, so it might start on the front page and then simply spider out to each and every link on your website, one after another, until it has found everything.
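To make that spidering behavior concrete, here is a minimal sketch of how a crawler might work through a site. The SITE_LINKS map and the crawl function are hypothetical and only stand in for a real website; an actual bot would download each page and extract its links, but the breadth-first pattern would be similar.

```python
from collections import deque

# Hypothetical site map: each page maps to the pages it links to.
SITE_LINKS = {
    "/": ["/about", "/blog", "/products"],
    "/about": ["/", "/contact"],
    "/blog": ["/blog/post-1", "/blog/post-2"],
    "/blog/post-1": ["/blog", "/products"],
    "/blog/post-2": ["/blog"],
    "/products": ["/products/widget"],
    "/products/widget": ["/products"],
    "/contact": [],
}


def crawl(start):
    """Visit every page reachable from start, breadth-first, once each."""
    seen = {start}
    queue = deque([start])
    order = []
    while queue:
        page = queue.popleft()
        order.append(page)  # one request to the server per page
        for link in SITE_LINKS.get(page, []):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order


requested = crawl("/")
print(len(requested), "pages requested in a single bot visit")
```

Starting from the front page, the sketch ends up requesting every page on this small example site, which is the pattern that makes a single bot visit so much heavier than a single human visit.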
So one bot visitor could potentially have the resource usage impact of hundreds, if not thousands, of normal human visitors on your account. An extreme amount of resource usage coming from your account can eventually lead to an account suspension, so it’s important to realize that this can typically be avoided by allowing only human visitors and a select set of good bots onto your website.
How do I control robots?
Luckily, most rule-abiding robots will follow a standardized robots.txt rules file. You can read more about how to stop search engines from crawling your website, which reviews the robots.txt file. You can also read about setting a crawl delay in Google Webmaster Tools, which has specific steps for controlling Google’s crawling robot.
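As a rough illustration of how a rule-abiding robot reads these rules, the sketch below uses Python's standard urllib.robotparser module to evaluate a sample robots.txt. The rules, the bot names GoodBot and BadBot, and the example.com URLs are all made up for this example, and not every crawler honors the Crawl-delay line.

```python
from urllib.robotparser import RobotFileParser

# Sample robots.txt content (hypothetical rules for illustration).
ROBOTS_TXT = """\
User-agent: *
Crawl-delay: 10
Disallow: /private/

User-agent: BadBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# A well-behaved bot checks each URL before requesting it.
print(parser.can_fetch("GoodBot", "https://example.com/index.html"))    # True
print(parser.can_fetch("GoodBot", "https://example.com/private/a.html"))  # False
print(parser.can_fetch("BadBot", "https://example.com/index.html"))     # False

# Seconds the bot is asked to wait between requests.
print(parser.crawl_delay("GoodBot"))  # 10
```

Keep in mind that robots.txt is only a request: it works because the robot chooses to check it, as this sketch does before each fetch.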
Unfortunately, not all automated robots are going to comply with your robots.txt rules. In these cases it’s best to learn about how to block unwanted users from your site using .htaccess, which will allow you to make sure these bad robots don’t add to your account’s resource usage.