high resource usage notice
The "high resource usage" I am receiving on my account suddenly since July 9 at bumperpress.com is from a Chinese bot which is hitting me with 468,000 kbytes several times every minute. This is not my normal usage which is not very much. I have added several lines of code to my robots.txt and .htaccess but cannot get rid of this nuisance from "ptr.cnsat.com.cn" A chinese spammer bot. Anyone have any idea what I can do besides blocking their ip address which don't seem to work for long? They keep coming back! Thanks!
Here is a breakdown of the requests hitting your site by user agent (request count, followed by the user agent string):

3077 Mozilla/5.0 (compatible; spbot/4.1.0; +http://OpenLinkProfiler.org/bot )
2353 Mozilla/5.0 (compatible; EasouSpider; +http://www.easou.com/search/spider.html)
2099 Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)
1800 Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)
1290 Mozilla/5.0 (compatible; freefind/2.1; +http://www.freefind.com/spider.html)
873 Mozilla/5.0 (compatible; 007ac9 Crawler; http://crawler.007ac9.net/)
612 Mozilla/5.0 (compatible; DotBot/1.1; http://www.opensiteexplorer.org/dotbot, [email protected])
452 Mozilla/5.0 (compatible; Plukkie/1.5; http://www.botje.com/plukkie.htm)
324 Baiduspider-image+(+http://www.baidu.com/search/spider.htm)
83 Mozilla/5.0 (compatible; YandexImages/3.0; +http://yandex.com/bots)
80 Mozilla/5.0 (compatible; MJ12bot/v1.4.5; http://www.majestic12.co.uk/bot.php?+)
58 Mozilla/5.0 (compatible; AhrefsBot/5.0; +http://ahrefs.com/robot/)
54 NerdyBot
40 Mozilla/5.0 (compatible; 200PleaseBot/1.0; +http://www.200please.com/bot)
28 Mozilla/5.0 (compatible; SISTRIX Crawler; http://crawler.sistrix.net/)
If, for instance, you wanted to block all of these bots outright, here is a .htaccess rule you could use:

BrowserMatchNoCase "spbot" bots
BrowserMatchNoCase "EasouSpider" bots
BrowserMatchNoCase "YandexBot" bots
BrowserMatchNoCase "Baiduspider" bots
BrowserMatchNoCase "freefind" bots
BrowserMatchNoCase "007ac9" bots
BrowserMatchNoCase "DotBot" bots
BrowserMatchNoCase "Plukkie" bots
BrowserMatchNoCase "MJ12bot" bots
BrowserMatchNoCase "AhrefsBot" bots
BrowserMatchNoCase "200PleaseBot" bots
BrowserMatchNoCase "SISTRIX Crawler" bots

Order Allow,Deny
Allow from ALL
Deny from env=bots
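One thing to keep in mind: the Order/Allow/Deny lines are Apache 2.2-style syntax. If your server happens to run Apache 2.4 without the mod_access_compat module enabled, a sketch of the equivalent rule (reusing the same "bots" environment variable set by the BrowserMatchNoCase lines above) would look like this:

# Apache 2.4 syntax: allow everyone except requests where the "bots" env variable is set
<RequireAll>
    Require all granted
    Require not env bots
</RequireAll>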
I also noticed that you have Deny from rules in your .htaccess file in this format:

Deny from 157.55.*
You don't actually need the asterisk; Apache treats a partial IP address as a prefix, so this is enough:

Deny from 157.55
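If you prefer to be explicit about the range, the same block can also be written in CIDR notation, which Apache accepts as well (this should be equivalent to the partial-IP form above):

# Blocks 157.55.0.0 - 157.55.255.255, same as "Deny from 157.55"
Deny from 157.55.0.0/16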
Also, you typically don't want to block based on the PTR address or hostname in your .htaccess file. Blocking a direct IP address, or, in the case of the Chinese Baidu crawler, simply blocking by User-agent, is more effective. It also looks like blocking specific requests with &user=202.46 in the URL has helped; we did that with this code:

ErrorDocument 503 "Temporarily unavailable"
RewriteEngine on
RewriteCond %{QUERY_STRING} ^.*user=202.46.*$
RewriteRule .* - [R=503,L]
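One small refinement you could make, if you like (just a sketch, not required): the dots in 202.46 are unescaped in that RewriteCond, so they match any single character. Escaping them, and dropping the redundant ^.* and .*$, keeps the rule matching only the literal 202.46 prefix:

# Return 503 for any request whose query string contains the literal "user=202.46"
RewriteEngine on
RewriteCond %{QUERY_STRING} user=202\.46
RewriteRule .* - [R=503,L]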
Your site has blocked 208 of those types of requests so far today, and it looks like your resource usage has dropped a bit. Blocking some of the bots that you don't need crawling your site can cut your resource usage even further; Yandex, for instance, is a Russian search engine and Baidu is a Chinese one, so if you don't need visitors from those, there is little reason to let them crawl. As always, you can view the CPU graphs in cPanel to help ensure that your usage isn't spiking again. Hope that helps, and please let us know if you have any other questions at all!

- Jacob
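P.S. If you would rather ask the legitimate search engines not to crawl the site at all, instead of blocking them in .htaccess, Yandex and Baidu do generally honor robots.txt. A minimal sketch (excluding them from the whole site is an assumption on my part; adjust the Disallow paths to whatever you actually want kept out):

User-agent: Yandex
Disallow: /

User-agent: Baiduspider
Disallow: /

Keep in mind that the abusive bot resolving to ptr.cnsat.com.cn almost certainly ignores robots.txt, so this only helps with the well-behaved crawlers.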