How to restrict access from search crawlers using nginx and robots.txt

The map directive below tags User-Agents that look like crawlers or benchmarking tools; both limit zones are keyed on that tag, and nginx does not account requests whose key is empty, so normal browsers are unaffected.

1. Configuration:
http {
    # Tag crawler/tool User-Agents; everything else maps to an empty
    # string, which the limit modules do not account.
    map $http_user_agent $agent {
        default        "";
        ~*curl         $http_user_agent;
        ~*apachebench  $http_user_agent;
        ~*spider       $http_user_agent;
        ~*bot          $http_user_agent;
        ~*slurp        $http_user_agent;
    }

    limit_conn_zone $agent zone=conn_startnews24_com:10m;
    limit_req_zone  $agent zone=req_startnews24_com:10m rate=1r/s;

    server {
        listen      8092;
        server_name crawl.arkgame.com;
        root        /data/webroot/www.arkgame.com/;

        location / {
            limit_req  zone=req_startnews24_com burst=5;  # request-rate zone, not the conn zone
            limit_conn conn_startnews24_com 1;            # one concurrent connection per tagged agent
            limit_rate 500k;                              # cap response bandwidth at 500 KB/s
        }
    }
}
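
The robots.txt mentioned in the title complements these limits: the nginx zones throttle crawlers that ignore it, while compliant crawlers can be kept away entirely. A minimal sketch, assuming the file is placed at the document root (/data/webroot/www.arkgame.com/robots.txt); the blanket Disallow is illustrative and can be narrowed to specific paths or User-agents:

User-agent: *
Disallow: /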

2. Test: ab's default User-Agent matches the ~*apachebench pattern above, so with 10 concurrent clients most of the 300 requests should be rejected (503) by the connection and rate limits.
# ab -c 10 -n 300 http://crawl.arkgame.com:8092/www.arkgame.com.html
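
To verify the mapping by hand, a hypothetical pair of curl probes (the User-Agent strings are illustrative) shows that only tagged agents are throttled:

# Ten parallel requests with a bot-like UA: limit_conn 1 plus the
# 1 r/s rate (burst 5) should reject most of them with HTTP 503.
for i in $(seq 1 10); do
  curl -s -o /dev/null -w "%{http_code}\n" -A "testbot" http://crawl.arkgame.com:8092/ &
done; wait

# A browser-like UA maps to an empty $agent, which the limit modules
# do not account, so this request is served normally (200).
curl -I -A "Mozilla/5.0" http://crawl.arkgame.com:8092/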
