HTTrack Website Copier
Free software offline browser - FORUM
Subject: Browser ID changed but still blocked by robots.txt
Author: Maher
Date: 08/24/2013 13:10
 
Hi,

I am using the following settings for my crawler:

user-agent="AnyNewNonExistingNeverUsedBrowserID"
robots=2    # follow robots.txt rules

I am using the crawler on my site, which has the following robots.txt rule:

User-agent: HTTrack
Disallow: /

With this rule in place, nothing gets crawled. When I remove the HTTrack rule
from robots.txt, the site gets crawled.

In both cases I can see in my website log that the user-agent
"AnyNewNonExistingNeverUsedBrowserID" has visited my site.

Question: Although the browser ID was changed, the crawler still seems to be
matched against the robots.txt rule for HTTrack rather than being treated as
"AnyNewNonExistingNeverUsedBrowserID". Is there any way to prevent this without
having to ignore robots.txt completely (robots=0)?
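
For comparison, here is a minimal sketch of how a generic robots.txt parser
treats the rule above. It uses Python's standard urllib.robotparser, not
HTTrack's own matching code, so it only illustrates the behaviour one would
normally expect; the helper name is_allowed is made up for this example.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rule from the post, held in memory instead of fetched.
ROBOTS_TXT = """\
User-agent: HTTrack
Disallow: /
"""


def is_allowed(user_agent, path="/"):
    """Return True if user_agent may fetch path under ROBOTS_TXT.

    Hypothetical helper for illustration; the matching is done by
    urllib.robotparser, which may differ from HTTrack's logic.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for agent in ("HTTrack", "AnyNewNonExistingNeverUsedBrowserID"):
        print(agent, "allowed:", is_allowed(agent))
```

With this parser the HTTrack group should apply only to an agent whose name
contains "HTTrack", so the custom browser ID would be allowed. If HTTrack
still refuses to crawl, it is presumably matching the rule against its own
name and not against the configured user-agent string.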
 



