Hi,
I am using the following settings for my crawler:
user-agent="AnyNewNonExistingNeverUsedBrowserID"
robots=2 # follow robots.txt rules
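If it helps, this should be roughly the equivalent command line (just a sketch; the URL and output path are placeholders, -F sets the browser ID and -s2 means "always follow robots.txt"):

  httrack "http://www.example.com/" -O /tmp/mysite \
    -F "AnyNewNonExistingNeverUsedBrowserID" -s2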
I am using the crawler on my site, which has the following robots.txt rule:
User-agent: HTTrack
Disallow: /
but nothing gets crawled. When I remove the HTTrack rule from the robots.txt
file, the site gets crawled as expected.
In both cases I can see in my website log that the user-agent
"AnyNewNonExistingNeverUsedBrowserID" has visited my site.
Question: Although the browser ID was changed, the crawler was apparently
still identified, via robots.txt, as HTTrack rather than as
"AnyNewNonExistingNeverUsedBrowserID". Is there any way to prevent this
without having to ignore robots.txt completely (robots=0)?
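For example, I would have assumed that standard robots.txt matching picks the record for the requesting agent, so something like this (just a sketch; the empty Disallow means "allow everything") should let my crawler through while still blocking anything identifying as HTTrack:

  User-agent: AnyNewNonExistingNeverUsedBrowserID
  Disallow:

  User-agent: HTTrack
  Disallow: /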