I found a way to bypass the robots.txt of one popular website (the WordPress
Codex site).
Here's what I did: in the Spiders category, set the robots.txt option to
"accept robots except for filters" (or something like that; I'm using the
German version).
Then change the Browser ID to Java (something else might work too, but I
haven't tested it). Turn off sending an HTML footer, and make sure that under
Connections you use a value lower than 8, because many websites ban IPs
that open more than 8 connections. So 1-4 is a good value here. Hope this
helps. I'm really glad I could finally get around the WordPress ban on HTTrack.
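
For anyone on the command-line version instead of the GUI, here is a rough sketch of what I think the same settings look like. I only tested the GUI, so the mapping to flags is my assumption from the HTTrack option list: -s1 for the "except" robots mode, -F for the Browser ID, -%F with an empty string for no footer, and -cN for connections. The helper name, the example.com URL and the output folder are made up for illustration.

```python
import shlex


def build_httrack_args(url, out_dir, connections=4, user_agent="Java"):
    """Build an httrack argument list mirroring the GUI settings above.

    The flag mapping is an assumption, not something verified against the GUI.
    """
    # Stay below 8 connections; 1-4 is the range suggested in the post.
    if not 1 <= connections < 8:
        raise ValueError("use between 1 and 7 connections")
    return [
        "httrack", url,
        "-O", out_dir,          # output folder (hypothetical path)
        "-s1",                  # robots.txt: "sometimes" (assumed GUI equivalent)
        "-F", user_agent,       # Browser ID / User-Agent string
        "-%F", "",              # empty footer string, i.e. no HTML footer
        f"-c{connections}",     # number of simultaneous connections
    ]


# Print the command so it can be checked before running it by hand.
print(shlex.join(build_httrack_args("https://example.com/", "./mirror")))
```

The script only prints the command; it doesn't start HTTrack, so you can check the flags against your own version first.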