HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: How to completely ignore 'robots.txt'?
Author: Daniel
Date: 01/11/2015 22:18
 
I found a way to bypass robots.txt on one popular website (the WordPress
Codex site).
Here's what I did: in the Spider options, set the robots.txt handling to
"accept robots except for filters" (or something like that; I'm using the
German version, so the English label may differ).
Then change the Browser ID to Java (something else might work too, but I
haven't tested it). Turn off the HTML footer, and under the connection
settings use a value lower than 8, because many websites ban IPs that open
more than 8 connections. 1-4 is a good value here. Hope this helps. I'm
really glad I could finally get past the WordPress ban on HTTrack.
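
For anyone using the command-line httrack instead of the GUI, the settings above can be sketched roughly as follows. This is only a minimal sketch that builds the argument list. The mapping of the GUI labels to -sN (robots.txt handling), -F (user-agent), -%F (footer string) and -cN (number of connections) is an assumption based on the httrack option list, not something stated in the post. In particular, treating "accept robots except for filters" as -s1 and passing an empty string to -%F to suppress the footer are guesses. The helper name, URL and output directory are hypothetical.

```python
import shlex
from typing import List


def build_httrack_args(
    url: str,
    output_dir: str,
    robots_level: int = 1,      # assumed: 0 = never follow robots.txt, 1 = sometimes, 2 = always
    user_agent: str = "Java",   # the Browser ID the post reports as working
    connections: int = 4,       # the post suggests 1-4 and warns against 8 or more
) -> List[str]:
    """Build an httrack argument list mirroring the GUI settings from the post."""
    if robots_level not in (0, 1, 2, 3):
        raise ValueError("robots_level must be 0, 1, 2 or 3")
    if not 1 <= connections < 8:
        raise ValueError("connections should be at least 1 and lower than 8")
    return [
        "httrack", url,
        "-O", output_dir,        # output path for the mirror
        f"-s{robots_level}",     # robots.txt handling
        "-F", user_agent,        # user-agent field sent in HTTP headers
        "-%F", "",               # footer string; empty is assumed to disable it
        f"-c{connections}",      # number of simultaneous connections
    ]


if __name__ == "__main__":
    # Print the command instead of running it, so it can be checked first.
    print(shlex.join(build_httrack_args("https://example.com/", "mirror")))
```

Running the script only prints the command line. Check the flags against `httrack --help` for your version before actually using it.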
 






