HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: SOS: Not going beyond the starting URL
Author: Xavier Roche
Date: 12/09/2002 20:13
 
> For each run, from first to the sixth, the HTTrack ended 
> in less than two minutes, where only the starting URL 
> pages were downloaded and not one Question & Answer pages.

First, when you are crawling large sites, you *must* set up 
reasonable settings (for forums, no more than 2 or 3 
simultaneous connections, plus a bandwidth limit), or the 
websites will progressively ban all offline browsers.
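
As a rough illustration of such settings, here is a minimal sketch that assembles an HTTrack command line with a low connection count and a transfer-rate cap. It relies on the -cN (simultaneous connections), -AN (maximum rate in bytes/second) and -O (output path) options; the helper name, the example URL, the output directory and the 25000 bytes/second figure are assumptions chosen for illustration, not recommended values from this thread.

```python
def build_httrack_command(url, out_dir, connections=2, max_rate_bytes=25000):
    """Return an argument list for a gentle HTTrack crawl.

    Hypothetical helper: the defaults are illustrative assumptions.
    """
    return [
        "httrack",
        url,
        "-O", out_dir,            # where the mirror is written
        f"-c{connections}",       # number of simultaneous connections
        f"-A{max_rate_bytes}",    # maximum transfer rate, in bytes/second
    ]


# Example (placeholder URL and directory):
# import subprocess
# subprocess.run(build_httrack_command("https://www.example.com/forum/", "./mirror"))
```

The list is only built here, not executed, so the values can be reviewed before a crawl is actually started.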

Okay, as for your problem, I did not see any obvious errors; 
please open (in your browser) the top index.html of the 
(quickly) mirrored project and check which links were 
written. There might be multiple reasons, and testing is 
not very simple with https sites. If you hover the mouse over 
a link that was not mirrored, what URL do you see? The problem 
can be multiple redirect pages, or even crawler protections (I 
heard that some bad users were crawling Google using too 
aggressive settings; this is indeed a stupid thing to do).
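
If hovering over every link by hand is tedious, the same check can be approximated with a short script. This is only a sketch: it assumes that links which were not mirrored still point to a remote host in the saved index.html, while mirrored ones are relative local paths. The names LinkCollector and classify_links are hypothetical, and the file path in the usage comment is a placeholder.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LinkCollector(HTMLParser):
    """Collect the href value of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def classify_links(html_text):
    """Split links into (local, remote) by whether they name a host."""
    parser = LinkCollector()
    parser.feed(html_text)
    local, remote = [], []
    for href in parser.links:
        # A non-empty network location means the link still targets a server.
        if urlparse(href).netloc:
            remote.append(href)
        else:
            local.append(href)
    return local, remote


# Example (placeholder path):
# with open("mirror/index.html", encoding="utf-8", errors="replace") as f:
#     local, remote = classify_links(f.read())
# print("\n".join(remote))
```

The remote list shows the URLs that the links still point to, which may help reveal a redirect page or a different host name.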


 

