Re: SOS: Not going beyond the starting URL - HTTrack Website Copier Forum

Subject: Re: SOS: Not going beyond the starting URL

Author: Xavier Roche

Date: 12/09/2002 20:13

> For each run, from the first to the sixth, HTTrack finished
> in less than two minutes; only the starting URL pages were
> downloaded, and not one of the Question & Answer pages.

First, when you are crawling large sites, you *must* set up
reasonable settings (for forums, no more than 2 or 3
simultaneous connections, plus a bandwidth limit), or the
websites will progressively ban all offline browsers.
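As a rough illustration of such "reasonable settings", here is a minimal Python sketch that builds an httrack command line with a cap on simultaneous connections and a bandwidth limit. It assumes HTTrack's -O (output path), -cN (number of simultaneous connections) and -AN (maximum transfer rate in bytes/s) options; check `httrack --help` for your version. The helper name, the example URL and the 25000 bytes/s default are hypothetical choices, not recommended values.

```python
import shlex

def polite_httrack_args(url, output_dir, connections=2, max_rate_bytes=25000):
    """Build a hypothetical httrack argv with conservative crawl settings.

    Assumes HTTrack's -O (output path), -cN (simultaneous connections)
    and -AN (max transfer rate, bytes/s) options; verify with `httrack --help`.
    """
    # Forums: keep simultaneous connections at 1-3.
    if not 1 <= connections <= 3:
        raise ValueError("keep simultaneous connections between 1 and 3 for forums")
    # Always set a bandwidth limit (assumed value, tune for the target site).
    if max_rate_bytes <= 0:
        raise ValueError("a positive bandwidth limit is required")
    return [
        "httrack", url,
        "-O", output_dir,
        f"-c{connections}",
        f"-A{max_rate_bytes}",
    ]

# Hypothetical target and output directory, for illustration only.
args = polite_httrack_args("https://forum.example.com/", "/tmp/mirror")
print(shlex.join(args))
```

Returning an argument list rather than a single string keeps it usable with `subprocess.run(args)` without shell quoting issues; `shlex.join` is only used to display it.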

Okay, as for your problem, I did not see any obvious errors;
please open (in your browser) the top-level index.html of the
(quickly) mirrored project and check which links were
written. There might be multiple reasons, and testing is
not very simple with https sites. If you hover the mouse over
a link that was not mirrored, what URL do you see? The problem
could be multiple redirect pages, or even crawler protection (I
heard that some bad users were crawling Google using overly
aggressive settings; this is indeed a stupid thing to do).
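Checking the written links can also be done with a small script instead of hovering over each one. This is a minimal Python sketch using only the standard library; it rests on the assumption that HTTrack rewrites mirrored links to relative local paths, so any link still pointing to an absolute http(s) URL was not followed. The function and class names are hypothetical.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

class LinkCollector(HTMLParser):
    """Collect href values from <a> tags."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def unmirrored_links(html_text):
    """Return links that still point to a remote http(s) URL.

    Assumption: mirrored links were rewritten to relative local paths,
    so absolute http(s) links were not downloaded.
    """
    parser = LinkCollector()
    parser.feed(html_text)
    parser.close()
    return [h for h in parser.hrefs if urlparse(h).scheme in ("http", "https")]

# Hypothetical page content, for illustration only.
sample = ('<a href="forum/topic1.html">local</a> '
          '<a href="https://forum.example.com/qa?id=2">remote</a>')
print(unmirrored_links(sample))  # ['https://forum.example.com/qa?id=2']
```

To use it on a mirror, read the project's index.html (or any page in it) and pass its text to `unmirrored_links`; the remaining URLs are the ones worth inspecting for redirects or crawler protection.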

