HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: problem making a mirror of a VERY large site
Author: Xavier Roche
Date: 04/23/2006 10:22
 
> April 22 2006 has 365,600,739 photos (522,459 new in
> 24 hours).

Wow. With embedded HTML pages and links, that means more than a million URLs to
fetch. This is getting a bit big for a small crawler like httrack, which is
dimensioned by default for 100,000 links.

First, you have to raise the "maximum number of links" setting in the httrack
options, or else the mirror will die when it reaches 100K links.
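
For command-line use, the same limit can be raised with the `-#L` expert option (listed in httrack's help as "maximum number of links"). Below is a minimal sketch that only builds the argument list. The start URL, output directory, the helper name `build_httrack_args` and the 2x headroom factor are illustrative assumptions, not recommended values.

```python
def build_httrack_args(start_url, output_dir, expected_urls, headroom=2.0):
    """Return an httrack argument list with the link limit raised.

    Hypothetical helper: the headroom factor is an assumption, chosen so the
    limit is not reached partway through the crawl.
    """
    max_links = int(expected_urls * headroom)
    return [
        "httrack",
        start_url,
        "-O", output_dir,      # mirror destination directory
        f"-#L{max_links}",     # maximum number of links (default is 100,000)
    ]


# Example with placeholder values: about one million URLs expected.
args = build_httrack_args("http://www.example.com/", "./mirror", 1_000_000)
print(" ".join(args))
```

The list can be passed to `subprocess.run(args)` once the values have been checked against the real site.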

Then, take care not to clobber the site: one million URLs IS REALLY big
for a regular server, and it may cause some bandwidth problems.
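
To get a feel for what a polite crawl of that size costs in time, here is a rough back-of-the-envelope sketch. The request rates are assumptions for illustration only, and the estimate ignores file sizes and retries.

```python
SECONDS_PER_DAY = 86_400


def crawl_days(url_count, requests_per_second):
    """Estimate how many days a crawl takes at a fixed request rate."""
    if requests_per_second <= 0:
        raise ValueError("requests_per_second must be positive")
    return url_count / requests_per_second / SECONDS_PER_DAY


# One million URLs at a few assumed request rates.
for rate in (1, 2, 10):
    print(f"{rate:>2} req/s -> {crawl_days(1_000_000, rate):.1f} days")
```

At 2 requests per second, one million URLs take a little under six days, so a throttled mirror of this size is a multi-day job. On the httrack side, the documented `-%cN` (maximum connections per second) and `-AN` (maximum transfer rate in bytes per second) options are the usual knobs for this; which values are acceptable depends on the server.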

Apart from that, there are no specific options to enable, provided everything
is hosted on the site itself.

 





