HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: avoid scanning multiple copies of same file
Author: Xavier Roche
Date: 11/30/2013 15:06
 
> I have been trying to download a 'wiki' as well as
> several forum websites. In all cases the download
> seems endless, with multiple copies of the same
> file(s) being created.

Unfortunately, httrack cannot "guess" that the links actually point to the same content.
You cannot "collate" links with httrack either - but you can exclude the additional
links from the download using scan rules.

> It seems that the "index.php?title=X" part leads
> HTTrack to create separate HTML files. Is there any
> way, by using either filters, options or both, to
> force HTTrack to only make one copy of each file it
> finds, rather than multiples? Thanks in advance.

It seems that you are crawling diffs, history pages, etc. - which really are different
content, not duplicate copies of the same page.
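
For illustration only (the exact URL layout depends on your wiki, so treat the page
names below as assumptions), MediaWiki-style links such as

  index.php?title=SomePage&action=history
  index.php?title=SomePage&action=edit
  index.php?title=SomePage&diff=12345&oldid=12300

are all generated from the same article, but each one is a distinct page - which is
why the crawl looks like it is producing "multiple copies" of the same file.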

You may, however, exclude them, for example by using the following scan rules
(Options / Scan Rules):

-*action=* -*diff=*
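
If you run httrack from the command line rather than the GUI, the same rules can be
appended after the start URL - this is only a sketch, with example.com standing in
for your actual site and the mirror path chosen arbitrarily:

  httrack "http://example.com/wiki/index.php?title=Main_Page" -O "/path/to/mirror" "-*action=*" "-*diff=*"

The trailing "-*action=*" and "-*diff=*" arguments are the scan rules; any URL
matching them is skipped during the crawl.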




 