Re: counting duplicates - HTTrack Website Copier Forum

Subject: Re: counting duplicates

Author: Xavier Roche

Date: 06/08/2013 12:27

> I would like to count the duplicate files from a
> website to make some operations on them.
> Is it possible ?
Unfortunately, finding duplicates would not help save bandwidth. There is
no way to know whether "index.html" and "index.html?foo=42" will
produce the same content or different content until you download both
files.

httrack could hash the content and realize that it was already
downloaded, but by then it is too late: the referring page has already
been produced and saved on disk, and we are already producing a new file.
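
If the goal is only to count duplicates after the mirror has finished, that can be done outside httrack by hashing the saved files. The following is a minimal sketch under that assumption, not an httrack feature: the function names (file_digest, find_duplicates, count_duplicates) and the mirror directory path are hypothetical, and it treats two files as duplicates only when their bytes are identical.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(mirror_dir):
    """Group files under mirror_dir by content hash; keep groups of 2 or more."""
    by_digest = defaultdict(list)
    for root, _dirs, files in os.walk(mirror_dir):
        for name in files:
            path = os.path.join(root, name)
            by_digest[file_digest(path)].append(path)
    return {d: sorted(p) for d, p in by_digest.items() if len(p) > 1}


def count_duplicates(mirror_dir):
    """Count redundant copies: every file beyond the first in each group."""
    return sum(len(p) - 1 for p in find_duplicates(mirror_dir).values())
```

For example, count_duplicates("mysite") (where "mysite" is a placeholder for the local mirror directory) would report how many saved files are byte-for-byte copies of another saved file. Pages that differ only by a timestamp or a session token would not be counted.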

