HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: counting duplicates
Author: Xavier Roche
Date: 06/08/2013 12:27
 
> I would like to count the duplicate files from a
> website to make some operations on them.
> Is it possible ?
Unfortunately, finding duplicates would not help save bandwidth. There is
no way to know whether "index.html" and "index.html?foo=42" will produce
the same content or different content until you have downloaded both
files.

httrack could hash the content and realize that it was already
downloaded, but by then it is too late: the referring page has already
been produced and saved on disk, and we are producing a new file.
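
If the goal is only to count duplicates after the mirror has finished, hashing the saved files is one way to do it outside of httrack. The following is a minimal sketch, not an httrack feature: the function names and the mirror directory are hypothetical, and it assumes "duplicate" means byte-identical content.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates(mirror_dir):
    """Group files under mirror_dir by content hash; keep groups of 2+."""
    groups = defaultdict(list)
    for root, _dirs, files in os.walk(mirror_dir):
        for name in files:
            path = os.path.join(root, name)
            groups[file_digest(path)].append(path)
    return {d: sorted(p) for d, p in groups.items() if len(p) > 1}


def count_duplicates(mirror_dir):
    """Count redundant copies: files beyond the first in each group."""
    return sum(len(p) - 1 for p in find_duplicates(mirror_dir).values())
```

Calling count_duplicates("/path/to/mirror") (a placeholder path) would give the number of redundant files. Since this runs after the download, it does not save any bandwidth; it only reports what is already on disk.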

 