> I would like to count the duplicate files from a
> website to make some operations on them.
> Is it possible ?
Unfortunately, finding duplicates would not help to save bandwidth. There is
no way to know whether "index.html" and "index.html?foo=42" will
produce the same content or different content until you have downloaded both
files.
httrack might hash the content and realize that it was already
downloaded, but by then it is too late: the referring page has already
been produced and saved on disk, and we are already producing a new file.
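
That said, if the goal is only to count duplicates once the mirror is
finished, hashing the downloaded files afterwards should be enough. Below is a
minimal sketch in Python, not something httrack does for you; the function
name find_duplicates and the directory name "mirror" are made up for
illustration.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(root):
    """Group the files under `root` by the SHA-256 of their content.

    Returns a dict mapping each digest to the list of paths sharing it,
    keeping only digests that occur more than once.
    """
    groups = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            # Reads the whole file into memory; fine for typical web pages.
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    return {d: paths for d, paths in groups.items() if len(paths) > 1}


if __name__ == "__main__":
    # "mirror" is a hypothetical directory holding the downloaded site.
    for digest, paths in find_duplicates("mirror").items():
        print(len(paths), "copies:", ", ".join(str(p) for p in paths))
```

This only tells you which files are byte-for-byte identical after the fact; it
does not avoid downloading them in the first place.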