HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: counting duplicates
Author: Ryuu
Date: 06/08/2013 03:09
 
It's possible, but it's not HTTrack's job. You should find a file management
application instead.
Checking for duplicates isn't hard, but it will slow down downloading. Assuming
that different addresses normally have different content, it's not worth the
effort.
The way to check for duplicates is:
1. Read every file.
While working with each file:
2. Remember its size.
3. Remember its hash.
4. Compare it against ALL the earlier ones (we use size/hash to reduce
comparing time).
5. If size and hash both match, check the full content or just assume that
it's the same file.
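
The steps above could be sketched roughly like this in Python. This is only an
illustration run over an already finished mirror, not anything HTTrack does
itself. The names file_hash and find_duplicates are made up for the example,
SHA-256 is just one possible hash, and step 5 takes the "just assume it's the
same file" shortcut instead of a full byte-by-byte check.

```python
import hashlib
import os
from collections import defaultdict


def file_hash(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root):
    """Group files under root that share both size and hash (hypothetical helper)."""
    # Steps 1-2: walk every file and bucket it by size (cheap, no content read).
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            by_size[os.path.getsize(path)].append(path)

    # Steps 3-4: hash only the files whose size collides with another file.
    by_key = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            by_key[(size, file_hash(path))].append(path)

    # Step 5: assume equal size plus equal hash means equal content.
    return [sorted(paths) for paths in by_key.values() if len(paths) > 1]
```

Calling find_duplicates on the mirror folder returns a list of groups, each
group being the paths that look identical. Bucketing by size first means most
files never get hashed at all.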

As you can see, this is better than plainly comparing the content of every
file byte by byte (assume you download 32 files: you must compare around
1+2+3+...+31 times). But the method above still takes time. And what should
HTTrack do while downloading?
Delay the next download? - Painfully slow for sure.
Run it as a separate job? Then it has to remember a queue of files that have
finished downloading but are still waiting to be compared. - Memory consuming.
Site mirroring is already memory consuming.
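
To put a number on the 32-file example: comparing each new file against every
earlier one takes n*(n-1)/2 comparisons in total. A tiny sketch (the function
name pairwise_comparisons is made up for this post):

```python
def pairwise_comparisons(n):
    """Comparisons needed to check each of n files against every earlier one."""
    return n * (n - 1) // 2


# 32 files: 1 + 2 + 3 + ... + 31
print(pairwise_comparisons(32))  # 496
```

With a size/hash lookup table you do roughly one lookup per file instead, but
every file that needs a hash still has to be read in full once, which is where
the time goes.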

I want the same thing as you, but when I think about it... nah, it's better
without it. lol
 

