HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: update hack and file size comparing
Author: Xavier Roche
Date: 07/26/2002 20:52
> If I understand correctly, the 'update hack' update does 
> not simply accept the 'Modified' status as the ultimate 
> truth, but it compares the sizes of the previously saved 
> request and the current request.
> It seems to me that the size comparison is done on the 
> sizes as reported by the webserver, not the number of 
> bytes actually downloaded.

Update checks are NEVER done using the 'number of bytes 
actually downloaded', because basically the purpose is NOT 
to download unnecessary data! The only size compared is the 
size sent by the remote server in the headers, IF available.
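
To make the header-only comparison concrete, here is a minimal sketch in Python. It is not HTTrack's actual code: the names `reported_size` and `size_check`, and the use of -1 as the "unknown" marker, are assumptions made for this illustration.

```python
from typing import Mapping

UNKNOWN = -1  # assumed marker for "the server did not report a size"


def reported_size(headers: Mapping[str, str]) -> int:
    """Return the size announced in Content-Length, or UNKNOWN if absent or invalid."""
    for name, value in headers.items():
        if name.lower() == "content-length":
            try:
                size = int(value.strip())
            except ValueError:
                return UNKNOWN
            return size if size >= 0 else UNKNOWN
    return UNKNOWN


def size_check(previous_size: int, headers: Mapping[str, str]) -> str:
    """Compare header-reported sizes only, never the bytes actually downloaded."""
    current = reported_size(headers)
    if previous_size == UNKNOWN or current == UNKNOWN:
        return "unknown"    # nothing to compare, so no conclusion is possible
    if previous_size != current:
        return "changed"
    return "same-size"      # an equal size does not prove identical content
```

With a server that sends no Content-Length at all, `size_check` can only answer "unknown", and the file has to be transferred again.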

> The actual file size is 28431 bytes, the file size as 
> responded by the web server is -1 (unknown).

Yes, so the engine cannot tell whether the file was 
updated or not.

> Thus: a retransfer and processing again of all links 


> Effectively 90% of the site has to be downloaded and 
> processed again. (Everything except static files)

Yep, nasty server! Using ETag would solve all these problems, 
and would be easy to implement. Unfortunately, servers that 
support it are rare (stupid servers!)
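
For reference, an ETag-based check relies on standard HTTP revalidation: the client sends the saved ETag back in an If-None-Match header, and a 304 Not Modified status means the saved copy is still current. A minimal sketch in Python follows; the helper names `conditional_headers` and `needs_download` are hypothetical and are not part of HTTrack.

```python
from typing import Dict, Optional


def conditional_headers(stored_etag: Optional[str]) -> Dict[str, str]:
    """Build the revalidation header; empty if no ETag was saved last time."""
    return {"If-None-Match": stored_etag} if stored_etag else {}


def needs_download(status: int) -> bool:
    """304 Not Modified means the saved copy is still valid; anything else is re-fetched."""
    return status != 304
```

When the server never sent an ETag, `conditional_headers` returns nothing to send, so the check falls back to whatever other information is available, such as the reported size.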

> I know that would mean you would have to download and 
> examine the new request anyway, but if the conclusion is 
> that the actually downloaded content is equal to the 
> previously downloaded content, it would eliminate the 
> need to parse the document again.

Err, parsing the links is FAST, but downloading them is not.
If a page was NOT modified, it does ***NOT*** imply that 
links inside were not modified!! It would be too easy: 
checking the first top html page, and if not modified, 
assume the whole site is up to date?

> Would this mean a performance improvement?

It would not work, unfortunately :(
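
The point that an unmodified page says nothing about the pages it links to can be shown with a toy crawl. This is only an illustration in Python, not how the engine is written: the function `checked_urls`, the three-file site, and the `modified` set are all made up for the example.

```python
from collections import deque
from typing import Dict, List, Set


def checked_urls(start: str, links: Dict[str, List[str]],
                 modified: Set[str], skip_unmodified: bool) -> Set[str]:
    """Return every URL the crawl looks at, starting from `start`."""
    seen = {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if skip_unmodified and url not in modified:
            continue  # shortcut: trust the unmodified page and ignore its links
        for target in links.get(url, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical site: the top page is unchanged, but a page it links to is not.
site = {"index.html": ["news.html", "logo.gif"], "news.html": []}
modified = {"news.html"}

with_shortcut = checked_urls("index.html", site, modified, skip_unmodified=True)
without_shortcut = checked_urls("index.html", site, modified, skip_unmodified=False)
```

With the shortcut, the crawl stops at index.html and never looks at news.html, so the change is missed. Without it, every link is checked, which is why the links have to be parsed even on unmodified pages.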
