> If I understand correctly, the 'update hack' update does
> not simply accept the 'Modified' status as the ultimate
> truth, but it compares the sizes of the previously saved
> request and the current request.
> It seems to me that the size comparison is done on the
> sizes as reported by the webserver, not the number of
> bytes actually downloaded.
Update checks are NEVER done using the 'number of bytes
actually downloaded', because the whole point is to NOT
download unnecessary data! The only size compared is the
size sent by the remote server in the headers, IF available.
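To make the logic concrete, here is a minimal sketch (a hypothetical helper, not HTTrack's actual code) of header-based update detection: the size recorded from the previous crawl is compared against the `Content-Length` the server reports now, and a missing size (shown as -1 by some tools) forces a retransfer.

```python
from typing import Optional

def needs_refetch(stored_size: int, reported_size: Optional[int]) -> bool:
    """Return True when the file must be downloaded again.

    reported_size is the Content-Length from the new response headers,
    or None / -1 when the server does not send one.
    """
    if reported_size is None or reported_size < 0:
        # The server gave no size: we cannot prove the file is
        # unchanged, so the engine has to retransfer it.
        return True
    return reported_size != stored_size

# A server that reports no size forces a refetch even if nothing changed:
assert needs_refetch(28431, -1) is True
# Matching sizes let the crawler skip the download:
assert needs_refetch(28431, 28431) is False
```

This is exactly the situation described above: with a reported size of -1, the 28431-byte file is retransferred every time.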
> The actual file size is 28431 bytes, the file size as
> responded by the web server is -1 (unknown).
Yes, and therefore the engine has no way to know whether the
file was updated or not.
> Thus: a retransfer and processing again of all links
> inside.
Exactly
> Effectively 90% of the site has to be downloaded and
> processed again. (Everything except static files)
Yep, nasty server! Using ETag would solve all these problems,
and would be easy to implement. But unfortunately, few
servers support it (stupid servers!)
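For reference, here is a hedged sketch of how ETag revalidation works in general (illustrative names, not HTTrack internals): the client stores the ETag from the first response, sends it back as `If-None-Match`, and the server answers `304 Not Modified` with an empty body when the content is unchanged, so there is nothing to retransfer or re-parse.

```python
def revalidate(request_headers: dict, current_etag: str, body: bytes):
    """Server-side view of conditional GET: answer 304 with no body
    when the client's cached ETag still matches, else 200 with the
    full content."""
    if request_headers.get("If-None-Match") == current_etag:
        return 304, b""          # unchanged: nothing to download
    return 200, body             # changed: full retransfer

# Unchanged resource: the matching ETag yields an empty 304 response.
status, data = revalidate({"If-None-Match": '"abc123"'}, '"abc123"', b"<html>...")
assert (status, data) == (304, b"")
```

A crawler only needs to repeat this exchange per URL; it never has to compare byte counts at all, which is why ETag support on the server side would sidestep the whole problem.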
> I know that would mean you would have to download and
> examine the new request anyway, but if the conclusion is
> that the actually downloaded content is equal to the
> previously downloaded content, it would eliminate the
> need to parse the document again.
Err, parsing the links is FAST, but downloading them is not.
If a page was NOT modified, it does ***NOT*** imply that the
links inside were not modified!! That would be too easy:
check the top html page, and if it is not modified, assume
the whole site is up to date?
> Would this mean a performance improvement?
It would not work, unfortunately :(