HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: filename
Author: Xavier Roche
Date: 08/10/2002 23:14
 
> I'm worried that these dynamic pages would be 
> redownloaded the next time I update the mirror even 
> though there's no changes in the page itself.

The way httrack saves/renames these pages *locally* does 
not change the way httrack does updates, and does not 
influence the update process at all. The original remote 
hostname, filename AND query string are stored in the 
hts-cache/ data, and httrack uses only this information 
to perform the update.
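
To illustrate the point, here is a minimal sketch of a cache index keyed by the original remote URL rather than by the local filename. This is NOT httrack's actual cache format; the names CacheEntry and cache_key, the example host and the local paths are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class CacheEntry:
    """Hypothetical cache record; not httrack's real on-disk layout."""
    local_path: str               # where the copy was saved/renamed locally
    etag: Optional[str]           # validator sent by the server, if any
    last_modified: Optional[str]  # timestamp sent by the server, if any


def cache_key(host: str, path: str, query: str) -> str:
    """Build the key from the original remote URL parts, never from the local name."""
    return f"{host}{path}?{query}" if query else f"{host}{path}"


cache: Dict[str, CacheEntry] = {}
key = cache_key("www.example.com", "/page.php", "id=42")
cache[key] = CacheEntry("www.example.com/page4f2a.html", '"abc123"', None)

# Renaming the local copy changes only local_path; the key and the
# validators used for the update stay exactly the same.
cache[key].local_path = "www.example.com/page-renamed.html"
```

Because the lookup never goes through the local name, the renaming scheme can change freely without affecting what gets re-downloaded.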

But in fact, most of the update process is handled by the 
remote server, in two important steps:

- during the first download, the server has to send a 
reliable way to tag the file/url, such as a timestamp 
(the date+time the file was last modified) or, even better, 
a strong etag identifier (which can be an md5 hash of the 
content, and is the "ultimate weapon" for handling updates). 
This information makes it possible to identify the 
"freshness" of the data being sent.

- during the update, httrack requests the previously 
downloaded file again, giving the server the "hint" it sent 
earlier (timestamp and/or etag). It is the duty of the server 
to respond either with an "OK, file not modified" message 
(304) or with an "OOPS, you have to redownload this file" 
message (200).
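
The two steps above can be sketched from the client side as follows. In HTTP the "hints" travel in the standard If-None-Match (etag) and If-Modified-Since (timestamp) request headers. The function names below are hypothetical, and this is only an illustration of the exchange, not httrack's own code.

```python
from typing import Dict, Optional


def conditional_headers(etag: Optional[str],
                        last_modified: Optional[str]) -> Dict[str, str]:
    """Return the request headers that hand the stored hints back to the server."""
    headers: Dict[str, str] = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def must_redownload(status: int) -> bool:
    """Interpret the server's answer to a conditional request."""
    if status == 304:   # "OK, file not modified": keep the cached copy
        return False
    if status == 200:   # "OOPS, you have to redownload this file"
        return True
    raise ValueError(f"status {status} is not handled in this sketch")
```

If the server sent no etag and no timestamp during the first download, conditional_headers() returns an empty dict, the request is an ordinary one, and the server can only answer with a full 200 response.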

With this system, the caching process is totally 
transparent, and very reliable. That's the theory. Now 
let's go back to the real world..

Some servers, unfortunately, are really dumb: they just 
ignore the timestamp/etag, or do not give any reliable 
information the first time. Because of that, (offline) 
browsers like httrack are forced to download a second time 
data that is identical to the previous version. Even clever 
servers are sometimes unable to "handle cleverly" 
stupid scripts that just don't care about bandwidth waste 
and caching problems.

Because of that, many websites (especially those 
with "dynamic" pages) are not "cache compliant", and 
browsers will always re-download their data.

But this is not something a browser can change - only 
servers could, if only webmasters were concerned about 
caching problems.

(for information, there are ALWAYS methods that allow pages 
to be cached, even dynamic ones, and even those using 
cookies and other session-related data)
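
As one possible illustration of such a method, a script that generates a dynamic page can hash the generated content to produce a strong etag, then compare it with the etag the client sends back. The sketch below assumes a plain function standing in for the server script; make_etag and respond are hypothetical names, and the comparison is simplified to an exact match (real If-None-Match headers may also carry lists of etags and weak "W/" tags).

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Strong etag derived from an md5 hash of the generated content."""
    return '"' + hashlib.md5(body).hexdigest() + '"'


def respond(body: bytes,
            if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a freshly generated page."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    if if_none_match == etag:
        # Content is identical to what the client already has: send no body.
        return 304, headers, b""
    return 200, headers, body


# First download: no hint yet, so the full page is sent with its etag.
status1, headers1, _ = respond(b"<html>hello</html>", None)
# Update with unchanged content: the etag matches, nothing is re-sent.
status2, _, payload2 = respond(b"<html>hello</html>", headers1["ETag"])
# Update after the page changed: the etag differs, the new page is sent.
status3, _, _ = respond(b"<html>changed</html>", headers1["ETag"])
```

With this approach the server still has to generate the page in order to hash it, so it saves bandwidth rather than server time.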
 