HTTrack Website Copier
Free software offline browser - FORUM
Subject: How to tell which pages are updated?
Author: Lars Clausen
Date: 04/01/2004 09:57
 
I have a nightly update run and would like to see how many
pages are updated, new, cached, or gone.  The summary at
the end of the log consistently says that all pages are new,
which is highly unlikely:

HTTrack Website Copier/3.31 mirror complete in 2 hours 29
minutes 57 seconds : 14861 links scanned, 14735 files
written (224374629 bytes overall), no files updated
[99896462 bytes received at 11103 bytes/sec], 1.0 requests
per connection

I'm running with options -B -c10 -i -C2 -n -z -a -A100000
-#L10000000.  I know that some files on the site (www.jp.dk)
update every single day, so I don't trust the numbers above.
Also, the pages have good ETag and Last-Modified headers.
I've tried making sense of the log, but I can't tell when
a page comes from the cache and when it is downloaded
anew.  Can someone give a short 'guide to HTTrack -z' (or
maybe to the hts-cache files), or see why it seems to
download everything anew every day?
Thanks,
-Lars
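
For the counting part, a minimal Python sketch (not part of HTTrack) that pulls the numbers out of the summary line quoted above might look like this. The regular expression is written against that one sample only; the idea that an updated count would appear as a number in place of "no" is an assumption and is marked as such in the code. The names parse_summary and SUMMARY_RE are made up for this example.

```python
import re

# Pattern written against the single summary line quoted in this post.
# ASSUMPTION: when files are updated, the word "no" is replaced by a number.
SUMMARY_RE = re.compile(
    r"(?P<links>\d+) links scanned, "
    r"(?P<written>\d+) files written \((?P<bytes_overall>\d+) bytes overall\), "
    r"(?P<updated>no|\d+) files updated "
    r"\[(?P<bytes_received>\d+) bytes received at (?P<rate>\d+) bytes/sec\]"
)


def parse_summary(text):
    """Return the summary counters as a dict of ints, or None if not found."""
    # The summary is wrapped over several lines in the log, so collapse
    # all runs of whitespace into single spaces before matching.
    flat = " ".join(text.split())
    match = SUMMARY_RE.search(flat)
    if match is None:
        return None
    # Treat the literal "no" as zero; everything else is a plain integer.
    return {
        key: 0 if value == "no" else int(value)
        for key, value in match.groupdict().items()
    }
```

Run against each night's log, this would at least give a series of written/updated/received figures to compare from day to day. In the sample above the bytes received (99896462) are well below the bytes overall (224374629), which may be worth a closer look, though it does not by itself show what came from the cache.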
