HTTrack Website Copier
Free software offline browser - FORUM
Subject: analysis of new.txt files....
Author: Peter Yohe
Date: 04/16/2005 00:54
 
Hi Xavier,

We have been looking at the files in the hts-cache folder of
our HTTrack projects, and we think we could load the
information from some of them into a database. We could then
use that information to merge different projects that contain
host directories with the same URI name. new.txt is the file
we are considering for this.
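A rough sketch of the loading step we have in mind, using
Python's standard sqlite3 module. It assumes each data line of
new.txt is tab-separated, with the columns in the order given
in COLUMNS below. That order is our reading of the header line
and should be checked against your own files. The table name
cache_log and the project label are hypothetical.

```python
import sqlite3

# ASSUMED column order for one tab-separated new.txt record; check it
# against the header line of your own hts-cache/new.txt first.
COLUMNS = ["date", "size", "flags", "statuscode", "status",
           "mime", "etag", "url", "localfile", "referer"]


def load_new_txt(conn, project, lines):
    """Insert the data lines of one new.txt into a hypothetical 'cache_log' table.

    Returns the number of rows inserted.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS cache_log (project TEXT, "
        + ", ".join(f"{c} TEXT" for c in COLUMNS) + ")"
    )
    rows = []
    for line in lines:
        fields = line.rstrip("\r\n").split("\t")
        if fields[0] == "date":
            continue  # assumed header line
        if len(fields) < len(COLUMNS):
            continue  # skip truncated or blank lines
        rows.append([project] + fields[:len(COLUMNS)])
    placeholders = ", ".join("?" * (len(COLUMNS) + 1))
    conn.executemany(f"INSERT INTO cache_log VALUES ({placeholders})", rows)
    conn.commit()
    return len(rows)
```

Once two projects are loaded, a query such as
SELECT url FROM cache_log GROUP BY url HAVING COUNT(DISTINCT project) > 1
would list the URLs they have in common.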

Offhand, do you see any reason why this would not work?
One of our goals is to be able to "normalize" our projects
into one directory. Another goal is to copy into that
normalized directory only the files that are new or have
really changed, and to delete the ones that no longer exist.
This is difficult because so many of the sites we copy are
database-driven. We think we can use the information in
new.txt to accomplish this.
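The change-detection part could look something like this. It
assumes we can build a {url: size} snapshot of the mirror after
each run, and that each snapshot lists every file the mirror
holds. The function name is hypothetical. Equal sizes are only a
weak sign that a file is unchanged, and a URL missing from one
run is only a candidate for deletion.

```python
def diff_snapshots(previous, current):
    """Compare two {url: size} snapshots (hypothetical inputs built from new.txt).

    Returns three sorted lists: URLs that are new, URLs whose size
    changed, and URLs that no longer appear.
    """
    new = sorted(u for u in current if u not in previous)
    changed = sorted(u for u in current
                     if u in previous and current[u] != previous[u])
    deleted = sorted(u for u in previous if u not in current)
    return new, changed, deleted
```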

Here are my questions. Do you have any documentation on the
fields in new.txt, in particular status ('servermsg') and the
flags? We have been able to figure out some of them: a status
of added with a server message of ('ok') is clear enough, but
we do not understand error('Object%20moved') beyond the fact
that it corresponds to a 302 redirect.
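For the status field, this is roughly how we split entries of
the two forms quoted above into a keyword and a readable server
message. The regular expression only covers those forms and is
our own assumption, not a documented format.

```python
import re
from urllib.parse import unquote

# Matches entries such as "added('ok')" or "error('Object%20moved')";
# other forms in real logs may differ.
STATUS_RE = re.compile(r"^\s*(\w+)\s*\('(.*)'\)\s*$")


def parse_status(field):
    """Split a status field into (keyword, URL-decoded server message).

    Returns (field, "") when the field does not match the assumed form.
    """
    m = STATUS_RE.match(field)
    if not m:
        return field.strip(), ""
    return m.group(1), unquote(m.group(2))
```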

If we use the size comparison, the flags, and the status
('servermsg'), do you think we could come up with a set of
rules to determine which files to normalize, which to ignore,
and which to delete?
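This is the kind of rule set we mean. The rules themselves
(delete on 404, leave redirects and other errors alone, skip
304 responses and transfers whose local and remote sizes
differ, copy anything added) are guesses on our part. So are
the "local/remote" reading of the size field and the keyword
names; they are exactly what we would like to confirm.

```python
# Hypothetical "keep this file" keywords: "added" appears in our logs,
# "updated" is an assumption to check against real new.txt files.
KEEP_KEYWORDS = {"added", "updated"}


def parse_size(field):
    """Read a size field assumed to look like "local/remote" (e.g. "1234/1234").

    Returns (local, remote); remote is None when there is no "/" part.
    """
    local, _, remote = field.partition("/")
    return int(local), (int(remote) if remote else None)


def classify(status_keyword, statuscode, size_field):
    """Return "normalize", "ignore" or "delete" for one new.txt record.

    Illustrative rules only, not HTTrack's documented semantics.
    """
    local, remote = parse_size(size_field)
    if status_keyword == "error":
        # a 404 suggests the page is gone; redirects such as the 302
        # "Object moved" case, and other errors, are left alone
        return "delete" if statuscode == 404 else "ignore"
    if statuscode == 304:
        return "ignore"  # Not Modified: the copy we already hold is current
    if remote is not None and local != remote:
        return "ignore"  # probably an incomplete transfer; don't copy it
    if status_keyword in KEEP_KEYWORDS:
        return "normalize"
    return "ignore"
```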

I look forward to hearing from you and from any other HTTrack
user who might be able to help.

Thanks!

Peter
www.widernet.org
www.egranary.org
 


All articles

Subject                              Author        Date
analysis of new.txt files....        Peter Yohe    04/16/2005 00:54
Re: analysis of new.txt files....                  04/23/2005 11:31
Re: analysis of new.txt files....                  04/25/2005 19:11





Created with FORUM 2.0.11