HTTrack Website Copier
Free software offline browser - FORUM
Subject: Need a "No moved pages" flag
Author: Pulsar
Date: 11/11/2005 10:55
 
Hi,

I kept investigating this problem on my own and "sniffed" a full scan of the
same website with Ethereal. The capture clearly shows the following:

1° I requested the scan of 2 links on the WebSite, one existing, the other
not. I used "test links" and "extended log".

2° I received 3 answers :
   302 Found for robots.txt
   302 Found for the non-existing link
   200 OK for the existing one

3° This is correctly traced in the hts-log.txt file

Everything works correctly... But:

302 Found is treated as a warning, not an error. If I scan a website and
ask to mirror existing pages, I get as many relocation pages as there are
non-existing links. The "No error pages" option has no effect, unfortunately.
Moreover, if I set a "Maximum mirror depth" of 0, in order to download only
existing links, I get 2 files for each relocated link: the relocated page
and a read-me page.

Consequently, when looking for unknown links on a WebSite, say imagexxxx.jpg,
I get about 20000 unwanted files, wasting time and bandwidth for nothing.
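
To make the wanted behaviour concrete: the filtering I am after amounts to
keeping only the URLs that answer 200 and dropping those that answer with a
redirect. Here is a minimal Python sketch of that idea, done outside HTTrack.
The names head_status and existing_only are made up for this illustration.
Python's http.client does not follow redirects, so a 302 is reported as 302.

```python
import http.client
from urllib.parse import urlsplit


def head_status(url, timeout=10):
    """Send one HEAD request and return the raw status code.

    Redirects are not followed, so a relocated link yields 3xx, not 200.
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()


def existing_only(urls, get_status=head_status):
    """Keep the URLs that answer 200; drop redirects (3xx) and errors."""
    return [u for u in urls if get_status(u) == 200]
```

The get_status parameter is only there so the filter can be tried without
touching the network, e.g. existing_only(urls, get_status=some_dict.get).
This assumes the server answers HEAD the same way it answers GET, which I
have not checked for the site in question.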

I know that this isn't a conventional way of using HTTrack, but it is what I
need, and some commercial products can do it without this penalty.
However, I can see at least two ways to work around the problem:

a) Do it in two phases: first only "test links", then manually "mirror links"
for those identified as "link recorded" in hts-log.txt.

b) Write a program that launches HTTrack link by link to do automatically the
job described in a). This is not much fun, because I would have to rewrite all
the work already done by the URL list (.txt) management, and because I have no
information about the values HTTrack returns when launched by an "exec"
command. If there is no useful return value (such as 302 or 200), I'll
have to parse the hts-log.txt file before launching the next command,
resulting in a loss of performance.
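
The log-parsing step shared by a) and b) could be sketched roughly as follows
in Python. This is only a sketch under assumptions: I assume that each
relevant log line contains the text "link recorded" and a full http(s) URL
somewhere on the same line, so check that against a real hts-log.txt first.
The function names and the output file are made up for this illustration.

```python
import re

# Assumption: the URL appears on the same line as the marker text.
URL_RE = re.compile(r"https?://\S+")


def recorded_links(log_text, marker="link recorded"):
    """Return the first URL of every line containing the marker.

    Order of first appearance is kept and duplicates are dropped.
    """
    found = {}
    for line in log_text.splitlines():
        if marker.lower() not in line.lower():
            continue
        match = URL_RE.search(line)
        if match:
            # Strip punctuation that may trail the URL in a log message.
            found.setdefault(match.group(0).rstrip(".,;)"), None)
    return list(found)


def write_url_list(log_path, out_path):
    """Read a log file and write one recorded URL per line to out_path."""
    with open(log_path, encoding="utf-8", errors="replace") as f:
        urls = recorded_links(f.read())
    with open(out_path, "w", encoding="utf-8") as out:
        for url in urls:
            out.write(url + "\n")
    return len(urls)
```

Calling write_url_list("hts-log.txt", "found-links.txt") after the "test
links" pass would give a one-URL-per-line text file to feed to the second,
"mirror links" pass.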

Unsurprisingly, I would prefer a "No moved pages" flag in the
"Build" option page, as the existing "No error pages" and "No external pages"
flags don't solve my problem. Of course, this is only a proposal   :o)

Best regards.
Pulsar.
 

