Hi,
I tried to carry on with that problem by myself and "sniffed" a full scan of
the same website with Ethereal. It is clear that the following occurs:
1° I requested the scan of 2 links on the WebSite, one existing, the other
not. I used "test links" and "extended log".
2° I received 3 answers :
302 Found for robots.txt
302 Found for the non-existing link
200 OK for the existing one
3° This is correctly traced in the hts-log.txt file
All works correctly... But :
302 Found is considered a Warning, not an Error. If I scan a website and
ask to mirror existing pages, I get as many relocation pages as there are
non-existing links. The "No error pages" option has no effect, unfortunately.
Moreover, if I set a "Maximum mirror depth" of 0, in order to download only
existing links, I get 2 files for each relocated link: the relocated page
and a read-me page.
Consequently, when looking for unknown links on a WebSite, say imagexxxx.jpg,
I get about 20000 unwanted files, wasting time and bandwidth for nothing.
I know that this isn't a conventional way of using HTTrack, but this is what I
need and that can be done by some commercial products without penalty.
However, I have at least two ways to deal with that problem :
a) Do it in two phases: first, only "test links", then manually "mirror links"
for those identified as "link recorded" in hts-log.txt
b) Create a program that launches HTTrack link by link to do automatically the
job described in a). This is not much fun because I have to rewrite all the work
that is already done by the URL list (.txt) management, and because I have no
information about the values returned by HTTrack when launched by an "exec"
command. If there is no useful returned value (such as 302 or 200), I'll
have to parse the hts-log.txt file before launching the next command,
resulting in a loss of performance.
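To make the two-phase idea in a) concrete, here is a rough sketch in Python of what the "test" phase could look like outside HTTrack. It does not use HTTrack at all and does not read hts-log.txt; it simply sends one HEAD request per link with the standard library (which does not follow redirects) and keeps the links that answer 2xx. The function names (classify, probe, links_to_mirror) are made up for this illustration, and it assumes the server answers HEAD requests the same way as GET, which is not always true.

```python
import http.client
from urllib.parse import urlsplit


def classify(status):
    """Map an HTTP status code to 'ok', 'moved' or 'error'."""
    if 200 <= status < 300:
        return "ok"
    if 300 <= status < 400:
        return "moved"      # e.g. 302 Found: a relocation, not a real page
    return "error"


def probe(url, timeout=10):
    """Send one HEAD request (redirects are NOT followed) and return the status."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        return conn.getresponse().status
    finally:
        conn.close()


def links_to_mirror(urls, probe_fn=probe):
    """Phase 1: keep only the URLs whose answer is 2xx (drop 302 and errors)."""
    return [u for u in urls if classify(probe_fn(u)) == "ok"]
```

The resulting list could then be written to a plain .txt file, one URL per line, and used as the URL list for the second ("mirror links") phase, so that relocated links are never requested again.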
It's not surprising if I say that I would prefer a "No moved pages" flag on the
"Build" option page, as the existing "No error pages" and "No external pages"
flags don't solve my problem. Indeed, this is only a proposal :o)
Best regards.
Pulsar.