> Okay problem detected and fixed. The problem is the 'near'
> option, in a particular case:
> - a non-html file is detected (here, a txt file)
> - the 'near' hack forced the download of the file
> - this file redirects to a regular html file (here, a fake
> 404 error page)
<snip>
> I have now setup the 'near' option so that it uses a depth
> of 1 (download and parse, but don't download anything)
>
> > Does that include removing the tabs and spaces too?
> No - control chars, that is, < 32, excluding the space
> character :)
>
> Okay, I'll try to release a beta-6 soon on httrack.com (in
> few minutes) which includes the 'near' fix.
Hello,
I just "continued" the mirroring of this project, and it
completed successfully (AFAIK) except for one
endless/infinite loop on an image (actually a 404 error)
URL:
www.licensing.philips.com/partner/data/images/sm-duck.gif
This URL is parsed by HTTrack and becomes:
www.licensing.philips.com/partner/data/images/images/sm-duck.gif
www.licensing.philips.com/partner/data/images/images/images/sm-duck.gif
etc.
The first two lines related to this URL from new.txt are:
16:58:27 19952/19952 -R-MC- 200 added ('OK') text/html date:Mon,%2030%20Dec%202002%2022:52:32%20GMT www.licensing.philips.com/partner/data/images/sm-duck.gif I:/web-archive%20problematic/www.epanorama.net%2020021228/www.licensing.philips.com/partner/data/images/sm-duck.gif (from www.licensing.philips.com/partner/data/sl00811.pdf)
17:09:36 19973/19973 -R-MC- 200 added ('OK') text/html date:Mon,%2030%20Dec%202002%2023:03:40%20GMT www.licensing.philips.com/partner/data/images/images/sm-duck.gif I:/web-archive%20problematic/www.epanorama.net%2020021228/www.licensing.philips.com/partner/data/images/images/sm-duck.gif (from www.licensing.philips.com/partner/data/images/sm-duck.gif)
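
Both entries are logged as 200 text/html, and each new URL is
reported as coming "from" the previous one. That pattern is what
you would get if the fake 404 page contained a relative image
reference such as images/sm-duck.gif (an assumption; it has not
been checked against the page's actual HTML). A minimal Python
sketch of that resolution, using only the standard
urllib.parse.urljoin; the http:// scheme and the helper name
follow_chain are made up for illustration:

```python
from urllib.parse import urljoin

# Assumption: the fake 404 page embeds this relative image reference.
RELATIVE_LINK = "images/sm-duck.gif"


def follow_chain(start_url, hops):
    """Resolve the same relative link repeatedly, each time against
    the URL produced by the previous hop (hypothetical helper)."""
    urls = []
    base = start_url
    for _ in range(hops):
        base = urljoin(base, RELATIVE_LINK)
        urls.append(base)
    return urls


# Start from the .pdf URL that the first log entry names as the referrer.
chain = follow_chain(
    "http://www.licensing.philips.com/partner/data/sl00811.pdf", 3
)
for url in chain:
    print(url)
```

Under that assumption the .pdf is only the starting point of the
chain: its relative link resolves one level down into images/,
and every later hop is resolved against the previous image URL,
which adds another images/ segment each time.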
There were other files downloaded from this site that did
not cause an infinite loop, even though they are also
displayed as the same 404 error page as the image URL:
www.licensing.philips.com/partner/data/sl00811.pdf
I was able to stop the infinite loop by hitting 'skip' once
I noticed what was happening. Is the problem with
HTTrack's handling of the page, or with a 'broken' server
that gives false responses? It surprises me that the .pdf
link did not loop while the .gif did.
Anyway, if it helps, I've posted the new.txt,
winprofile.ini, and broken pdf and gif to:
<http://kazemizadeh.net/httrack/epanorama.com/>
-Haudy Kazemi