Re: Bug? Escaping spider. - HTTrack Website Copier Forum

Subject: Re: Bug? Escaping spider.

Author: Biep

Date: 08/11/2010 14:38

Having moved again, I don't have all my stuff with me, but in what I do have
I find lines like the following in new.txt:

20:38:56	57335/57335	---M--	200	added ('OK')	text/html	date:Mon,%2002%20Aug%202010%2008:25:52%20GMT	http://animals.howstuffworks.com/insects/fig-wasp.htm	E:/I/Escape/animals.howstuffworks.com/insects/fig-wasp.htm	(from http://auto.howstuffworks.com/stirling-engine.htm)
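
For anyone who wants to go through such records in bulk, here is a rough
Python sketch that splits one record into named fields.  It assumes the record
is a single tab-separated line with the ten fields in the order shown above;
that layout is inferred from this one sample, not from HTTrack documentation,
and the names Record and parse_record are made up for illustration.

```python
from typing import NamedTuple
from urllib.parse import unquote


class Record(NamedTuple):
    """One log record; field names are illustrative, not HTTrack's own."""
    time: str
    size: str
    flags: str
    status: int
    message: str
    mime: str
    date: str
    url: str
    local_path: str
    referrer: str


def parse_record(line: str) -> Record:
    """Split one tab-separated record (assumed layout: 10 fields)."""
    fields = [f.strip() for f in line.rstrip("\r\n").split("\t")]
    if len(fields) != 10:
        raise ValueError(f"expected 10 fields, got {len(fields)}")
    time, size, flags, status, message, mime, date, url, local, ref = fields

    # The date field is percent-encoded and prefixed with "date:".
    if date.startswith("date:"):
        date = date[len("date:"):]
    date = unquote(date)

    # The last field looks like "(from <referring URL>)".
    if ref.startswith("(from ") and ref.endswith(")"):
        ref = ref[len("(from "):-1]

    # Drop angle brackets in case the line was pasted from a forum page.
    return Record(time, size, flags, int(status), message, mime, date,
                  url.strip("<>"), local, ref.strip("<>"))
```

Grouping the parsed records by referrer would give, per page, the list of
URLs that the log attributes to it, which can then be compared against the
page's actual content.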

But the downloaded copy of <http://auto.howstuffworks.com/stirling-engine.htm>
doesn't contain a link to this page, even though new.txt claims it does.  (I
removed every match of /" *\+ *"/ from the page so that links obfuscated by
string concatenation would be caught as well.)
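
The check described above can be sketched in Python roughly as follows.  This
is only an illustration of the idea, not the exact procedure used: the
function names are hypothetical, and a plain substring search will still miss
relative links or links assembled in other ways.

```python
import re

# Matches a closing quote, optional spaces, a plus sign, optional spaces and
# an opening quote, i.e. the seam of a string concatenation like "ab" + "cd".
CONCAT_SEAM = re.compile(r'" *\+ *"')


def deobfuscate(html: str) -> str:
    """Join string literals that were split with " + " in the page source."""
    return CONCAT_SEAM.sub("", html)


def mentions(html: str, needle: str) -> bool:
    """Return True if needle occurs in the page once the seams are removed."""
    return needle in deobfuscate(html)


if __name__ == "__main__":
    # Hypothetical page fragment with a link split across two literals.
    page = 'var u = "http://animals.how" + "stuffworks.com/insects/fig-wasp.htm";'
    print(mentions(page, "animals.howstuffworks.com/insects/fig-wasp.htm"))
```

Searching for just the path (for example "fig-wasp.htm") rather than the full
URL is a looser test that also catches relative links.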

I discovered ocw.mit.edu because some years ago another download with
external=1 ended up copying that whole site (which is huge) in my absence.
That was also when I became aware of the implications of this bug.  Because
of the misattribution in new.txt I haven't been able to track down the precise
point where that spider escaped; the howstuffworks case is the smallest
example I have found so far.  Sorry it isn't cleaner.
