HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Bug? Escaping spider.
Author: Biep
Date: 08/11/2010 14:38
 
Having moved again, I don't have all my files with me, but in what I do have
I find records like the following in new.txt:

20:38:56	57335/57335	---M--	200	added ('OK')	text/html
date:Mon,%2002%20Aug%202010%2008:25:52%20GMT
<http://animals.howstuffworks.com/insects/fig-wasp.htm>
E:/I/Escape/animals.howstuffworks.com/insects/fig-wasp.htm	(from
<http://auto.howstuffworks.com/stirling-engine.htm>)
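
For anyone who wants to go through such records in bulk, here is a minimal
sketch of how one could be split into fields. The field order is only what I
infer from the record quoted above (tab-separated, ending in a "(from ...)"
field), and the names parse_record, validator and so on are my own labels, not
anything HTTrack defines. The sample is the same record joined back onto one
line, without the angle brackets the forum added.

```python
import re

def parse_record(line):
    """Split one tab-separated log record into named fields.

    The layout is an assumption taken from the single sample record,
    not from any HTTrack documentation.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) < 10:
        return None  # not a full record in the assumed layout
    match = re.fullmatch(r"\(from (.*)\)", fields[9].strip())
    return {
        "time": fields[0],
        "size": fields[1],
        "flags": fields[2],
        "status": fields[3],
        "message": fields[4],
        "mime": fields[5],
        "validator": fields[6],   # hypothetical label for the date:/etag field
        "url": fields[7],
        "local_path": fields[8],
        "referrer": match.group(1) if match else None,
    }

SAMPLE = "\t".join([
    "20:38:56",
    "57335/57335",
    "---M--",
    "200",
    "added ('OK')",
    "text/html",
    "date:Mon,%2002%20Aug%202010%2008:25:52%20GMT",
    "http://animals.howstuffworks.com/insects/fig-wasp.htm",
    "E:/I/Escape/animals.howstuffworks.com/insects/fig-wasp.htm",
    "(from http://auto.howstuffworks.com/stirling-engine.htm)",
])

record = parse_record(SAMPLE)
print(record["url"], "<-", record["referrer"])
```

Collecting the (url, referrer) pairs this way would make it possible to list
every record whose URL lies on a different host than its claimed referrer.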

But the downloaded copy of <http://auto.howstuffworks.com/stirling-engine.htm>
doesn't contain a link to this page, even though new.txt claims it does.  (I
stripped every match of /" *\+ *"/ from the page first, so that links
obfuscated by string concatenation would show up in the search as well.)
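
The check described above can be sketched roughly as follows. This is only an
illustration of the idea, not the exact procedure I ran: the function name
mentions and the sample string are made up, and a plain substring search will
of course miss relative links or links assembled in other ways.

```python
import re

# The same pattern as above: a closing quote, a plus sign, an opening quote,
# with optional spaces around the plus.
CONCAT = re.compile(r'" *\+ *"')

def mentions(html, needle):
    """Return True if needle occurs in html once '"..." + "..."' splits are joined."""
    joined = CONCAT.sub("", html)
    return needle in joined

# Made-up example of a link hidden by string concatenation.
html = 'var u = "http://animals.howstuffworks.com/insects/" + "fig-wasp.htm";'

print("insects/fig-wasp.htm" in html)          # plain search on the raw page
print(mentions(html, "insects/fig-wasp.htm"))  # search after joining the pieces
```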

I discovered ocw.mit.edu because, some years ago, another download with
external=1 ended up copying that whole site (which is huge) in my absence.
That was also when I became aware of the implications of this bug.  Because
of the misattribution in new.txt I haven't been able to track down the
precise point where that spider escaped; the howstuffworks case is the
smallest example I have found so far.  Sorry it isn't cleaner.
 