HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Bug? Escaping spider.
Author: William Roeder
Date: 08/13/2010 20:19
 
1) The first problem is starting at
<http://auto.howstuffworks.com/stirling-engine.htm>. That is actually a
directory (there's a stirling-engine.htm/printable.html). This confuses HTTrack
on update, and it purges the entire mirror. Mirror stirling-engine1.htm
instead.

2) The second problem is that Windows can't handle such long file paths:
hts-log.txt:4087345:18:29:37    Info:   engine: warning: serialize error for
shopproducts.howstuffworks.com/Petfindercom--Petfindercom-The-Adopted-Dog-Bible-Your-Onestop-Resource-for-Choosing-Training-and-Caring-for-Your-Sheltered-or-Rescued-Dog/productId=2046007551
to C:/Documents and Settings/Bill/My
Documents/_internetDLed/test/shopproducts.howstuffworks.com/Petfindercom--Petfindercom-The-Adopted-Dog-Bible-Your-Onestop-Resource-for-Choosing-Training-and-Caring-for-Your-Sheltered-or-Rescued-Dog/productId=2046007551.html.tmp:
open error: No such file or directory (directory exists, file does not exist)
To work around this, I changed the site structure to xx in site/xx.
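
A quick way to see which local paths are at risk is to measure them. This is only a rough sketch: it assumes the classic 260-character Windows MAX_PATH limit is what triggered the "open error" above (the log itself does not say so), and the function names are made up for illustration.

```python
# Classic Win32 limit on a full path, counting the terminating NUL.
MAX_PATH = 260

def exceeds_max_path(local_path: str, limit: int = MAX_PATH) -> bool:
    """Return True if the path does not fit within the classic limit."""
    # The limit includes the trailing NUL, so the usable length is limit - 1.
    return len(local_path) > limit - 1

def longest_paths(paths, top=5):
    """Return the `top` longest paths, longest first."""
    return sorted(paths, key=len, reverse=True)[:top]
```

Feeding it the target path from the hts-log.txt warning (joined back onto one line) should show whether that particular file is over the limit.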

> :52%20GMT	<http://animals.howstuffworks.com/insects/fig-wasp.htm>
> E:/I/Escape/animals.howstuffworks.com/insects/fig-wasp.htm
> (from <http://auto.howstuffworks.com/stirling-engine.htm>)
> 
> But the downloaded copy of
> <http://auto.howstuffworks.com/stirling-engine.htm>
> doesn't have a link to this page, even though

Sure it does: in the upper right-hand corner there is a link titled "Random".
That link can redirect anywhere. On my mirror I don't have fig-wasp, but I do
have insects/ants-bees-wasps.htm.
Try adding a filter: -*/random-article
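
To get a feel for which URLs a wildcard filter like that would exclude, here is a small sketch. It only approximates the matching with Python's fnmatch; HTTrack's own filter engine has its own rules, and the howstuffworks URL for the "Random" link in the usage note is a guess for illustration, not taken from the site.

```python
from fnmatch import fnmatchcase

def is_excluded(url: str, exclude_patterns) -> bool:
    """Return True if any exclusion pattern matches the URL (scheme removed)."""
    # Drop the scheme so patterns are compared against host/path only.
    target = url.split("://", 1)[-1]
    return any(fnmatchcase(target, pattern) for pattern in exclude_patterns)

# The filter body, without the leading "-" that marks it as an exclusion.
patterns = ["*/random-article"]
```

With this approximation, is_excluded("http://www.howstuffworks.com/random-article", patterns) is true, while the stirling-engine1.htm start page is not matched. Note that a pattern ending in random-article would not catch a link with anything after that name.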

3) With the changes above, the mirror had grown to over 5GB before I canceled
it. Picking a random mirrored page, I found:
13:45:14  66374/66374  ---M--  200  added ('OK')  text/html
date:Fri,%2013%20Aug%202010%2017:45:25%20GMT
<http://animal.discovery.com/guides/wild-birds/wild-birds.html>
C:/Documents%20and%20Settings/Bill/My%20Documents/_internetDLed/test/animal.discovery.com/html/wild-birds.html
(from <http://animal.discovery.com/>)
Obviously %e1 (the external links depth limit) is broken.
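
Rather than picking pages at random, the download log can be scanned for hosts outside the site being mirrored. A rough sketch, assuming each log entry sits on one line with the downloaded URL appearing before the "(from ...)" referrer, as in the entry above; the function name and the regular expression are illustrative only.

```python
import re
from collections import Counter
from urllib.parse import urlsplit

# Matches http(s) URLs, stopping at whitespace, angle brackets or parentheses.
URL_RE = re.compile(r"https?://[^\s<>()]+")

def hosts_outside(lines, allowed_domain):
    """Count downloaded URLs whose host is not under allowed_domain."""
    counts = Counter()
    for line in lines:
        urls = URL_RE.findall(line)
        if not urls:
            continue
        # The first URL on the line is the downloaded page; any later one
        # is the referrer from the "(from ...)" field.
        host = urlsplit(urls[0]).hostname or ""
        if host != allowed_domain and not host.endswith("." + allowed_domain):
            counts[host] += 1
    return counts
```

Calling hosts_outside(open_log_lines, "howstuffworks.com") on the log would list every foreign host along with how many files were pulled from it, which makes it easier to see where the spider escaped.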
 