1) The first problem is starting at
<http://auto.howstuffworks.com/stirling-engine.htm>. That is actually a
directory (there's a stirling-engine.htm/printable.html). This confuses HTTrack
on update, and it purges the entire mirror. Mirror stirling-engine1.htm
instead.
2) The second problem is that Windows can't handle such long file paths:
hts-log.txt:4087345:18:29:37 Info: engine: warning: serialize error for
shopproducts.howstuffworks.com/Petfindercom--Petfindercom-The-Adopted-Dog-Bible-Your-Onestop-Resource-for-Choosing-Training-and-Caring-for-Your-Sheltered-or-Rescued-Dog/productId=2046007551
to C:/Documents and Settings/Bill/My
Documents/_internetDLed/test/shopproducts.howstuffworks.com/Petfindercom--Petfindercom-The-Adopted-Dog-Bible-Your-Onestop-Resource-for-Choosing-Training-and-Caring-for-Your-Sheltered-or-Rescued-Dog/productId=2046007551.html.tmp:
open error: No such file or directory (directory exists, file does not exist)
To get around this, I changed the site structure to xx in site/xx.
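
If you want to check which local paths are at risk before mirroring, here is a minimal Python sketch. It assumes the classic Windows MAX_PATH limit of 260 characters (including the terminating NUL) applies on your system, which seems to be what the path in the log excerpt above runs into. The function name is made up for illustration and is not part of HTTrack.

```python
# Sketch: flag local mirror paths that exceed the classic Windows MAX_PATH limit.
MAX_PATH = 260  # classic Windows limit, including the terminating NUL character


def exceeds_max_path(path: str, limit: int = MAX_PATH) -> bool:
    """Return True if `path` is too long for a classic (non-extended) Windows path."""
    # One slot of the limit is reserved for the terminating NUL.
    return len(path) > limit - 1


# The path from the log excerpt above, joined back onto one line.
failed = (
    "C:/Documents and Settings/Bill/My Documents/_internetDLed/test/"
    "shopproducts.howstuffworks.com/"
    "Petfindercom--Petfindercom-The-Adopted-Dog-Bible-Your-Onestop-Resource-"
    "for-Choosing-Training-and-Caring-for-Your-Sheltered-or-Rescued-Dog/"
    "productId=2046007551.html.tmp"
)
print(len(failed), exceeds_max_path(failed))
```

A shorter base directory or a flatter site structure both leave more room for the long product names.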
> :52%20GMT <http://animals.howstuffworks.com/insects/f>
> ig-wasp.htm E:/I/Escape/animals.howstuffworks.com/in
> sects/fig-wasp.htm (from
> <http://auto.howstuffworks.com/stirling-engine.htm>)
>
> But the downloaded copy of
> <http://auto.howstuffworks.com/stirling-engine.htm>
> doesn't have a link to this page, even though
Sure it does: in the upper right-hand corner there is a link titled "Random".
That link can redirect anywhere. On my mirror I don't have fig-wasp, but I do
have insects/ants-bees-wasps.htm.
Try adding a filter: -*/random-article
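
To see roughly which URLs an exclusion like that covers, here is a small Python sketch. It is only an approximation: HTTrack uses its own filter matcher, and fnmatch is used here just to illustrate the "*" wildcard. The URLs are hypothetical; the real target of the "Random" link may differ.

```python
from fnmatch import fnmatchcase

# Rough approximation only: HTTrack has its own filter matcher; fnmatch is used
# here just to illustrate what the "*" wildcard in "-*/random-article" covers.
EXCLUDE = "*/random-article"


def is_excluded(url: str, pattern: str = EXCLUDE) -> bool:
    """Return True if the URL matches the exclusion pattern."""
    return fnmatchcase(url, pattern)


# Hypothetical URLs for illustration only.
print(is_excluded("www.howstuffworks.com/random-article"))
print(is_excluded("auto.howstuffworks.com/stirling-engine1.htm"))
```

Because the pattern starts with "*", it does not matter which howstuffworks host serves the random link.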
3) With the changes above I was getting over 5 GB before I canceled it. Picking
a random mirrored page, I found:
13:45:14 66374/66374 ---M-- 200 added ('OK') text/htm
l date:Fri,%2013%20Aug%202010%2017:45:25%20GMT
<http://animal.discovery.com/guides/wild-birds/wild-birds.html>
C:/Documents%20and%20Settings/Bill/My%20
Documents/_internetDLed/test/animal.discovery.com/html/wild-birds.html (from
<http://animal.discovery.com/>)
Obviously %e1 is broken.