I'm trying to use WinHTTrack to copy WOTC's 3.x archives from the Wayback
Machine. It's the first time I've used the program and I'm running into some
problems. Hopefully, you guys have the answers.
I use <http://archive.wizards.com/default.asp?x=dnd/arch/dnd> as the
starting web address.
Under Settings->Scan Rules, I have
-ad.doubleclick.net/*
+archive.wizards.com/dnd/files/*.zip
+archive.wizards.com/dnd/files/*.pdf
Under Settings->Spider, I set
"no robots.txt rules"
Under Settings->Expert Only, I set
"Rewrite links: internal / external" to "Relative URI / Absolute URL
(default)"
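
In case it helps, here is how I understand the scan rules above to be applied, written as a rough Python sketch. It assumes that later rules override earlier ones and that "*" matches any run of characters; it is only my reading of the rules, not HTTrack's actual matching code, and the file name in the usage lines is made up.

```python
from fnmatch import fnmatchcase

# Scan rules exactly as entered under Settings->Scan Rules.
RULES = [
    "-ad.doubleclick.net/*",
    "+archive.wizards.com/dnd/files/*.zip",
    "+archive.wizards.com/dnd/files/*.pdf",
]


def verdict(url, rules=RULES):
    """Return '+' (include), '-' (exclude), or None if no rule matches.

    Assumptions (mine, not taken from HTTrack's source): the last
    matching rule wins, and '*' behaves like a shell-style wildcard.
    """
    # Drop the scheme so the URL has the same shape as the rule patterns.
    target = url.split("://", 1)[-1]
    result = None
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if fnmatchcase(target, pattern):
            result = sign  # a later match overrides an earlier one
    return result


# Hypothetical file name, for illustration only.
print(verdict("http://archive.wizards.com/dnd/files/example.zip"))
print(verdict("http://archive.wizards.com/default.asp?x=dnd/arch/cwc"))
```

If that reading is right, the zip and pdf rules only decide whether a file is fetched; they say nothing about how the links to it are rewritten.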
My problem seems to be that I'm not picking up all the files. For example,
on the page
<http://archive.wizards.com/default.asp?x=dnd/arch/cwc>
I get the page itself correctly (saved to
<file:///C:/My%20Web%20Sites/WOTC%20Archives%20-%20Old/archive.wizards.com/default689f.html?x=dnd/arch/cwc>).
But the link to the zip file for "Stealthy Rascals 4" (the link is on the page
under the small p) should point to a file on my computer, and instead it still
points to the file on the Wayback Machine. HTTrack does actually download the
zip file, but it doesn't change the link to point to the local copy. And for
all I know, it could be downloading the file because of something on another
page.
What do I need to set to make my local copy point to the file on my machine?
Thanks.
While I'm asking, is there a way to see what's in robots.txt?