HTTrack Website Copier
Free software offline browser - FORUM
Subject: Link to file not changed to point to copied file
Author: kitep
Date: 02/11/2016 16:57
 
I'm trying to use WinHTTrack to copy WOTC's 3.x archives from the Wayback
Machine.  It's the first time I've used the program, and I'm running into some
problems.  Hopefully, you guys have the answers.

I use <http://archive.wizards.com/default.asp?x=dnd/arch/dnd> as the
starting web address.

Under Settings->Scan Rules, I have
-ad.doubleclick.net/*
+archive.wizards.com/dnd/files/*.zip
+archive.wizards.com/dnd/files/*.pdf
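
As a rough way to reason about what these three rules accept, here is a
minimal Python sketch.  It assumes that "*" matches any run of characters and
that when several rules match, the last one listed wins, which is how I
understand the filter documentation.  The names compile_rule and is_allowed,
the default argument, and the example file names are made up for
illustration; this is not HTTrack's actual matching code.

```python
import re

# The scan rules from Settings->Scan Rules, in the order they were entered.
RULES = [
    "-ad.doubleclick.net/*",
    "+archive.wizards.com/dnd/files/*.zip",
    "+archive.wizards.com/dnd/files/*.pdf",
]


def compile_rule(rule):
    """Split a '+pattern' or '-pattern' rule into (allow, compiled_regex)."""
    sign, pattern = rule[0], rule[1:]
    if sign not in "+-":
        raise ValueError("rule must start with '+' or '-': %r" % rule)
    # Assumption: only '*' is a wildcard here; everything else is literal.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return sign == "+", re.compile(regex + r"\Z")


def is_allowed(url, rules, default=False):
    """Return the verdict of the last matching rule, or `default` if none match.

    `default` is a stand-in for whatever the mirror would do with a link
    that no rule mentions; the real program decides that from other settings.
    """
    target = re.sub(r"^[a-z]+://", "", url)  # rules are written without a scheme
    verdict = default
    for rule in rules:
        allow, regex = compile_rule(rule)
        if regex.match(target):
            verdict = allow  # a later matching rule overrides an earlier one
    return verdict


# Hypothetical file names, used only to exercise the rules.
print(is_allowed("http://archive.wizards.com/dnd/files/example.zip", RULES))
print(is_allowed("http://ad.doubleclick.net/banner.gif", RULES, default=True))
```

Under these assumptions the rules only decide whether a file is fetched; they
say nothing about how the link pointing to it gets rewritten.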

Under Settings->Spider, I set
"no robots.txt rules"

Under Settings->Expert Only, I set
"Rewrite links: internal / external" to "Relative URI / Absolute URL
(default)" 

My problem seems to be that not all the files are being picked up properly.
For example, take the page
<http://archive.wizards.com/default.asp?x=dnd/arch/cwc>
The page itself is saved correctly (to
<file:///C:/My%20Web%20Sites/WOTC%20Archives%20-%20Old/archive.wizards.com/default689f.html?x=dnd/arch/cwc>)
But the link to the zip file for "Stealthy Rascals 4" (the link is on the page
under the small p) should point to a file on my computer, and instead it
points to the file on the wayback machine.  HTTrack does actually download the
zip file, but it doesn't change the link to point to it.  And for all I know,
it could be downloading the file because of something on another page.

What do I need to set to make my local copy point to the file on my machine? 
Thanks.

While I'm asking, is there a way to see what's in robots.txt?
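
For what it's worth, robots.txt normally lives at the root of the host, so in
principle it can be looked at directly.  Below is a minimal Python sketch of
that idea; robots_url and fetch_robots are names made up here, and I haven't
checked whether this particular server actually serves a robots.txt.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def robots_url(page_url):
    """Build the robots.txt address for the host that serves page_url."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def fetch_robots(page_url, timeout=10):
    """Download robots.txt as text (raises an error if the server has none)."""
    with urlopen(robots_url(page_url), timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


print(robots_url("http://archive.wizards.com/default.asp?x=dnd/arch/dnd"))
```

Calling fetch_robots with the same starting address would then return the
file's contents, if the server provides one.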
 