> > I spent the better part of the day trying to make
> > this work but couldn't. I get the page I want, but
> > not the files associated with the record
> > descriptions.
>
> Does the first page mirrored show the results page?
Yes. I ran a test with a limit of 100 on the site described above. The page
displays correctly, but when I click on an image link or a record link, it
takes me to an ARC Time-Out page.
> what files aren't you getting?
The linked image files (gif). Scan Rule is set to get *gif, *jpg, etc.
> What does the log say? Did you set the log to
> debug?
I don't know if the log is set to debug (I'm basically using the defaults,
except for the changes you recommended). After some experimenting, the log
file generally shows no errors.
I've tried a number of default mirrors, but the result is always the same:
the ARC Time-Out page. I captured the URL for the Hierarchy Tab, the page that
displays correctly.
Here is what I did:
National Archives arcweb.archives.gov
Select Digital Copies
Limit 100
Search for: Ansel Adams
Set-up CatchURL
Click Hierarchy Tab
URL inserted into HTTrack
Continue with HTTrack
I tried several different mirror depths, from 1 to 5, for both internal and
external links. A depth of 2 internal and 0 external works best.
Lastly, the log suggested turning off the robot rules.
Thanks again for your help.