HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Search Intensive Sites
Author: Jim Rems
Date: 08/27/2008 08:32
 
> > > Does the first page mirrored show the results page?
> > Yes.  I ran a test limit of 100 on the site
> > description above.  The page displays correctly, but
> > when I click on an image link (or record link) it
> > takes me to an ARC Time-Out Page.  The same happens
> > for the record link.
> Those were not-mirrored pages.  You can prevent
> those with options -> Build -> No external pages

Okay.  Did it.

> 
> > > what files aren't you getting?
> > The linked image files (gif).  Scan Rule is set to
> > get *gif, *jpg, etc.
> Do you mean +*gif +*jpg 

Yes, sorry. +*gif, etc.

> 
> > > What does the log say? Did you set the log to
> > > debug?
> > I don't know if the log is set to debug (I'm
> options -> log -> create log files -> Select box

Yes. Set by default.

> 
> Some of the links look like:
> src="/arc/laf/nara/images/select/iconLAButt_2.gif"
> The default is to scan only downward from the starting
> URL.
> Try options -> experts -> Travel mode = up/down
> 

Okay. Did it.
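
To illustrate the travel-mode point quoted above, here is a minimal Python sketch of why a root-relative link such as the icon path can fall outside a "down only" crawl. It is not HTTrack's own logic, only an approximation of the idea. The start URL and the host `example.org` are hypothetical, since the real starting URL is not given in this post, and `is_below_start` is an illustrative helper name.

```python
from urllib.parse import urljoin, urlsplit

# Hypothetical starting URL; the real one is not shown in this post.
START = "http://example.org/arc/search/results.html"


def is_below_start(start_url: str, link: str) -> bool:
    """Return True if `link`, resolved against `start_url`, sits in or
    below the directory of the starting page on the same host."""
    start = urlsplit(start_url)
    target = urlsplit(urljoin(start_url, link))
    # Directory of the starting page, always ending in "/".
    start_dir = start.path.rsplit("/", 1)[0] + "/"
    return target.netloc == start.netloc and target.path.startswith(start_dir)


# Root-relative link quoted in the thread: resolves outside /arc/search/.
icon_in_scope = is_below_start(START, "/arc/laf/nara/images/select/iconLAButt_2.gif")

# A hypothetical page-relative link stays under the starting directory.
detail_in_scope = is_below_start(START, "detail.html")
```

Under these assumptions the icon link resolves to a sibling directory of the starting page, so a crawl restricted to going down would skip it, while a travel mode that also allows going up could reach it.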

I tried several tests (errors = 0).  The initial page (the captured URL page)
displays correctly, but the links on that page consistently send me to a
"Search Not Available" page.

The links are associated with (a) a thumbnail gif (or image icon); and (b) a
minor text record description. Clicking the gif or image icon will get a large
version of the image on the NARA site.  Clicking on the short text record
description will get a full text record on the NARA site.

These two items, when clicked locally, send me to a "Search Not Available"
page.

HTTrack is mentioned on the Archives site at:

<http://www.archives.gov/records-mgmt/bulletins/2005/2005-02b.html>

In the first paragraph of that page, there is a link to general and technical
specifications for harvesting (see Appendix B and C).  I hope this helps.

Thanks for your continued help.  I'm not so sure we're going to be able to
make this work, but I'm willing to keep trying if you are.

 