> I'd like HTTrack to capture that page and all the
> webpages and documents (PDF, *.doc, etc.) linked to
> from that page. My understanding is that if I set
> the mirroring and external depths to "3" it should
> capture all this information in an archive that can
> be browsed offline.
Since those links are external, you do have to set both limits (mirroring depth and external depth).
In addition, many sites may not want bots scanning those locations, so also tell HTTrack not to follow
robots.txt rules.
With just those settings, I got:
3.43-3 mirror complete in 9 minutes 26 seconds : 106 links scanned, 98 files written (40548319 bytes overall...
> Here's a typical error message:
> 09:51:34 Error: "Not Found" (404) at link
> www.itsonline.com/srs/online%20at:%20http:/www.itson
> line.com/ttid/datalock_scheme.pdf (from
> www.itsonline.com/srs/sources_links_d6.html)
>
> What's confusing to me is that the PDF file that it
> couldn't find
> (http://www.itsonline.com/ttid/datalock_scheme.pdf)
> exists on the web.
Yes, but that's not what your link points to:
< A HREF="online%20at:%20http:/www.itsonline.com/ttid/datalock_scheme.pdf" >
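To see why the 404 happens, here is a small sketch (Python standard library only, not anything HTTrack itself runs) that resolves that HREF against the page it came from. The helper name resolve_href is just for illustration. Because "online%20at" is not a valid URL scheme, the whole value is treated as a relative path and gets appended to /srs/:

```python
from urllib.parse import urljoin

# Page that contains the link (taken from the error message above).
PAGE_URL = "http://www.itsonline.com/srs/sources_links_d6.html"

def resolve_href(page_url, href):
    """Resolve an HREF the way a browser or crawler would, relative to its page."""
    return urljoin(page_url, href)

bad_href = "online%20at:%20http:/www.itsonline.com/ttid/datalock_scheme.pdf"
print(resolve_href(PAGE_URL, bad_href))
# http://www.itsonline.com/srs/online%20at:%20http:/www.itsonline.com/ttid/datalock_scheme.pdf
```

That is the same address HTTrack reported as "Not Found", so the mirror is behaving correctly; the HREF in the page is what needs fixing.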
The following links are also wrong. They are relative, so they resolve under the initial page
location (www.itsonline.com/srs/) rather than where the files actually are:
oge_foia_response_to_pogo.pdf
mineta/%20mineta_uwb_jul10_07.doc
ttid_letters/%20doggett_email_jun_14_07.pdf
trfc_lobbyists/%20winston_strawn_2001_second_half.pdf
ttid/%20itip_local_match_analysis_v1c.doc
ttid/ttid_white_paper.pdf
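If you want to check the page for this kind of mistake before mirroring, something along these lines might help. It is only a rough sketch using Python's standard library; the names HrefCollector, looks_malformed and find_malformed are made up for illustration, and the two heuristics (a path segment starting with an encoded space, or "http:" buried inside the value) are my assumptions based on the links listed above. It will not catch plain relative links that simply point to the wrong directory, such as ttid/ttid_white_paper.pdf; those you can only find by requesting them.

```python
from html.parser import HTMLParser
from urllib.parse import unquote

class HrefCollector(HTMLParser):
    """Collect the HREF value of every <a> tag (tag and attribute names arrive lowercased)."""
    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)

def looks_malformed(href):
    """Heuristic only: flag stray leading spaces in a path segment or an embedded 'http:'."""
    decoded = unquote(href)
    space_segment = any(seg.startswith(" ") for seg in decoded.split("/"))
    embedded_scheme = decoded.find("http:") > 0
    return space_segment or embedded_scheme

def find_malformed(html):
    """Return the HREF values in the given HTML text that trip the heuristics."""
    collector = HrefCollector()
    collector.feed(html)
    return [h for h in collector.hrefs if looks_malformed(h)]

sample = (
    '<A HREF="online%20at:%20http:/www.itsonline.com/ttid/datalock_scheme.pdf">a</A>'
    '<A HREF="mineta/%20mineta_uwb_jul10_07.doc">b</A>'
    '<A HREF="ttid/ttid_white_paper.pdf">c</A>'
)
for href in find_malformed(sample):
    print(href)
```

On the sample above it flags the first two links and lets the third through, which is why the remaining entries still have to be checked by hand.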