Re: Attempted Mirror Always Craters - HTTrack Website Copier Forum

Subject: Re: Attempted Mirror Always Craters

Author: William Roeder

Date: 04/21/2009 18:57

> I'd like HTTrack to capture that page and all the
> webpages and documents (PDF, *.doc, etc.) linked to
> from that page.  My understanding is that if I set
> the mirroring and external depths to "3" it should
> capture all this information in an archive that can
> be browsed offline.

Since those links are external, you do have to set both limits (mirror depth and external depth).

In addition, many sites may not want bots scanning those locations and say so
in robots.txt. Set HTTrack to ignore it (the No Robots.txt setting).
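
For anyone running HTTrack from the command line rather than the GUI, here is a rough sketch of how I would expect those three settings to map to options. It only builds the argument list and prints it; nothing is executed. The output directory name is made up, and the flag spellings (-r for mirror depth, -%e for external depth, -s0 for no robots.txt rules) are from my reading of the HTTrack option list, so check them against `httrack --help` for your version.

```python
def build_httrack_command(start_url, output_dir, depth=3):
    """Return an argv list for httrack. Nothing is run here."""
    return [
        "httrack",
        start_url,
        "-O", output_dir,   # where the mirror is written (hypothetical path)
        f"-r{depth}",       # mirror depth
        f"-%e{depth}",      # external links depth
        "-s0",              # never follow robots.txt rules
    ]


if __name__ == "__main__":
    cmd = build_httrack_command(
        "http://www.itsonline.com/srs/sources_links_d6.html",
        "./itsonline_mirror",  # assumed output directory
    )
    print(" ".join(cmd))
```

To actually run it you could hand the list to subprocess.run(cmd, check=True).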

With just those, I got: 3.43-3 mirror complete in 9 minutes 26 seconds : 106
links scanned, 98 files written (40548319 bytes overall...

> Here's a typical error message:

> 09:51:34 Error:  "Not Found" (404) at link
> www.itsonline.com/srs/online%20at:%20http:/www.itson
> line.com/ttid/datalock_scheme.pdf (from
> www.itsonline.com/srs/sources_links_d6.html)
> 
> What's confusing to me is that the PDF file that it
> couldn't find
> (http://www.itsonline.com/ttid/datalock_scheme.pdf)
> exists on the web.

Yes, but that's not what your link points to:
< A HREF="online%20at:%20http:/www.itsonline.com/ttid/datalock_scheme.pdf" >

The following links are also wrong. They are relative, so they resolve under
the initial page location (www.itsonline.com/srs/) instead of where the files
actually are:

oge_foia_response_to_pogo.pdf
mineta/%20mineta_uwb_jul10_07.doc
ttid_letters/%20doggett_email_jun_14_07.pdf
trfc_lobbyists/%20winston_strawn_2001_second_half.pdf
ttid/%20itip_local_match_analysis_v1c.doc
ttid/ttid_white_paper.pdf
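
To see why the links listed above end up under /srs/, here is a small sketch using Python's standard urljoin, which applies the usual relative-URL rules. HTTrack's own resolver may differ in details, so treat this as an approximation; the function name is just for illustration.

```python
from urllib.parse import urljoin

# Page the links were found on (taken from the error message).
PAGE = "http://www.itsonline.com/srs/sources_links_d6.html"


def resolve(href, base=PAGE):
    """Resolve an HREF the way a relative link is resolved against its page."""
    return urljoin(base, href)


if __name__ == "__main__":
    hrefs = [
        "online%20at:%20http:/www.itsonline.com/ttid/datalock_scheme.pdf",
        "mineta/%20mineta_uwb_jul10_07.doc",
        "ttid/ttid_white_paper.pdf",
    ]
    for href in hrefs:
        # Each result stays under http://www.itsonline.com/srs/
        print(href, "->", resolve(href))
```

The first HREF has no valid scheme in front of its first colon ("online%20at" is not a legal scheme name), so the whole string is treated as a relative path, which is how the 404 URL in the log comes about.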
