I'm trying to mirror a website in order to monitor its
contents to comply with a court order. Recently, my
attempt was apparently blocked by the webmaster. The only
change I can see is some code that says:
00no download.lbi, which exists at the top of the library,
and all pages appear to have code in them that redirects me
to this directory. The log provides:
HTTrack3.30+swf launched on Thu, 06 Nov 2003 18:40:54 at
<http://www.xxx.com/~zzz/usr/bob/> +*.png +*.gif +*.jpg
+*.css +*.js -ad.doubleclick.net/*
(winhttrack -qwC2%Ps2u1%sN0%I0p3DaK0H0%kf2A25000%f#f -
F "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)" -%
F "<!-- Mirrored from %s%s by HTTrack Website Copier/3.x
[XR&CO'2003], %s -->" -%l "en, *"
<http://www.xxx.com/~zzz/usr/bob/> -O
D:\\Site_Mirrors\xx_20031105,D:\CD\Site_Mirrors\xxx_2003110
5 +*.png +*.gif +*.jpg +*.css +*.js -ad.doubleclick.net/* -
%A php3,php,php2,asp,jsp,pl,cfm,nsf=text/html )
Information, Warnings and Errors reported for this mirror:
note: the hts-log.txt file, and hts-cache folder, may
contain sensitive information,
such as username/password authentication for
websites mirrored in this project
do not share these files/folders if you want these
information to remain private
18:40:54 Info: Note: due to www.eskimo.com remote
robots.txt rules, links begining with these path will be
forbidden: /cgi-bin/, /*/cgi-bin/ (see in the options to
disable this)
18:40:59 Warning: Warning, link #14 empty
HTTrack mirror complete in 5 seconds : 13 links scanned,
12 files written (34034 bytes overall) [38449 bytes
received at 7689 bytes/sec], 3.2 requests per connection
(No errors, 1 warnings, 1 messages)
Names have been changed to protect the privacy of the website
and its webmaster, but all else is the same. I did try setting
HTTrack to ignore the robots.txt rules, but it made no difference. I can only
download the library. How can I grab the entire site to
ensure continued compliance?
Thanks,
StateofMind