HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Web Archive problems once again.
Author: Ze Zoo
Date: 11/30/2014 23:01
 
> Thanks for the reply.
> 
> Robots rules were disabled because the website is being
> dredged from the Wayback Machine. I know that some people
> have found out how to download archived websites; I haven't.
> 
> By looking at the link structure inside the web archive, I
> found that the Wayback Machine uses a rather huge range of
> links; the most recent ones are just those "on top": the
> main index and a few bigger subsites.
> 
> I still can't get anything from the Wayback Machine. This
> is the new error log:
> 
> HTTrack3.43-9+htsswf+htsjava launched on Tue, 01 Jan 2002 04:53:33 at http://web.archive.org/web/20080513204319/http://www.naszawitryna.pl/ +*.png +*.gif +*.jpg +*.css +*.js -ad.doubleclick.net/* -mime:application/foobar
> (winhttrack -qwC2%Ps0u1%s%uN0%I0p3DaK0T1200R9H0%kf2A25000%f#f -F "Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)" -%F "<!-- Mirrored from %s%s by HTTrack Website Copier/3.x [XR&CO'2010], %s -->" -%l "en, en, *" http://web.archive.org/web/20080513204319/http://www.naszawitryna.pl/ -O1 C:\WebSites\naszawitryna.pl +*.png +*.gif +*.jpg +*.css +*.js -ad.doubleclick.net/* -mime:application/foobar )
> Information, Warnings and Errors reported for this mirror:
> note: the hts-log.txt file, and hts-cache folder, may contain sensitive information,
>  such as username/password authentication for websites mirrored in this project
>  do not share these files/folders if you want these information to remain private
> 04:54:00 Error:  "Not Found" (404) at link www.naszawitryna.pl.wstub.archive.org/images/bg2.gif (from web.archive.org/web/20080513204319/http://www.naszawitryna.pl/)
> 04:54:00 Error:  "Not Found" (404) at link www.naszawitryna.pl.wstub.archive.org/images/nasza_witryna.gif (from web.archive.org/web/20080513204319/http://www.naszawitryna.pl/)
> 04:54:00 Error:  "Not Found" (404) at link www.naszawitryna.pl.wstub.archive.org/images/bg.gif (from web.archive.org/web/20080513204319/http://www.naszawitryna.pl/)
> 04:54:00 Error:  "Not Found" (404) at link www.naszawitryna.pl.wstub.archive.org/images/english.gif (from web.archive.org/web/20080513204319/http://www.naszawitryna.pl/)
> 04:54:00 Error:  "Not Found" (404) at link www.naszawitryna.pl.wstub.archive.org/images/german.gif (from web.archive.org/web/20080513204319/http://www.naszawitryna.pl/)
> 04:54:00 Error:  "Not Found" (404) at link www.naszawitryna.pl.wstub.archive.org/images/dont_mess_with_poland.gif (from web.archive.org/web/20080513204319/http://www.naszawitryna.pl/)
> 04:54:00 Error:  "Not Found" (404) at link www.naszawitryna.pl.wstub.archive.org/images/nowe/new_article.gif (from web.archive.org/web/20080513204319/http://www.naszawitryna.pl/)
> 04:54:00 Error:  "Not Found" (404) at link www.naszawitryna.pl.wstub.archive.org/images/nowosci_NW_lt.gif (from web.archive.org/web/20080513204319/http://www.naszawitryna.pl/)
> 04:54:00 Error:  "Not Found" (404) at link www.naszawitryna.pl.wstub.archive.org/images/arrow.gif (from web.archive.org/web/20080513204319/http://www.naszawitryna.pl/)
> 04:54:00 Error:  "Not Found" (404) at link www.naszawitryna.pl.wstub.archive.org/images/new.gif (from web.archive.org/web/20080513204319/http://www.naszawitryna.pl/)
> 04:54:00 Error:  "Not Found" (404) at link www.naszawitryna.pl.wstub.archive.org/images/tygodnik_glos.jpg (from web.archive.org/web/20080513204319/http://www.naszawitryna.pl/)
> 04:54:00 Error:  "Not Found" (404) at link www.naszawitryna.pl.wstub.archive.org/images/nasz_dziennik.jpg (from web.archive.org/web/20080513204319/http://www.naszawitryna.pl/)
> 04:54:00 Error:  "Not Found" (404) at link www.naszawitryna.pl.wstub.archive.org/images/niedziela.jpg (from web.archive.org/web/20080513204319/http://www.naszawitryna.pl/)
> 04:54:00 Error:  "Not Found" (404) at link www.naszawitryna.pl.wstub.archive.org/images/mysl_polska.jpg (from web.archive.org/web/20080513204319/http://www.naszawitryna.pl/)
> 04:54:00 Error:  "Not Found" (404) at link www.naszawitryna.pl.wstub.archive.org/images/nasza_polska.jpg (from web.archive.org/web/20080513204319/http://www.naszawitryna.pl/)
> 04:54:00 Error:  "Not Found" (404) at link www.naszawitryna.pl.wstub.archive.org/images/orzel_antyk_s2.gif (from web.archive.org/web/20080513204319/http://www.naszawitryna.pl/)
> 04:54:00 Error:  "Not Found" (404) at link www.naszawitryna.pl.wstub.archive.org/images/radio_maryja.jpg (from web.archive.org/web/20080513204319/http://www.naszawitryna.pl/)
> HTTrack Website Copier/3.43-9 mirror complete in 27 seconds : 20 links scanned, 3 files written (19510 bytes overall) [20843 bytes received at 771 bytes/sec], 20750 bytes transfered using HTTP compression in 6 files, ratio 36%, 1.2 requests per connection
> (17 errors, 0 warnings, 0 messages)
> 
> I really can't get anything out of the Wayback Machine.
> Any help?
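
All 17 errors in the quoted log have the same shape: the failing link is on a host made of the original host plus ".wstub.archive.org", and the referring page is a Wayback Machine snapshot URL. As a rough sketch only, the Python below pulls the error entries out of a raw hts-log.txt (one entry per line, not the wrapped forum copy) and rebuilds a candidate snapshot URL for each failed file. The function names and the two regular expressions are my own, not anything HTTrack provides. It also assumes, without confirmation, that stripping the ".wstub.archive.org" suffix gives the original host and that the file was captured near the referrer's timestamp, so the rebuilt URLs are guesses to check by hand.

```python
import re

# One HTTrack error entry: reason, status code, failed link, referring page.
ERROR_RE = re.compile(
    r'Error:\s+"(?P<reason>[^"]+)"\s+\((?P<code>\d+)\)\s+at link\s+'
    r'(?P<link>\S+)\s+\(from\s+(?P<referrer>\S+)\)'
)

# Wayback snapshot URL: 14-digit timestamp followed by the original URL.
WAYBACK_RE = re.compile(
    r'^(?:https?://)?web\.archive\.org/web/(?P<ts>\d{14})/(?P<orig>.+)$'
)

# Host suffix seen on every failing link in the log (assumed to be a plain suffix).
STUB_SUFFIX = ".wstub.archive.org"


def parse_errors(log_text):
    """Return (status_code, link, referrer) for each error entry in the log."""
    return [
        (int(m["code"]), m["link"], m["referrer"])
        for m in ERROR_RE.finditer(log_text)
    ]


def snapshot_url(link, referrer):
    """Guess a Wayback URL for a failed stub link, or return None.

    Assumption: the stub host is the original host plus STUB_SUFFIX, and the
    referrer's timestamp is a reasonable one to request the file under.
    """
    ref = WAYBACK_RE.match(referrer)
    host, _, path = link.partition("/")
    if ref is None or not host.endswith(STUB_SUFFIX):
        return None
    original_host = host[: -len(STUB_SUFFIX)]
    return f"http://web.archive.org/web/{ref['ts']}/http://{original_host}/{path}"


if __name__ == "__main__":
    sample = (
        '04:54:00 Error:  "Not Found" (404) at link '
        'www.naszawitryna.pl.wstub.archive.org/images/bg2.gif (from '
        'web.archive.org/web/20080513204319/http://www.naszawitryna.pl/)'
    )
    for code, link, referrer in parse_errors(sample):
        print(code, snapshot_url(link, referrer))
```

Whether the archive actually holds each image at the rebuilt address is not something the log can tell us, so it is worth opening a few of the URLs in a browser before starting another mirror.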
 