HTTrack Website Copier
Free software offline browser - FORUM
Subject: character limit
Author: CB
Date: 06/06/2012 05:21
 
My apologies for the length of this - I wanted to make sure you have all the
information for my issue(s).

Issue 1:
I'm running into a problem mirroring my travel blog
(http://surfcjb.wordpress.com). It seems that some of the pages have
exceedingly long URLs (>255 characters), and those pages do not get mirrored.
Is this a limitation in HTTrack, or is there something else going on?
HTTrack3.44-5+htsswf+htsjava launched on Thu, 31 May 2012 23:56:23 at
<http://surfcjb.wordpress.com> +*.tif +*.png +*.gif +*.jpg +*.bmp +*.css +*.js
-ad.doubleclick.net/* -*.mov -*.avi -*.mp* -*.wmv -*.wma -*.qt -*.cgi*

("C:\Program Files\WinHTTrack\httrack.exe" <http://surfcjb.wordpress.com> -O
"C:\Users\Chris\Documents\Work\Secure\Task Order 08\MirrorTest\My
Site\English"
-qr8%e0C2%Pxs0u1%s%uN0%Ip7DaK0c2T6J128R2H1%kf2o0A50000%c2#L10000%f#fb1j1n -F
"IE 9.0, WINx64, Windows 7" -%A standard +*.tif +*.png +*.gif +*.jpg +*.bmp
+*.css +*.js -ad.doubleclick.net/* -*.mov -*.avi -*.mp* -*.wmv -*.wma -*.qt
-*.cgi* )



Information, Warnings and Errors reported for this mirror:

note:	the hts-log.txt file, and hts-cache folder, may contain sensitive
information,

	such as username/password authentication for websites mirrored in this
project

	do not share these files/folders if you want these information to remain
private



23:57:15	Warning: 	File not parsed, looks like binary:
surfcjb.wordpress.com/cairo-egypt-june-2009-let-the-adventure-begin/baksheesh-the-new-wmd/camels-fish-and-frogger/24-hours-in-the-cairo-airport/i-like-pig/one-last-venting/

23:57:15	Error: 	Unable to save file C:/Users/Chris/Documents/Work/Secure/Task
Order 08/MirrorTest/My
Site/English/surfcjb.wordpress.com/cairo-egypt-june-2009-let-the-adventure-begin/baksheesh-the-new-wmd/camels-fish-and-frogger/24-hours-in-the-cairo-airport/i-like-pig/one-last-venting/index.html
: No such file or directory

23:57:31	Warning: 	File not parsed, looks like binary:
surfcjb.wordpress.com/south-african-adventure-part-1-redux/lions-have-teeth-south-african-adventure-part-2-redux-2-too/sharks-have-teeth-too-on-to-zambia-i-go/farewell-to-zambia-the-african-adventure-concludes-sorta/

23:57:31	Error: 	Unable to save file C:/Users/Chris/Documents/Work/Secure/Task
Order 08/MirrorTest/My
Site/English/surfcjb.wordpress.com/south-african-adventure-part-1-redux/lions-have-teeth-south-african-adventure-part-2-redux-2-too/sharks-have-teeth-too-on-to-zambia-i-go/farewell-to-zambia-the-african-adventure-concludes-sorta/index.html
: No such file or directory

23:57:32	Warning: 	File not parsed, looks like binary:
surfcjb.wordpress.com/south-african-adventure-part-1-redux/lions-have-teeth-south-african-adventure-part-2-redux-2-too/sharks-have-teeth-too-on-to-zambia-i-go/farewell-to-zambia-the-african-adventure-concludes-sorta/end-of-the-road-african-adventure-part-5-i-think/

23:57:32	Error: 	Unable to save file C:/Users/Chris/Documents/Work/Secure/Task
Order 08/MirrorTest/My
Site/English/surfcjb.wordpress.com/south-african-adventure-part-1-redux/lions-have-teeth-south-african-adventure-part-2-redux-2-too/sharks-have-teeth-too-on-to-zambia-i-go/farewell-to-zambia-the-african-adventure-concludes-sorta/end-of-the-road-african-adventure-part-5-i-think/index.html
: No such file or directory

23:57:35	Error: 	"Credentials required." (401) at link
<https://surfcjb.wordpress.com/wp-app.php/service> (from
surfcjb.wordpress.com/xmlrpc.php?rsd)

23:57:39	Warning: 	File has moved from
surfcjb.wordpress.com/2009/07/11/hello-world/trackback/ to
<http://surfcjb.wordpress.com/2009/07/11/hello-world/>

23:58:26	Warning: 	File has moved from surfcjb.wordpress.com/wp-admin/ to
<http://surfcjb.wordpress.com/wp-login.php?redirect_to=http%3A%2F%2Fsurfcjb.wordpress.com%2Fwp-admin%2F&reauth=1>

23:58:29	Error: 	"Not Found" (404) at link
surfcjb.wordpress.com/wp-admin/&reauth=1 (from
surfcjb.wordpress.com/wp-login.php?redirect_to=http%3A%2F%2Fsurfcjb.wordpress.com%2Fwp-admin%2F%26reauth=1)

23:59:14	Error: 	"" (404) at link surfcjb.wordpress.com/wp-admin/&reauth=1
(from
surfcjb.wordpress.com/wp-login.php?redirect_to=http%3A%2F%2Fsurfcjb.wordpress.com%2Fwp-admin%2F%26reauth=1)

23:59:15	Error: 	"Not Found" (404) at link
surfcjb.wordpress.com/ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/
(from s.gravatar.com/js/gprofiles.js?aa&ver=3.4-RC1-20965)

23:59:15	Error: 	"Not Found" (404) at link surfcjb.wordpress.com/mustache.js
(from
s1.wp.com/wp-content/mu-plugins/notes/mustache.js?m=1334597730g&ver=2012-05-04)

23:59:16	Error: 	"Not Found" (404) at link surfcjb.wordpress.com/^javascript/
(from s.stats.wordpress.com/w.js?21)



HTTrack Website Copier/3.44-5 mirror complete in 2 minutes 53 seconds : 226
links scanned, 229 files written (2892708 bytes overall) [1930445 bytes
received at 11158 bytes/sec], 1616323 bytes transfered using HTTP compression
in 70 files, ratio 31%, 1.0 requests per connection

(9 errors, 5 warnings, 0 messages)
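
One way to check whether the "Unable to save file" errors above come from the
length of the local save path, rather than from the URL itself, is to measure
that path. The sketch below assumes the classic Windows MAX_PATH limit of 260
characters (which counts the terminating null, leaving 259 usable characters)
applies to how HTTrack writes files; that is an assumption, not something the
log confirms. The function names and the shorter "C:/M" output directory are
hypothetical.

```python
# Sketch: compare the length of a local save path with the classic
# Windows MAX_PATH limit. Assumption: this limit applies to HTTrack's writes.
MAX_PATH = 260  # classic Win32 limit, counting the terminating null


def save_path_length(base_dir: str, url_path: str, leaf: str = "index.html") -> int:
    """Length of base_dir + url_path + leaf, joined with forward slashes."""
    return len(base_dir.rstrip("/") + "/" + url_path.strip("/") + "/" + leaf)


def fits_max_path(base_dir: str, url_path: str) -> bool:
    """True if the save path leaves room for the terminating null."""
    return save_path_length(base_dir, url_path) < MAX_PATH


# Base directory and URL path copied from the second error in the log above.
BASE_DIR = "C:/Users/Chris/Documents/Work/Secure/Task Order 08/MirrorTest/My Site/English"
URL_PATH = (
    "surfcjb.wordpress.com/south-african-adventure-part-1-redux/"
    "lions-have-teeth-south-african-adventure-part-2-redux-2-too/"
    "sharks-have-teeth-too-on-to-zambia-i-go/"
    "farewell-to-zambia-the-african-adventure-concludes-sorta"
)

if __name__ == "__main__":
    # "C:/M" is a hypothetical, much shorter output directory for comparison.
    for base in (BASE_DIR, "C:/M"):
        print(base, save_path_length(base, URL_PATH), fits_max_path(base, URL_PATH))
```

If path length is the cause, mirroring into a shorter output directory should
let at least some of these pages save, although the deepest pages may still be
too long.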

Issue 2:
I tried the same mirror with filters, but it didn't capture any content below
the first page, and it doesn't look like it used the style sheet (CSS) either.
Any idea what I did wrong?  Am I getting too involved with the filters? I
tried following your example of good, efficient mirroring:

HTTrack3.44-5+htsswf+htsjava launched on Tue, 05 Jun 2012 19:55:22 at
<http://surfcjb.wordpress.com> -* +surfcjb.wordpress.com/*.htm*
+surfcjb.wordpress.com/*.xml* +surfcjb.wordpress.com/*.css*
+surfcjb.wordpress.com/*.php* +surfcjb.wordpress.com/*.asp*
+surfcjb.wordpress.com/*.pdf* +surfcjb.wordpress.com/*.swf* +/*.js*
+surfcjb.wordpress.com/*.java* +surfcjb.wordpress.com/*.tif*
+surfcjb.wordpress.com/*.gif* +surfcjb.wordpress.com/*.jpg*
+surfcjb.wordpress.com/*.bmp* +surfcjb.wordpress.com/*.png*
+surfcjb.wordpress.com/*.doc* +surfcjb.wordpress.com/*.xls*
+surfcjb.wordpress.com/*.ppt* +surfcjb.wordpress.com/*.txt* -mime:*/*
+mime:text/* +mime:image/* +mime:application/pdf
+mime:application/x-shockwave-flash +mime:application/vnd.ms*

("C:\Program Files\WinHTTrack\httrack.exe" <http://surfcjb.wordpress.com> -O
"C:\Users\Chris\Documents\Work\Secure\Task Order 08\MirrorTest\My
Site\English"
-qr8%e0C2%Pxs0u1%s%uN0%Ip7DaK0c2T6J128R2H1%kf2o0A50000%c2#L10000%f#fb1j1n -F
"IE 9.0, WINx64, Windows 7" -%A standard -* +surfcjb.wordpress.com/*.htm*
+surfcjb.wordpress.com/*.xml* +surfcjb.wordpress.com/*.css*
+surfcjb.wordpress.com/*.php* +surfcjb.wordpress.com/*.asp*
+surfcjb.wordpress.com/*.pdf* +surfcjb.wordpress.com/*.swf* +/*.js*
+surfcjb.wordpress.com/*.java* +surfcjb.wordpress.com/*.tif*
+surfcjb.wordpress.com/*.gif* +surfcjb.wordpress.com/*.jpg*
+surfcjb.wordpress.com/*.bmp* +surfcjb.wordpress.com/*.png*
+surfcjb.wordpress.com/*.doc* +surfcjb.wordpress.com/*.xls*
+surfcjb.wordpress.com/*.ppt* +surfcjb.wordpress.com/*.txt* -mime:*/*
+mime:text/* +mime:image/* +mime:application/pdf
+mime:application/x-shockwave-flash +mime:application/vnd.ms* )



Information, Warnings and Errors reported for this mirror:




19:55:30	Warning: 	Unexpected incomplete type with 200 code at
surfcjb.wordpress.com/osd.xml

19:55:34	Error: 	"Credentials required." (401) at link
<https://surfcjb.wordpress.com/wp-app.php/service> (from
surfcjb.wordpress.com/xmlrpc.php?rsd)

19:55:34	Error: 	"" (401) at link
<https://surfcjb.wordpress.com/wp-app.php/service> (from
surfcjb.wordpress.com/xmlrpc.php?rsd)



HTTrack Website Copier/3.44-5 mirror complete in 12 seconds : 8 links scanned,
7 files written (30769 bytes overall) [17127 bytes received at 1427
bytes/sec], 28704 bytes transfered using HTTP compression in 4 files, ratio
31%

(2 errors, 1 warnings, 0 messages)
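
A possible explanation for the filtered run above: WordPress permalinks such
as surfcjb.wordpress.com/2009/07/11/hello-world/ have no file extension, so
none of the +surfcjb.wordpress.com/*.ext* rules match them and the leading -*
excludes them. The Python sketch below approximates this, assuming HTTrack
applies scan rules in order with the last matching rule winning. fnmatch is
only a stand-in for HTTrack's own wildcard matching, the rule list is a subset
of the filters above, and the function name is hypothetical.

```python
from fnmatch import fnmatchcase

# Subset of the scan rules from the run above, in command-line order.
RULES = [
    "-*",
    "+surfcjb.wordpress.com/*.htm*",
    "+surfcjb.wordpress.com/*.xml*",
    "+surfcjb.wordpress.com/*.css*",
    "+surfcjb.wordpress.com/*.php*",
]


def is_allowed(url: str, rules=RULES, default: bool = True) -> bool:
    """Apply +/- rules in order; the last rule that matches decides.

    Assumption: this mirrors HTTrack's rule priority. fnmatchcase only
    approximates HTTrack's wildcard syntax.
    """
    verdict = default
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if fnmatchcase(url, pattern):
            verdict = (sign == "+")
    return verdict


if __name__ == "__main__":
    # URLs taken from the logs in this post (scheme omitted, as in the logs).
    for url in [
        "surfcjb.wordpress.com/osd.xml",
        "surfcjb.wordpress.com/xmlrpc.php?rsd",
        "surfcjb.wordpress.com/2009/07/11/hello-world/",
        "s1.wp.com/wp-content/mu-plugins/notes/mustache.js?m=1334597730g&ver=2012-05-04",
    ]:
        print("keep" if is_allowed(url) else "skip", url)
```

If this reading is right, it would explain the missing sub-pages and, if the
theme's style sheets are served from another host, the missing CSS as well.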

I also tried just this for filters:
-mime:*/* +mime:text/* +mime:image/* +mime:application/pdf
+mime:application/x-shockwave-flash +mime:application/vnd.ms*
The mirror looks good, but it took much longer (7:15 versus 2:53, min:sec),
with lots of "delayed" files in the process. It looks like the issue was with
b.scorecardresearch.com and s.stats.wordpress.com.
I thought I had the options set to not follow links to outside websites, so
I'm not sure why those hosts were contacted. This run also has the long URL
issue, but I'm not that worried about that here, just the filter use.
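
If the delays do come from those two hosts, one option is to exclude them
explicitly, in the same style as the -ad.doubleclick.net/* filter used in the
first run. The helper below is a hypothetical sketch that only builds those
filter strings; whether excluding the hosts actually removes the delay is
untested.

```python
def host_exclusion_filters(hosts):
    """Build '-host/*' scan rules, one per host, skipping duplicates."""
    seen = set()
    filters = []
    for host in hosts:
        host = host.strip().rstrip("/")
        if host and host not in seen:
            seen.add(host)
            filters.append(f"-{host}/*")
    return filters


if __name__ == "__main__":
    # The two hosts that showed up as "delayed" during the MIME-only run.
    slow_hosts = ["b.scorecardresearch.com", "s.stats.wordpress.com"]
    print(" ".join(host_exclusion_filters(slow_hosts)))
```

The resulting strings could be appended to the filter list of the run.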

HTTrack3.44-5+htsswf+htsjava launched on Tue, 05 Jun 2012 20:02:31 at
<http://surfcjb.wordpress.com> -mime:*/* +mime:text/* +mime:image/*
+mime:application/pdf +mime:application/x-shockwave-flash
+mime:application/vnd.ms*

("C:\Program Files\WinHTTrack\httrack.exe" <http://surfcjb.wordpress.com> -O
"C:\Users\Chris\Documents\Work\Secure\Task Order 08\MirrorTest\My
Site\English"
-qr8%e0C2%Pxs0u1%s%uN0%Ip7DaK0c2T6J128R2H1%kf2o0A50000%c2#L10000%f#fb1j1n -F
"IE 9.0, WINx64, Windows 7" -%A standard -mime:*/* +mime:text/* +mime:image/*
+mime:application/pdf +mime:application/x-shockwave-flash
+mime:application/vnd.ms* )



Information, Warnings and Errors reported for this mirror:




20:02:42	Warning: 	Unexpected incomplete type with 200 code at
surfcjb.wordpress.com/osd.xml

20:04:28	Warning: 	File not parsed, looks like binary:
surfcjb.wordpress.com/cairo-egypt-june-2009-let-the-adventure-begin/baksheesh-the-new-wmd/camels-fish-and-frogger/24-hours-in-the-cairo-airport/i-like-pig/one-last-venting/

20:04:28	Error: 	Unable to save file C:/Users/Chris/Documents/Work/Secure/Task
Order 08/MirrorTest/My
Site/English/surfcjb.wordpress.com/cairo-egypt-june-2009-let-the-adventure-begin/baksheesh-the-new-wmd/camels-fish-and-frogger/24-hours-in-the-cairo-airport/i-like-pig/one-last-venting/index.html
: No such file or directory

20:05:34	Warning: 	File not parsed, looks like binary:
surfcjb.wordpress.com/south-african-adventure-part-1-redux/lions-have-teeth-south-african-adventure-part-2-redux-2-too/sharks-have-teeth-too-on-to-zambia-i-go/farewell-to-zambia-the-african-adventure-concludes-sorta/

20:05:34	Error: 	Unable to save file C:/Users/Chris/Documents/Work/Secure/Task
Order 08/MirrorTest/My
Site/English/surfcjb.wordpress.com/south-african-adventure-part-1-redux/lions-have-teeth-south-african-adventure-part-2-redux-2-too/sharks-have-teeth-too-on-to-zambia-i-go/farewell-to-zambia-the-african-adventure-concludes-sorta/index.html
: No such file or directory

20:05:35	Warning: 	File not parsed, looks like binary:
surfcjb.wordpress.com/south-african-adventure-part-1-redux/lions-have-teeth-south-african-adventure-part-2-redux-2-too/sharks-have-teeth-too-on-to-zambia-i-go/farewell-to-zambia-the-african-adventure-concludes-sorta/end-of-the-road-african-adventure-part-5-i-think/

20:05:35	Error: 	Unable to save file C:/Users/Chris/Documents/Work/Secure/Task
Order 08/MirrorTest/My
Site/English/surfcjb.wordpress.com/south-african-adventure-part-1-redux/lions-have-teeth-south-african-adventure-part-2-redux-2-too/sharks-have-teeth-too-on-to-zambia-i-go/farewell-to-zambia-the-african-adventure-concludes-sorta/end-of-the-road-african-adventure-part-5-i-think/index.html
: No such file or directory

20:05:50	Error: 	"Credentials required." (401) at link
<https://surfcjb.wordpress.com/wp-app.php/service> (from
surfcjb.wordpress.com/xmlrpc.php?rsd)

20:06:00	Warning: 	File has moved from
surfcjb.wordpress.com/2009/07/11/hello-world/trackback/ to
<http://surfcjb.wordpress.com/2009/07/11/hello-world/>

20:08:02	Warning: 	File has moved from surfcjb.wordpress.com/wp-admin/ to
<http://surfcjb.wordpress.com/wp-login.php?redirect_to=http%3A%2F%2Fsurfcjb.wordpress.com%2Fwp-admin%2F&reauth=1>

20:08:11	Error: 	"Not Found" (404) at link
surfcjb.wordpress.com/wp-admin/&reauth=1 (from
surfcjb.wordpress.com/wp-login.php?redirect_to=http%3A%2F%2Fsurfcjb.wordpress.com%2Fwp-admin%2F%26reauth=1)

20:08:32	Warning: 	File seems complete (same size), but there was a cache read
error (4294967295): surfcjb.wordpress.com/osd.xml

20:09:41	Warning: 	File has moved from surfcjb.wordpress.com/wp-admin/ to
<http://surfcjb.wordpress.com/wp-login.php?redirect_to=http%3A%2F%2Fsurfcjb.wordpress.com%2Fwp-admin%2F&reauth=1>

20:09:46	Error: 	"Not Found" (404) at link
surfcjb.wordpress.com/2009/07/11/hello-world/mustache.js (from
s1.wp.com/wp-content/mu-plugins/notes/mustache.js?m=1334597730g&ver=2012-05-04)



HTTrack Website Copier/3.44-5 mirror complete in 7 minutes 15 seconds : 207
links scanned, 207 files written (2631399 bytes overall) [4224709 bytes
received at 9711 bytes/sec], 1388258 bytes transfered using HTTP compression
in 57 files, ratio 31%, 1.0 requests per connection

(6 errors, 8 warnings, 0 messages)

I'd really like to make the mirrors efficiently (the travel blog is just my
test site), as I have many more to do. Any help with the filters would be
awesome.

Again - sorry about how long all that was!
 