HTTrack Website Copier
Free software offline browser - FORUM
Subject: Bug in Limits?
Author: Stevod
Date: 11/13/2007 17:07
 
Hi,

I'm having long-standing problems with some of the LIMIT parameters, in that
files are being recorded as 'added' in the new.txt files in hts-cache, and yet
are never saved to disk.

As an example, take the project error log below, which uses all the defaults
plus a size limit to download 1MB of the Cisco website. The log says that 42
files were saved to disk, and new.txt says that 40 were added, yet a "dir /s"
on the download shows that only 24 were actually saved to disk, totalling
517KB rather than the 1MB requested.

Some of the other LIMIT parameters show the same behaviour. I suspect that
when a limit is reached, files that have been downloaded but not yet saved
are discarded rather than cleanly flushed to disk. This happens on most, if
not all, of the large websites that I download automatically, and presumably
means some people are not getting what they asked for!

It was first reported in 2006
(http://forum.httrack.com/readmsg/14608/index.html) - see that thread for
further info.

It's causing me a major problem because I auto-parse the new.txt file to find
what has been downloaded, and it's not correct. Help!!
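
Until the cause is found, one possible stopgap is to cross-check each 'added'
entry in new.txt against the disk before trusting it. The Python sketch below
is only an illustration under assumptions: it assumes new.txt is tab-separated,
that the status field starts with the word "added", and that the local file
path sits in one of the columns. The column indexes (status_col, local_col) and
the function names are hypothetical and must be read off the header line of
your own new.txt.

```python
import os
from typing import Callable, Iterable, Iterator, List, Tuple


def split_cache_lines(lines: Iterable[str]) -> Iterator[List[str]]:
    """Yield the tab-separated fields of each non-empty line."""
    for line in lines:
        line = line.rstrip("\r\n")
        if line:
            yield line.split("\t")


def check_saved(
    lines: Iterable[str],
    local_col: int,   # assumed index of the local file path column
    status_col: int,  # assumed index of the status column
    exists: Callable[[str], bool] = os.path.isfile,
) -> Tuple[List[str], List[str]]:
    """Split 'added' entries into (on_disk, missing) lists of local paths."""
    on_disk: List[str] = []
    missing: List[str] = []
    for fields in split_cache_lines(lines):
        # Skip short or malformed lines.
        if len(fields) <= max(local_col, status_col):
            continue
        # Keep only entries whose status begins with "added"; a header
        # line or an "updated" entry falls through here.
        if not fields[status_col].startswith("added"):
            continue
        path = fields[local_col]
        (on_disk if exists(path) else missing).append(path)
    return on_disk, missing
```

Feeding it the lines of hts-cache\new.txt (for example via open(...).readlines())
gives two lists. If the suspicion above is right, the "missing" list should
account for the gap between the 40 entries reported as added and the 24 files
found on disk. Relative paths in new.txt would need to be resolved against the
project folder first.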

Many thanks
Stevod

------------------------------------------------------
HTTrack3.33+swf launched on Tue, 13 Nov 2007 15:45:32 at www.cisco.com +*.png
+*.gif +*.jpg +*.css +*.js -ad.doubleclick.net/*
(winhttrack -qwC2%Ps2u1%s%uN0%I0p3DaK0M1000000H0%kf2A25000%f#f -F "Mozilla/4.5
(compatible; HTTrack 3.0x; Windows 98)" -%F "<!-- Mirrored from %s%s by
HTTrack Website Copier/3.x [XR&CO'2005], %s -->" -%l "en, en, *" www.cisco.com
-O "C:\My Web Sites\test2,C:\My Web Sites\test2" +*.png +*.gif +*.jpg +*.css
+*.js -ad.doubleclick.net/* -%A php3,php,php2,asp,jsp,pl,cfm,nsf=text/html )
Information, Warnings and Errors reported for this mirror:
note: the hts-log.txt file, and hts-cache folder, may contain sensitive
information,
 such as username/password authentication for websites mirrored in this
project
 do not share these files/folders if you want these information to remain
private
15:45:33 Info:  Note: due to www.cisco.com remote robots.txt rules, links
begining with these path will be forbidden: /bug-navigator, /cgi-bin,
/pcgi-bin, /univ-src/ccden, /cpropub/univercd, /jobs (see in the options to
disable this)
15:45:59 Warning:  File has moved from www.cisco.com/en/US/about/index.html to
<http://www.cisco.com/web/about/index.html>
More than 1000000 bytes have been transfered.. giving up
More than 1000000 bytes have been transfered.. giving up
More than 1000000 bytes have been transfered.. giving up
More than 1000000 bytes have been transfered.. giving up
HTTrack Website Copier/3.33 mirror complete in 37 seconds : 301 links scanned,
42 files written (1016827 bytes overall) [1057480 bytes received at 28580
bytes/sec], 1.0 requests per connection
(No errors, 1 warnings, 1 messages)
 