HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Bug Report: Long lines truncated
Author: Mottel
Date: 01/31/2005 01:50
 
Reply to Xavier:

>>>>>>
Humm, this is weird, because httrack does not handle
lines at all, but only the whole stream. I suspect some
nasty transfer error or update bug, maybe. Did you attempt 
to edit the file with some editor ? This might be the 
cause.
<<<<<<

No, I tried editing a different file to restore missing 
bits that WinHTTrack had dropped from the mirror, but not 
this one. I was careful to choose for the "before and 
after" sample a file that was exactly as downloaded by 
WinHTTrack.  

I forgot to say before that this truncation is evident in 
many files mirrored from the Dolmetsch site, so I doubt 
very much that this is a freak transmission error, which 
would be more random. In particular, the header lines in 
*all* the music dictionary pages were truncated and all at
the *same* place (except for page B where it occurs a 
little further on).

It may well be an update bug though; I did run an update 
on the mirrored site after the initial download. I'll 
experiment some more and get back to you again on this.


>>>>>>
I tried to mirror
<http://www.dolmetsch.com/defsg.htm>
and the page looked ok with 3.33-rc6 afais
<<<<<<

Is that the same as the Windows version? Have you tried it 
with WinHTTrack 3.32-2 (the one I am using)? And in your 
test, were the links in the mirrored file edited by 
WinHTTrack? (If you only tried to mirror a single file, not 
a whole site or a whole set of files from a site, it may not 
have converted any links to refer to the localised 
structure. In that case, I would expect no change from the 
original.)


>>>>>>
Well, httrack does not change anything, actually: if the 
page was LF convention it is still LF convention. Only 
relevant links are patched on the fly - the rest of the 
data is ok.
<<<<<<

I have some more info for you regarding the line 
termination protocols and what WinHTTrack is doing: I have 
examined these files in a hex editor. You are right in 
saying that the original files on the Dolmetsch site 
follow the (Unix/Mac) single LF convention. However in the 
mirrored copies, WinHTTrack has *added* a CR-LF pair after 
each occurrence of a single LF in the original. I am 
pleased to see that in doing this, the Windows version is 
following the DOS convention for line termination (as 
expected by Windows), but if it is going to make such 
changes, wouldn't it be better for it to *replace* each 
single LF with a CR-LF, rather than add a CR-LF after each 
single LF?
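
To make the difference concrete, here is a minimal Python sketch of the two behaviours. It is only an illustration of what I see in the hex editor, not HTTrack's actual code; the function names and the sample bytes are my own invention.

```python
def count_line_endings(data: bytes) -> dict:
    """Count CR-LF pairs, bare LFs and bare CRs in a byte string."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # LFs that are not part of a CR-LF
    cr = data.count(b"\r") - crlf   # CRs that are not part of a CR-LF
    return {"CRLF": crlf, "LF": lf, "CR": cr}


def replace_lf_with_crlf(data: bytes) -> bytes:
    """What I would expect: every line ends with exactly one CR-LF."""
    # Collapse any existing CR-LF first so it is not turned into CR-CR-LF.
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")


def append_crlf_after_lf(data: bytes) -> bytes:
    """What the mirrored copies appear to contain: LF, then an extra CR-LF.

    Assumes the input uses bare LFs only, as the original pages do.
    """
    return data.replace(b"\n", b"\n\r\n")


# Hypothetical two-line sample in the LF convention.
original = b"<html>\n<body>\n"

print(count_line_endings(original))
print(count_line_endings(replace_lf_with_crlf(original)))
print(count_line_endings(append_crlf_after_lf(original)))
```

Running `count_line_endings` on the raw bytes of an original page and of its mirrored copy (read with `open(path, "rb")`) should show which of the two patterns a given file follows: with replacement there are no bare LFs left, while with the appended form every original LF is still there alongside a new CR-LF.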

At any rate, it seems clear that WinHTTrack *does* take 
notice of line breaks and also makes changes to them. This 
seems to contradict what you have said about HTTrack not 
changing anything except links and also about how it 
handles lines. This reinforces my suspicion that there is
some link between this behaviour and the line truncation 
bug.

BTW, I am using WinHTTrack 3.32-2 under Windows XP Pro, 
ver 2002, on a 731 MHz Pentium 3. My internet connection 
is dial-up using a USB V.90 56K Modem.

 