HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Sensitivity to 'alt' content
Author: Cleon Teunissen
Date: 06/25/2011 18:55
 
> Did you have extend parsing turned on (attempt to
> detect all)?
> 
> The parser may be very simplistic. Simply skips the
> initial quote if present and stops at the trailing
> quote or space or angle brackets.

By default extended parsing is turned on:
"Attempt to detect all links (even in unknown tags/javascript code)"

I unchecked that option and did another run.
(Both runs were set to "no robots.txt rules")
No difference: the images with alt= attributes containing <br> were not
downloaded.

Extended parsing option checked (default):
winhttrack -qiC2%Ps0u1%s%uN0%I0p3DaK0H0%kf2A25000%f#f

Extended parsing option unchecked:
winhttrack -qiC2%P0s0u1%s%uN0%I0p3DaK0H0%kf2A25000%f#f

So the extended parsing option makes no difference.

As you suggest: most likely the parser stops parsing the IMG element when it
encounters a closing angle bracket, even if that closing angle bracket is
inside a quoted attribute value.
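
To illustrate the suspected behaviour, here is a minimal sketch in Python. It
is not HTTrack's actual code; the function names and the sample IMG tag are
made up for illustration. It compares a scanner that stops at the first
closing angle bracket with one that skips over quoted attribute values.

```python
import re

def naive_tag_end(html, start):
    """Return the index just past the first '>' after start, ignoring quotes."""
    return html.index(">", start) + 1

def quote_aware_tag_end(html, start):
    """Return the index just past the '>' that closes the tag,
    treating any '>' inside a quoted attribute value as ordinary text."""
    quote = None
    for i in range(start, len(html)):
        ch = html[i]
        if quote:
            if ch == quote:      # closing quote ends the attribute value
                quote = None
        elif ch in "\"'":        # opening quote starts an attribute value
            quote = ch
        elif ch == ">":          # unquoted '>' really ends the tag
            return i + 1
    raise ValueError("unterminated tag")

def find_src(tag_text):
    """Extract the src= value from the text of a tag, or None if absent."""
    m = re.search(r'\bsrc\s*=\s*["\']?([^"\'\s>]+)', tag_text)
    return m.group(1) if m else None

# Hypothetical tag resembling the problem case: <br> inside alt=, src= after it.
sample = '<img alt="First line<br>second line" src="photo.jpg">'

naive = sample[:naive_tag_end(sample, 0)]
aware = sample[:quote_aware_tag_end(sample, 0)]

print(find_src(naive))   # the tag is cut off before src=, so nothing is found
print(find_src(aware))   # the whole tag is scanned, so src= is found
```

If the real parser behaves like the first function, any src= that follows the
alt= attribute is never seen, which would match what I observed.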

Presumably web browsers cope with this because they treat a closing angle
bracket inside a quoted attribute value as part of the value, so they still
find the src= attribute.

Anyway, I've purged all <br>'s from the alt= attributes in my source code, and
that change will soon be pushed out. 
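
For anyone who needs to do the same clean-up, a rough sketch of one way to do
it in Python follows. This is only an assumption about how such a purge could
be scripted, not the method I necessarily used; the function name and the
sample markup are hypothetical, and the regular expressions only handle quoted
alt= values.

```python
import re

# Matches alt="..." or alt='...' and captures the prefix, the quote, and the value.
ALT_ATTR = re.compile(r'(\balt\s*=\s*)(["\'])(.*?)\2', re.IGNORECASE | re.DOTALL)
# Matches <br>, <br/>, and <br /> in any letter case.
BR_TAG = re.compile(r'<br\s*/?>', re.IGNORECASE)

def purge_br_from_alt(html):
    """Replace every <br> inside a quoted alt= value with a single space.
    <br> tags elsewhere in the page are left untouched."""
    def clean(match):
        prefix, quote, value = match.groups()
        return f"{prefix}{quote}{BR_TAG.sub(' ', value)}{quote}"
    return ALT_ATTR.sub(clean, html)

page = '<p>Intro<br></p><img src="a.png" alt="Line one<br>line two">'
print(purge_br_from_alt(page))
```

Only the <br> inside the alt= value is replaced; the one in the paragraph stays.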
 