HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: filenames including %2F not downloaded
Author: William Roeder
Date: 02/18/2012 19:24
 
> This site could do a number of things aside from losing
> the message, ...
This user doesn't disagree.

> httrack -C1 -%v -f2 <http://manga.animea.net/real.html>
> -O httrack "-*"
> "+http://manga.animea.net/real*.html*[]" "+*.jpg"

cat httrack/manga.animea.net/real*.html | grep "onerror=\"this.src" |
sed 's!<a.href="\(.*\).html"><img src="\(.*\)" onerror="this.src=.*!YEAH httrack/manga.animea.net/\2 \1.jpg!' |
grep "YEAH " | grep -v "notfound.png" |

There are many more non-local images than warning messages in the log.
1) You asked for every URL matching *html and *jpg, not for all HTML pages and
all images. To filter by content type instead, use:
-* +mime:text/html +mime:image/*
2) You did not ask for extended parsing to pick up URLs inside JavaScript, so
none of the onerror=... attributes were used. From
<http://www.httrack.com/html/fcguide.html>:
%P *extended parsing, attempt to parse all links, even in unknown tags or
Javascript (%P0 don't use) (--extended-parsing[=N])
Even with that option, I doubt HTTrack can parse them and get the image URL
from the onerror="src='...' ". According to the FAQ, there is no full support
for this.
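
If HTTrack does not pick up the onerror URLs, one workaround is to pull them out of the mirrored pages yourself. This is only a rough sketch: the function name extract_onerror_urls is made up, it assumes the attribute is written exactly as onerror="this.src='URL'", and it needs a grep that supports -o (GNU and BSD grep do).

```shell
# Hypothetical helper, not part of HTTrack.
# Reads HTML on stdin and prints one fallback URL per line, taken from
# attributes of the assumed form: onerror="this.src='URL'"
extract_onerror_urls() {
    # grep -o prints only the matching part of each line, one match per line
    grep -o "onerror=\"this\.src='[^']*'" |
        # strip the attribute prefix and the closing single quote
        sed "s/^onerror=\"this\.src='//; s/'\$//"
}
```

It could be run over the mirror as: cat httrack/manga.animea.net/real*.html | extract_onerror_urls > fallback-urls.txt (the output file name is just an example). If the site writes the attribute differently, the pattern will have to be adjusted.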

 