HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: filenames including %2F not downloaded
Author: Jean-Marc
Date: 02/18/2012 18:48
 
OK, so the culprit must indeed have been this HTML in my shell pipe.
Replace the dot with a space between a and href in the following line to get
it to work:
cat httrack/manga.animea.net/real*.html \
  | grep "onerror=\"this.src" \
  | sed 's!<a.href="\(.*\).html"><img src="\(.*\)" onerror="this.src=.*!YEAH httrack/manga.animea.net/\2 \1.jpg!' \
  | grep "YEAH " \
  | grep -v "notfound.png" \
  | less
This parses the lines of the HTML files to extract the names of the images I'm
interested in, and also prints an approximation of the local filename (so that
I can go check it). The output shows that some images are not local, and
they're exactly the images with %2F in their names. Also, if you pipe the
output through grep http: | wc -l instead of less, you see there are many more
non-local images than warning messages in the log.
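For anyone who wants to repeat the count, here is a rough sketch of the same idea wrapped in two small shell functions. The names extract_img_src and count_remote are made up for illustration. It assumes one image tag per line and that non-local images are the ones whose src starts with http:, so treat it as a starting point rather than the exact pipeline above.

```shell
#!/bin/sh

# Hypothetical helper: read HTML on stdin and print the src attribute of
# every image tag that carries the onerror fallback, skipping the
# "notfound.png" placeholder.
extract_img_src() {
    grep 'onerror="this.src' \
      | sed -n 's!.*<img src="\([^"]*\)" onerror="this.src=.*!\1!p' \
      | grep -v 'notfound.png'
}

# Hypothetical helper: count how many of those sources still point at a
# remote http: URL, i.e. were not rewritten to a local path.
# "|| true" keeps the exit status at 0 when the count is zero.
count_remote() {
    extract_img_src | grep -c '^http:' || true
}
```

Usage would be something like: cat httrack/manga.animea.net/real*.html | count_remote. The resulting number can then be compared with the number of warnings in the log.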
Oops, gotta go. Thanks.
3.44-4 Debian/gnome console
 

