1) I used webhttrack.
In my log, the command seems to be:
HTTrack3.44-1+libhtsjava.so.2 launched on Wed, 24 Apr 2013 20:14:46 at
<http://www.curso-objetivo.br/vestibular/resolucao_comentada/fuvest.asp> +*.png
+*.gif +*.jpg +*.css +*.js -ad.doubleclick.net/* -*.pdf
(webhttrack -q -%i -w
<http://www.curso-objetivo.br/vestibular/resolucao_comentada/fuvest.asp> -O
"/home/lucas/downloads/websites/fuvests_objetivo" -n -t -%P -N0 -s0 -x -p7 -D
-a -K0 -c4 -%k -R200 -A25000 -F "Mozilla/5.0 (X11; Linux i686)
AppleWebKit/537.22 (KHTML, like Gecko) Chrome/25.0.1364.172 Safari/537.22" -%F
"<!-- Mirrored from %s%s by HTTrack Website Copier/3.x [XR&CO'2008], %s -->"
+*.png +*.gif +*.jpg +*.css +*.js -ad.doubleclick.net/* -*.pdf -%s -%u )
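To rule out the +/- filters as the cause, here is a rough check of the missing .gif against the filter list above. It assumes HTTrack filters behave like shell wildcards and that the last matching rule wins. This is a simplification, not HTTrack's actual matcher, and the default for unmatched URLs is also an assumption.

```python
from fnmatch import fnmatchcase

# Filters copied from the command line above, in the same order.
FILTERS = ["+*.png", "+*.gif", "+*.jpg", "+*.css", "+*.js",
           "-ad.doubleclick.net/*", "-*.pdf"]


def is_allowed(url, filters=FILTERS, default=True):
    """Approximate verdict: the last matching rule wins (assumption)."""
    # Drop the scheme, since the filter patterns are written without one.
    bare = url.split("://", 1)[-1]
    verdict = default
    for rule in filters:
        sign, pattern = rule[0], rule[1:]
        if fnmatchcase(bare, pattern):
            verdict = (sign == "+")
    return verdict


missing = ("http://www.curso-objetivo.br/vestibular/resolucao_comentada/"
           "fuvest/2007_2fase/3dia/03.gif")
print(is_allowed(missing))
```

Under these assumptions the .gif is accepted by `+*.gif`, which points away from the filters and towards link detection.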
2) This URL was not fetched:
<http://www.curso-objetivo.br/vestibular/resolucao_comentada/fuvest/2007_2fase/3dia/03.gif>
In the original online file, it was a relative link:
/vestibular/resolucao_comentada/fuvest/2007_2fase/3dia/03.gif
The file that contains it is:
<http://www.curso-objetivo.br/vestibular/resolucao_comentada/fuvest/fuvest2007_2fase.asp?img=01>
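For reference, resolving that relative path against the containing page gives exactly the absolute URL that was not fetched. This suggests the link itself is well formed and that the problem is in finding it. A quick check with Python's standard library:

```python
from urllib.parse import urljoin

page = ("http://www.curso-objetivo.br/vestibular/resolucao_comentada/"
        "fuvest/fuvest2007_2fase.asp?img=01")
link = "/vestibular/resolucao_comentada/fuvest/2007_2fase/3dia/03.gif"

# A root-relative path keeps the page's scheme and host, replaces the
# whole path, and drops the page's query string.
resolved = urljoin(page, link)
print(resolved)
```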
I suspect the problem is that the URL is inside a JavaScript call.
In that call, the " characters are replaced with &quot;
(other similar calls, which use a plain ", seem to work fine).
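A toy illustration of that suspicion, assuming the quotes in the failing call are HTML-encoded as `&quot;`. A scanner that only recognises links between literal double quotes sees nothing in the encoded call until the entities are decoded. The function name `mostra` and the `02.gif` call are hypothetical, since the actual markup is not quoted here. This is not HTTrack's parser, only the failure mode in miniature.

```python
import html
import re

# Hypothetical calls: "mostra" and 02.gif are invented for illustration.
plain = 'mostra("/vestibular/resolucao_comentada/fuvest/2007_2fase/3dia/02.gif")'
encoded = ('mostra(&quot;/vestibular/resolucao_comentada/fuvest/'
           '2007_2fase/3dia/03.gif&quot;)')

# A naive scanner that only accepts .gif links inside literal double quotes.
QUOTED_GIF = re.compile(r'"([^"]+\.gif)"')


def find_links(js):
    return QUOTED_GIF.findall(js)


print(find_links(plain))                   # finds the 02.gif path
print(find_links(encoded))                 # finds nothing
print(find_links(html.unescape(encoded)))  # finds the 03.gif path
```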
3) ok
4) There does not seem to be anything useful there.
But what do I know? =P
Here is the file: <https://www.dropbox.com/s/sbbvty4bto9mhq1/hts-log.txt>
5) d) I had a high retry count, but I suspect the problem is in the actual
parsing.
extended_parsing: I am not sure how to enable that.