> Then the scan rules I use are:
> -*
> +www.tefl.net/esl-lesson-plans/*.htm
> +www.tefl.net/esl-lesson-plans/*.pdf
> +www.tefl.net/esl-lesson-plans/worksheets*.htm
> +*.pdf
> +*.png +*.gif +*.jpg +*.css +*.js
> -ad.doubleclick.net/* -mime:application/foobar
If you want everything, don't use any filters; just set the 'near' flag (get non-HTML
files related to a link).
You must allow HTML to get the links, but +*.htm only gets that exact extension: no .html,
no .php, etc. Better to use +mime:text/html.
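For example, a filter set along those lines would look something like this (just a
sketch; the pdf pattern is an assumption, adjust it to the paths you actually want):

-* +mime:text/html +www.tefl.net/esl-lesson-plans/*.pdf

That keeps anything the server sends as text/html (so .htm, .html, .php all count) plus
the pdfs under that path.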
> And I've set no robot rules! But the downloads come
> back empty! What am I overlooking?? Thanks
Always post the actual command used (log file line two) so we know what you
did, not what you think you did.
I had no problem getting the PDFs with
-* +*.htm +*.pdf
An empty mirror usually means the site doesn't like the default HTTrack Browser ID. I
always run identifying as MSIE:
(winhttrack -qir3C2%Pns2u1Z%s%uN0%I0p3DaK0c4H0%kf2o0%f#f -F "Mozilla/4.0
(compatible; MSIE 6.0; Windows NT 5.0)" -%F "<!-- Mirrored by HTTrack Website
Copier/3.x [XR/YP'2000] -->" -%l "en, en, *"
<http://www.tefl.net/esl-lesson-plans/worksheets-topic.htm> -O1
C:\Users\Bill\HTTrack\test -* +*.htm +*.pdf )
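For reference, if you use the command-line httrack rather than WinHTTrack, a stripped-down
sketch of the same idea would be (the output path here is made up; depth 3 matches the
command above):

httrack http://www.tefl.net/esl-lesson-plans/worksheets-topic.htm -O "C:\mirrors\tefl" -r3 -F "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" "-*" "+*.htm" "+*.pdf"

The -F option is what sets the Browser ID; the rest is just the depth, the output path and
the same filters.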