> Then the scan rules I use are:
> -*
> +www.tefl.net/esl-lesson-plans/*.htm
> +www.tefl.net/esl-lesson-plans/*.pdf
> +www.tefl.net/esl-lesson-plans/worksheets*.htm
> +*.pdf
> +*.png +*.gif +*.jpg +*.css +*.js
> -ad.doubleclick.net/* -mime:application/foobar
If you want everything, don't use any filters; just set the 'near' flag (get non-HTML
files related to a link).
You must allow HTML to get the links, but +*.htm only gets that exact extension: no .html,
no .php, etc. Better to use +mime:text/html.
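For example, a filter set along those lines would look something like this (just a
sketch; the pdf pattern is an assumption, adjust it to the paths you actually want):

-* +mime:text/html +www.tefl.net/esl-lesson-plans/*.pdf

That keeps anything the server sends as text/html (so .htm, .html, .php all count) plus
the pdfs under that path.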
> And I've set no robot rules! But the downloads come
> back empty! What am I overlooking?? Thanks
Always post the actual command used (log file line two) so we know what you
did, not what you think you did.
I had no problem getting the PDFs with
-* +*.htm +*.pdf
An empty mirror usually means the site doesn't like the default HTTrack Browser ID. I
always run identifying as MSIE:
(winhttrack -qir3C2%Pns2u1Z%s%uN0%I0p3DaK0c4H0%kf2o0%f#f -F "Mozilla/4.0
(compatible; MSIE 6.0; Windows NT 5.0)" -%F "<!-- Mirrored by HTTrack Website
Copier/3.x [XR/YP'2000] -->" -%l "en, en, *"
<http://www.tefl.net/esl-lesson-plans/worksheets-topic.htm> -O1
C:\Users\Bill\HTTrack\test -* +*.htm +*.pdf )
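For reference, if you use the command-line httrack rather than WinHTTrack, a stripped-down
sketch of the same idea would be (the output path here is made up; depth 3 matches the
command above):

httrack http://www.tefl.net/esl-lesson-plans/worksheets-topic.htm -O "C:\mirrors\tefl" -r3 -F "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)" "-*" "+*.htm" "+*.pdf"

The -F option is what sets the Browser ID; the rest is just the depth, the output path and
the same filters.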