Re: Help- webpage without extension - HTTrack Website Copier Forum

Subject: Re: Help- webpage without extension

Author: WHRoeder

Date: 05/01/2013 14:42

1) Always post the ACTUAL command line used (or log file line two) so we know
what the site is, what ALL your settings are, etc.
2) Always post the URLs you're not getting and the URL each one is
referenced from.
3) Always post anything USEFUL from the log file.
4) If you want everything, use the near flag (get non-html files related), not
filters.
5) I always run with:
A) No External Pages, so I know where the mirror ends.
B) Browser ID=msie 6 from the pulldown, as some sites don't like the HTT one.
C) Attempt to detect all links (for JS/CSS).
D) Timeout=60, retry=9, to keep temporary network interruptions from
deleting files.
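
For anyone on the command line rather than the GUI, here is a rough Python sketch of how the settings in 4) and 5) might map to httrack options. The mapping is my reading of the option list (-n near, -x for No External Pages, -F user agent, -%P extended link parsing, -T timeout, -R retries), so check it against `httrack --help`. The user-agent string and the output directory name are assumptions, and `build_httrack_args` is a hypothetical helper that only builds the argument list, it does not run anything.

```python
import shlex


def build_httrack_args(start_url, output_dir):
    """Return an httrack argument list mirroring the GUI settings above."""
    return [
        "httrack", start_url,
        "-O", output_dir,   # where the mirror is written
        "-n",               # near: also get non-html files related to a page
        "-x",               # assumed equivalent of "No External Pages"
        # Assumed MSIE 6 user-agent string; the GUI pulldown may differ.
        "-F", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)",
        "-%P",              # attempt to detect all links (JS/CSS)
        "-T60",             # timeout of 60 seconds
        "-R9",              # retry up to 9 times
    ]


if __name__ == "__main__":
    # Print the command so it can be pasted into a post or a shell.
    args = build_httrack_args("http://www.riderta.com/", "riderta-mirror")
    print(shlex.join(args))
```

Printing the command instead of running it also gives you the ACTUAL command line to post, per 1).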

> <http://www.riderta.com/>
> 
> I understand that I must let it spider the site: -*
> +*.pdf +*.html
> 
> The problem is that each route has a separate page
> with a link to the pdf, but no file extension, hence
> +*.html is not picking anything up. 
So you can NOT get JUST the pdf and html; drop your filters.
Since the directory structure doesn't contain any periods
(e.g. <http://www.riderta.com/routes/15>),
you can try rejecting all URLs with periods except pdf: -*/*[name].* +*.pdf
ref: <http://www.httrack.com/html/fcguide.html>
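
To see what that filter pair is meant to do, here is a small Python approximation. It is not HTTrack's matcher, only the intent: reject any URL whose path contains a period, unless the URL ends in .pdf (the later +*.pdf rule overriding the earlier reject). `would_keep` is a hypothetical name, query strings are ignored, and the .pdf check is case-sensitive here, which may not match HTTrack's behaviour.

```python
from urllib.parse import urlparse


def would_keep(url):
    """Approximate the intent of: -*/*[name].* +*.pdf"""
    # +*.pdf comes last, so it wins over the earlier reject rule.
    if url.endswith(".pdf"):
        return True
    # -*/*[name].* : a period anywhere in the path means "has an extension".
    if "." in urlparse(url).path:
        return False
    # Extension-less pages such as /routes/15 stay in the mirror.
    return True
```

Under this reading, <http://www.riderta.com/routes/15> is kept because its path has no period, while an image or stylesheet URL is rejected.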

> All the PDFs are in one directory, though there is
> not a page where they are all listed.
> <http://www.riderta.com/sites/default/files/schedule-pdfs/>
No magic here: if your browser can't read the directory, neither can HTT.
