HTTrack Website Copier
Free software offline browser - FORUM
Subject: PDF Files from a site with Javascript
Author: TB
Date: 05/21/2015 13:11
The aim is to download all linked PDF files from the following site (and
combine them later):

<> [A statistical yearbook of
Canada]. I plan to do this for several yearbooks linked here:

Possibility 1. Using HTTRACK on the site directly

Running HTTRACK on the site itself as

httrack <> "/statcan/" -* +mime:/aspx/text/html +*.pdf

returns only the four PDF files that can be accessed directly. The problem is
that the other links have to be uncovered by JavaScript first, and I am unsure
how to do this.

Possibility 2. Using HTTRACK on the built-in search function

The site features a search function.

Searching for "1" and "a" returns essentially all pages of a yearbook, I
presume. We can thus get a long list of PDF file links (see here).

Unfortunately, using HTTRACK on this list returns lots of HTML files, but not
the required PDFs.

"/statcan/" +*" +*.pdf  

Any ideas on how to change the options in HTTRACK? (Or perhaps WGET works as
well?)
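
One fallback I could imagine, if the options cannot be fixed, is to save the
search results page locally and pull the PDF links out of it with a small
script. The following is only a minimal sketch under assumptions: it presumes
the PDF links appear as ordinary href attributes ending in .pdf, and the names
PdfLinkParser and extract_pdf_links, as well as the example.org base URL, are
hypothetical and not taken from the site.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PdfLinkParser(HTMLParser):
    """Collect href values of <a> tags that point at .pdf files."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url  # used to resolve relative links
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name != "href" or not value:
                continue
            # Ignore any query string when checking the extension.
            path = value.split("?")[0].lower()
            if path.endswith(".pdf"):
                absolute = urljoin(self.base_url, value)
                if absolute not in self.links:  # drop duplicates, keep order
                    self.links.append(absolute)


def extract_pdf_links(html_text, base_url):
    """Return absolute URLs of all PDF links found in html_text."""
    parser = PdfLinkParser(base_url)
    parser.feed(html_text)
    return parser.links


if __name__ == "__main__":
    # Hypothetical sample standing in for the saved search results page.
    sample = '<a href="/statcan/page1.pdf">p1</a> <a href="index.html">home</a>'
    for url in extract_pdf_links(sample, "http://example.org/"):
        print(url)
```

Written one URL per line into a text file, the output could then be handed to
wget with its -i option, or possibly to HTTRACK via its --list option. This
would not help with links that only appear after the JavaScript has run.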
