Trying to get PDF files only - HTTrack Website Copier Forum

Subject: Trying to get PDF files only

Author: GSaunier

Date: 03/27/2018 16:06

Hello everyone,

I would like to retrieve a batch of documents from a website that makes
public archives available. These documents are in PDF format.

Here's how the site tree is built:
www.site.org
|
/librairy.html	
|	|
...	|
	Ress_1
	Ress_2
	Ress_3
	|	|
	...	|
		/ress/read=3.html
			Review_1
			Review_2
			Review_3
			|	|
			...	|
		
=> /ress/read=3&review=3&.html
|
Review_number1
Review_number2
Review_number3
Review_number4
|	|
...	|
	/random_number/
	|
	=> Link to PDF
		=>/random_number/X/random_number.pdf

What I need is to “read” every page under
“/ressources/read=3&review=3&.html” and to download only the PDF files.
How should I set up the filters to do this? I have tried several
combinations, but without success (I end up getting everything from the
website).
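
To make the goal concrete, here is a minimal Python sketch of the selection
I have in mind. It is not HTTrack filter syntax, only an illustration of the
intended behaviour: HTML pages on the site are read so that their links can
be discovered, only files ending in .pdf are kept, and everything else is
ignored. The host name, the classify() helper and the example URLs are
placeholders I made up from the tree above.

```python
from urllib.parse import urlparse

# Placeholder host taken from the tree above (assumption, not the real site).
SITE_HOST = "www.site.org"


def classify(url: str) -> str:
    """Decide what to do with a URL found while crawling.

    Returns "save" for PDF files, "follow" for HTML pages or directories
    on the site (read only to discover links), and "skip" otherwise.
    """
    parts = urlparse(url)
    if parts.netloc.lower() != SITE_HOST:
        return "skip"  # never leave the site
    path = parts.path.lower()
    if path.endswith(".pdf"):
        return "save"  # the only files I want to keep
    if path == "" or path.endswith("/") or path.endswith(".html"):
        return "follow"  # pages are read for their links, not kept
    return "skip"  # images, stylesheets, scripts, and so on


if __name__ == "__main__":
    # Hypothetical URLs shaped like the tree above.
    for u in (
        "http://www.site.org/ress/read=3&review=3&.html",
        "http://www.site.org/12345/X/12345.pdf",
        "http://www.site.org/images/logo.png",
    ):
        print(u, "->", classify(u))
```

In other words, I am looking for the filter combination that makes HTTrack
behave like the "follow" and "save" cases above and nothing more.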

Thank you for your help.

GS.
