HTTrack Website Copier
Free software offline browser - FORUM
Subject: Trying to get PDF files only
Author: GSaunier
Date: 03/27/2018 16:06
 
Hello everyone,

I would like to retrieve a batch of documents from a website that makes
public archives available. These documents are in PDF format.

Here's how the site tree is built:
www.site.org
|
/librairy.html	
|	|
...	|
	Ress_1
	Ress_2
	Ress_3
	|	|
	...	|
		/ress/read=3.html
			Review_1
			Review_2
			Review_3
			|	|
			...	|
		
=> /ress/read=3&review=3&.html
|
Review_number1
Review_number2
Review_number3
Review_number4
|	|
...	|
	/random_number/
	|
	=> Link to PDF
		=> /random_number/X/random_number.pdf

What I need is to “read” every page under
“/ressources/read=3&review=3&.html” and to download only the PDF files.
How should I set up the filters to do this? I have tried several
combinations without success (I end up getting everything from the
website).
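
To make the goal concrete, here is a minimal Python sketch of the rule I am
after. It is not HTTrack filter syntax, only an illustration of the intended
behaviour. The function name classify, the depth limit of 2, and the example
URLs are hypothetical; www.site.org is the placeholder host from the tree
above.

```python
from urllib.parse import urlparse

HOST = "www.site.org"   # placeholder host from the tree above
MAX_DEPTH = 2           # assumed: start page -> /random_number/ page -> PDF


def classify(url: str, depth: int) -> str:
    """Return 'save', 'follow', or 'skip' for a link found at the given depth.

    depth counts link hops from the start page (links on the start page are
    at depth 1).
    """
    parts = urlparse(url)
    if parts.netloc != HOST:
        return "skip"                      # never leave the site
    path = parts.path.lower()
    if path.endswith(".pdf"):
        return "save"                      # the only files to keep
    is_page = path == "" or path.endswith((".html", ".htm", "/"))
    if is_page and depth < MAX_DEPTH:
        return "follow"                    # read for links, do not keep
    return "skip"                          # images, CSS, pages that are too deep
```

In other words, intermediate pages should be read only to discover links, and
nothing except the PDF files should end up in the mirror.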

Thank you for your help.

GS.
 