HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Downloading just PDF files
Author: Ravi
Date: 07/24/2004 19:40
 
I have a requirement to get the content from the sites that 
we have partnered with. There are around 100 different 
sites to crawl. I need to crawl each web site and get just 
the articles. The article content is in two formats: on some 
sites the articles are in HTML format, and on other sites the 
articles are in PDF format. The number of articles from each 
site varies from 200K to 500K.
On some sites the link to the PDF file does not end with .pdf; 
the link is similar to this: <web root>/ViewPdf?artid=1000. 
When the user clicks on the link, it launches Adobe 
with the PDF file.
As you know, saving the whole site is not a good solution 
because they are pretty big sites.
Is there any way I can scan the whole site and get only the 
articles that I need?
This is a very good tool from what I have found so far. I 
appreciate your help.

Thank you,
Ravi
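
To make the ViewPdf?artid=1000 case concrete: a link like that cannot be recognised as a PDF from its URL, so a crawler has to look at the response instead. The following is only a rough Python sketch of that idea, not something HTTrack itself does or exposes. The function name looks_like_pdf and its two parameters are made up for illustration; it assumes the crawler already has the Content-Type header and the first few bytes of the response body.

```python
def looks_like_pdf(content_type, first_bytes):
    """Guess whether a response is a PDF without relying on the URL.

    content_type: value of the Content-Type header (may be None).
    first_bytes:  the first bytes of the response body, as bytes.
    Both parameters are illustrative assumptions, not an HTTrack API.
    """
    # The header may carry parameters, e.g. "application/pdf; charset=binary",
    # so compare only the media type part, case-insensitively.
    media_type = (content_type or "").split(";", 1)[0].strip().lower()
    if media_type == "application/pdf":
        return True
    # Some servers send a generic type; fall back to the "%PDF-" signature
    # that PDF files start with.
    return first_bytes.startswith(b"%PDF-")
```

A crawler could call this on each fetched link and keep the body only when it returns True, so that HTML navigation pages are followed but not saved.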
 

