HTTrack Website Copier
Free software offline browser - FORUM
Subject: Download files as is
Author: John Mcoy
Date: 10/23/2008 21:51
 
I have a website with this structure:

www.blablabla.com/directory/0001/
www.blablabla.com/directory/0002/
www.blablabla.com/directory/0003/
www.blablabla.com/directory/0004/
www.blablabla.com/directory/etc...

Each of these folders contains 1 or 2 pdf/xls/doc/xml files. I would like to
grab all the pdf/xls/doc files.

I managed to grab the structure, but all of these file types are renamed to
.html files.

You need to be authenticated to browse the site and download the files; this
is checked with a session cookie.

What the problem looks like:

HTTrack can browse the site, so authentication itself is not the problem.
It also gets the xml files, but I'm not interested in those.
If I open one of the renamed files, it shows the 'authentication' page, so
I assume that when HTTrack fetches the file, it is redirected to this page.
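
To check that assumption without opening every file by hand, something like the rough Python sketch below could be used. It only looks at the first bytes of each saved .html file and guesses whether it is really a PDF, an old-style doc/xls file, or an HTML page (such as the login page). The function names sniff and report and the mirror directory argument are made up for illustration; nothing here is part of HTTrack.

```python
from pathlib import Path

# Leading "magic" bytes of the file types mentioned in the post.
PDF_MAGIC = b"%PDF-"
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")  # legacy .doc/.xls container
UTF8_BOM = b"\xef\xbb\xbf"


def sniff(data: bytes) -> str:
    """Guess the real type of a saved file from its first bytes."""
    if data.startswith(PDF_MAGIC):
        return "pdf"
    if data.startswith(OLE2_MAGIC):
        return "ole2 (doc/xls)"
    # HTML has no fixed signature, so drop an optional BOM and leading
    # whitespace, then look for a typical opening tag.
    head = data[:512]
    if head.startswith(UTF8_BOM):
        head = head[len(UTF8_BOM):]
    head = head.lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html"
    return "unknown"


def report(mirror_dir: str) -> dict:
    """Map every *.html file under mirror_dir (hypothetical path) to its sniffed type."""
    results = {}
    for path in sorted(Path(mirror_dir).rglob("*.html")):
        with path.open("rb") as fh:
            results[str(path)] = sniff(fh.read(512))
    return results
```

If every renamed file comes back as "html", that would support the idea that the login page was saved in place of the document. If some come back as "pdf" or "ole2 (doc/xls)", then only the file name is wrong and the content is intact.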

Can anybody point out why that is? Or what I am doing wrong?
 