I have a website with this structure:
www.blablabla.com/directory/0001/
www.blablabla.com/directory/0002/
www.blablabla.com/directory/0003/
www.blablabla.com/directory/0004/
www.blablabla.com/directory/etc...
Each of these folders contains one or two pdf/xls/doc/xml files. I would like to
grab all the pdf/xls/doc files.
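To be precise about which files I mean, here is a minimal Python sketch of the filter. The helper name `wanted` and the example file names are hypothetical, and this is not how HTTrack implements its own filters:

```python
import posixpath
from urllib.parse import urlparse

# Extensions I want to keep; .xml is deliberately left out.
WANTED_EXTENSIONS = {".pdf", ".xls", ".doc"}


def wanted(url: str) -> bool:
    """Return True if the URL path ends in a wanted extension (case-insensitive)."""
    path = urlparse(url).path
    ext = posixpath.splitext(path)[1].lower()
    return ext in WANTED_EXTENSIONS
```

So `wanted("http://www.blablabla.com/directory/0001/report.pdf")` is true, while an `.xml` file or a bare directory URL is rejected.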
I managed to mirror the directory structure, but all of these files are saved
with an .html extension instead of their original one.
You need to be authenticated to browse the site and download the files; this
is checked with a session cookie.
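For comparison, this is roughly what an authenticated download looks like outside HTTrack, as a minimal Python sketch. It assumes the session cookie can simply be sent as a `Cookie` header; the value `SESSIONID=abc123` is a placeholder, not the site's real cookie:

```python
import urllib.request


def build_request(url: str, session_cookie: str) -> urllib.request.Request:
    """Build a GET request that carries the session cookie.

    session_cookie is a placeholder such as "SESSIONID=abc123"; the real name
    and value have to be copied from a logged-in browser session.
    """
    return urllib.request.Request(url, headers={"Cookie": session_cookie})


def fetch(url: str, session_cookie: str) -> tuple[str, str, bytes]:
    """Download url and return (final_url, content_type, body).

    urllib follows redirects automatically, so if final_url differs from url,
    the server most likely sent the request to the login page instead.
    """
    with urllib.request.urlopen(build_request(url, session_cookie)) as resp:
        return resp.geturl(), resp.headers.get_content_type(), resp.read()
```

If `fetch` returns `text/html` and a different final URL for a pdf link, the cookie is not being accepted for that request.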
The problem:
HTTrack can browse the site, so authentication itself does not seem to be the issue.
It also downloads the xml files, but I'm not interested in those.
If I open one of the renamed files, it shows the authentication page, so I
assume that when HTTrack fetches the file, it is redirected to that page.
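One way to confirm this is to check the first bytes of each renamed file, as in the minimal Python sketch below. It is a heuristic and assumes the files are either real PDF/Office documents or an HTML login page; the function names are hypothetical:

```python
PDF_MAGIC = b"%PDF"
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # legacy .doc / .xls
ZIP_MAGIC = b"PK\x03\x04"  # .docx / .xlsx (Office Open XML)


def looks_like_html(data: bytes) -> bool:
    """Rough check for an HTML page, e.g. a login form."""
    head = data[:512].lstrip().lower()
    return head.startswith(b"<!doctype html") or head.startswith(b"<html")


def classify(data: bytes) -> str:
    """Guess the real type of a downloaded file from its leading bytes."""
    if data.startswith(PDF_MAGIC):
        return "pdf"
    if data.startswith(OLE2_MAGIC):
        return "ole2 (doc/xls)"
    if data.startswith(ZIP_MAGIC):
        return "zip (docx/xlsx)"
    if looks_like_html(data):
        return "html"
    return "unknown"


def classify_file(path: str) -> str:
    """Read only the start of the file; that is enough for the checks above."""
    with open(path, "rb") as f:
        return classify(f.read(512))
```

If every renamed file classifies as `html`, the server really did return the login page rather than the document.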
Can anybody point out why that happens, or what I am doing wrong?