HTTrack Website Copier
Free software offline browser - FORUM
Subject:
Author: Benjamin
Date: 03/27/2011 14:28
 
Hi there,

I spent hours reading the FAQs and the forum, but I couldn't find an answer to
my question. Please forgive me if it has already been answered and I simply
failed to apply that information to my problem.

This is the situation:

There's a website offering file downloads, and my primary aim is to get those
downloads, which are 7z files.
The site structure looks roughly like this:

- list of categories
  - category a
    - alphabetical list: A
      - file description page with link to 7z file
    - alphabetical list: B
    - alphabetical list: C
    - alphabetical list: D
    - alphabetical list: E
    - ...
    - alphabetical list: Z
  - category b
  - category c
  - ...
  - category n

My approach is to start with the least complex part and then wrap the
automation for the whole site around it. A single file description page
seemed like a good starting point. Unfortunately, what I thought would be the
"least complex" part is giving me real headaches.

The file description page has a URL like
<http://website.net/details-840.htm>

The corresponding download URL would be
<http://website.net/download.php?id=840>
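Because both URLs share the same numeric id, the download link can be derived from a description page URL, or collected from the links on a list page, without a crawler. Below is a minimal Python sketch. The host is the placeholder from the post, and the assumption that every description page follows the `details-<id>.htm` pattern rests only on the single example above. The helper names are hypothetical, and nothing is fetched here.

```python
import re
from typing import List

BASE = "http://website.net"  # placeholder host used in the post

# Assumption: every description page is named details-<id>.htm
DETAILS_RE = re.compile(r"details-(\d+)\.htm")


def download_url_for(details_url: str) -> str:
    """Map .../details-840.htm to .../download.php?id=840."""
    m = DETAILS_RE.search(details_url)
    if m is None:
        raise ValueError(f"not a description page URL: {details_url}")
    return f"{BASE}/download.php?id={m.group(1)}"


def details_ids(html: str) -> List[int]:
    """Collect description-page ids from a list page, in order, without duplicates."""
    ids: List[int] = []
    for m in DETAILS_RE.finditer(html):
        file_id = int(m.group(1))
        if file_id not in ids:
            ids.append(file_id)
    return ids


if __name__ == "__main__":
    print(download_url_for("http://website.net/details-840.htm"))
    # prints http://website.net/download.php?id=840
```

Passing the HTML of an alphabetical list page to `details_ids` would yield the ids to request. The post does not show how the category and letter pages are named, so walking those pages is left out.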

When I start the download in a web browser, Firefox's download dialog appears
and offers to save the file "Dai Meiro - Meikyuu no Tatsujin.7z".
LiveHTTPHeaders records the following:

----------------------------------------------------------
<http://website.net/download.php?id=840>

GET /download.php?id=840 HTTP/1.1
Host: website.net
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; de; rv:1.9.2.15) Gecko/20110303 Firefox/3.6.15 ( .NET CLR 3.5.30729; .NET4.0C)
Accept: text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language: de-de,de;q=0.8,en-us;q=0.5,en;q=0.3
Accept-Encoding: gzip,deflate
Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive: 115
Connection: keep-alive
Referer: <http://website.net/details-840.htm>
Cookie: __utma=8087398.1021369519.1286700001.1301221866.1301227586.14; __utmz=8087398.1286700001.1.1.utmcsr=(direct)|utmccn=(direct)|utmcmd=(none); style_cookie=printonly; phpbb3_u76eg_u=1; phpbb3_u76eg_k=; phpbb3_u76eg_sid=4410861c3004b346d3c3e0a478e20e8c; PHPSESSID=04eec00f42dc667046d3ca4db1e6cb7a; __utmb=8087398.2.10.1301227586; __utmc=8087398

HTTP/1.1 200 OK
Date: Sun, 27 Mar 2011 12:11:25 GMT
Server: Apache/2.2.9 (Debian) PHP/5.2.6-1+lenny9 with Suhosin-Patch
X-Powered-By: PHP/5.2.6-1+lenny9
Expires: 0
Cache-Control: must-revalidate, post-check=0, pre-check=0, private
Pragma: public
Content-Disposition: attachment; filename="Dai Meiro - Meikyuu no Tatsujin.7z";
Content-Transfer-Encoding: binary
Content-Length: 101729
Keep-Alive: timeout=15, max=99
Connection: Keep-Alive
Content-Type: application/force-download
----------------------------------------------------------

It seems like the website owner doesn't like hotlinking: requesting
<http://website.net/download.php?id=840> directly leads to a notice page.
Letting httrack crawl through the site works. Well, kind of. It does download
the file, but instead of naming it "Dai Meiro - Meikyuu no Tatsujin.7z" it
saves it as download2c95.php. That's the point where I'm stuck.
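Outside HTTrack, one way around the naming problem is a small script that requests download.php with the same Referer the browser sent and names the file after the Content-Disposition header. Below is a minimal Python sketch. It assumes the hotlink check looks only at the Referer, and possibly at a session cookie such as PHPSESSID; neither is confirmed. The host, the User-Agent string, the fallback file name and the helper names are hypothetical placeholders. `get_filename()` is the standard-library method of `email.message.Message`, which urllib's response headers subclass.

```python
import os
import shutil
import urllib.request
from email.message import Message

BASE = "http://website.net"  # placeholder host used in the post


def build_request(file_id: int, session_cookie: str = "") -> urllib.request.Request:
    # Assumption: the hotlink check looks at the Referer, which the browser
    # set to the matching description page in the capture above.
    headers = {
        "Referer": f"{BASE}/details-{file_id}.htm",
        "User-Agent": "Mozilla/5.0",  # placeholder; some sites reject urllib's default
    }
    if session_cookie:
        # Assumption: the check may also need a session, e.g. "PHPSESSID=..."
        headers["Cookie"] = session_cookie
    return urllib.request.Request(f"{BASE}/download.php?id={file_id}", headers=headers)


def choose_name(headers: Message, file_id: int) -> str:
    fallback = f"download-{file_id}.bin"  # hypothetical fallback name
    # get_filename() reads the filename parameter of Content-Disposition
    name = headers.get_filename()
    if not name:
        return fallback
    # Keep only the last path component so the server cannot write elsewhere
    return os.path.basename(name.replace("\\", "/")) or fallback


def download(file_id: int, dest_dir: str = ".", session_cookie: str = "") -> str:
    req = build_request(file_id, session_cookie)
    with urllib.request.urlopen(req) as resp:
        # Assumption: the hotlink notice is served as an HTML page
        if resp.headers.get_content_type() == "text/html":
            raise RuntimeError("got an HTML page (probably the notice) instead of the file")
        path = os.path.join(dest_dir, choose_name(resp.headers, file_id))
        with open(path, "wb") as out:
            shutil.copyfileobj(resp, out)
    return path
```

For example, `download(840, session_cookie="PHPSESSID=...")` would save "Dai Meiro - Meikyuu no Tatsujin.7z" into the current directory, provided the Referer and cookie assumptions hold for this site.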

I found several threads about this behaviour, all pointing to
<http://httrack.kauler.com/help/User-defined_structure>
but I'm not able to turn that information into a solution for my case.

This thread seems similar to my problem:
<http://forum.httrack.com/readmsg/21112/index.html>
Unfortunately, I couldn't work out a solution from it either.

Is this
<http://forum.httrack.com/readmsg/21116/21112/index.html>
still the case, causing all my trouble?
Or am I doing something wrong?
 


All articles

Subject Author Date

03/27/2011 14:28

03/27/2011 16:50

03/27/2011 18:31

03/27/2011 18:35

04/03/2011 14:32



