HTTrack Website Copier
Free software offline browser - FORUM
Subject: Content-type from server not being used
Author: john
Date: 06/26/2003 03:43
 
Hello, 

I am having a problem with HTTrack version 3.23 naming 
files as .HTML rather than according to their Content-Type 
(usually .PDF or .DOC). I've read the earlier messages on 
the forum about this, but my problem seems different.

When downloading a URL using POST, the server incorrectly 
reports the Content-Type as "text/html" in response to a 
HEAD request, but when the actual POST request is made the 
server responds with the correct Content-Type, 
"application/msword" (for example).

Sadly, it seems HTTrack is only using the response to the 
HEAD request and thus giving the file an .HTML extension. 
Is it possible for HTTrack to use the response from the 
POST instead of the HEAD if Content-Type differs between 
the two?

For example, using this URL:
http://www.scstatehouse.net/cgi-bin/state_register.exe?>postfile:hts-post0

Where the hts-post0 file looks like this:
============================
POST /cgi-bin/state_register.exe HTTP/1.1
Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/svg+xml, */*
Accept-Charset: iso-8859-1, iso-8859-*, utf-8
Accept-Encoding: gzip, deflate, compress, identity
Accept-Language: en, *
Content-Type: application/x-www-form-urlencoded
User-Agent: Mozilla/5.0 (compatible; Customised HTTP Client; Windows NT)
Content-length: 64
Host: www.scstatehouse.net
Connection: Keep-Alive

first=FILE&usercode=mortkas&password=3259&years=2002&file=sr27-5
====== END OF hts-post0 FILE ======
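
As a sanity check that the post file itself is well-formed, the length of the form body can be compared with the Content-length header. This is only a small Python sketch, and it assumes the body is one unbroken line (any line break inside it above comes from the forum's wrapping); the variable names are mine.

```python
# Form body exactly as sent, taken as a single unbroken line.
body = "first=FILE&usercode=mortkas&password=3259&years=2002&file=sr27-5"

# Value of the Content-length header in the post file.
declared_length = 64

# Content-length counts bytes, so encode before measuring.
actual_length = len(body.encode("ascii"))

print("declared:", declared_length, "actual:", actual_length)
print("match" if actual_length == declared_length else "MISMATCH")
```

The two values agree, so the POST request itself looks fine and the naming problem should not be caused by a malformed post file.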

The hts-ioinfo.txt file looks like this:

===========================
[0] request for www.scstatehouse.net/cgi-bin/state_register.exe?>postfile:E:/Web%20Sites/test/hts-post2:
<<< HEAD /cgi-bin/state_register.exe HTTP/1.1
<<< Connection: Keep-Alive
<<< Host: www.scstatehouse.net
<<< User-Agent: Mozilla/4.5 (compatible; MSIE 4.01; Windows NT)
<<< Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/svg+xml, */*
<<< Accept-Language: en, *
<<< Accept-Charset: iso-8859-1, iso-8859-*, utf-8
<<< Accept-Encoding: gzip, deflate, compress, identity

[0] response for www.scstatehouse.net/cgi-bin/state_register.exe?>postfile:E:/Web%20Sites/test/hts-post2:
code=200
>>> HTTP/1.1 200 OK
>>> Date: Thu, 26 Jun 2003 00:18:34 GMT
>>> Server: Apache/2.0.43 (Win32)
>>> Keep-Alive: timeout=15, max=100
>>> Connection: Keep-Alive
>>> Content-Type: text/html; charset=ISO-8859-1

[0] request for www.scstatehouse.net/cgi-bin/state_register.exe?>postfile:E:/Web%20Sites/test/hts-post2:
<<< POST /cgi-bin/state_register.exe HTTP/1.1
<<< Accept: image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/svg+xml, */*
<<< Accept-Charset: iso-8859-1, iso-8859-*, utf-8
<<< Accept-Encoding: gzip, deflate, compress, identity
<<< Accept-Language: en, *
<<< Content-Type: application/x-www-form-urlencoded
<<< User-Agent: Mozilla/5.0 (compatible; Customised HTTP Client; Windows NT)
<<< Content-length: 64
<<< Host: www.scstatehouse.net
<<< Connection: Keep-Alive

<<< first=FILE&usercode=mortkas&password=3259&years=2002&file=sr27-5

[0] response for www.scstatehouse.net/cgi-bin/state_register.exe?>postfile:E:/Web%20Sites/test/hts-post2:
code=200
>>> HTTP/1.1 200 OK
>>> Date: Thu, 26 Jun 2003 00:18:35 GMT
>>> Server: Apache/2.0.43 (Win32)
>>> Keep-Alive: timeout=15, max=99
>>> Connection: Keep-Alive
>>> Transfer-Encoding: chunked
>>> Content-Type: application/msword

====== END OF hts-ioinfo.txt FILE ======
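
To make the request concrete, the behaviour I am hoping for could be sketched roughly like this in Python. This is only an illustration and not HTTrack's actual code: the function names `media_type` and `pick_extension` and the tiny MIME table are hypothetical, made up for this example.

```python
def media_type(content_type_header):
    """Return the bare, lowercased media type, dropping parameters
    such as '; charset=ISO-8859-1'."""
    return content_type_header.split(";", 1)[0].strip().lower()


# Hypothetical, deliberately tiny MIME-to-extension table.
EXTENSIONS = {
    "text/html": ".html",
    "application/msword": ".doc",
    "application/pdf": ".pdf",
}


def pick_extension(head_content_type, final_content_type):
    """Prefer the Content-Type of the real (POST) response; fall back to
    the HEAD response only when the real response did not send one."""
    chosen = final_content_type or head_content_type or ""
    # Unknown or missing types fall back to .html in this sketch.
    return EXTENSIONS.get(media_type(chosen), ".html")


# Values taken from the two responses in the log above.
head_type = "text/html; charset=ISO-8859-1"
post_type = "application/msword"
print(pick_extension(head_type, post_type))
```

With the two Content-Type values from the log, this picks .doc, because the POST response wins over the HEAD response whenever the two differ.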
 