> I have a site that I'm trying to get some pages of.
> Unfortunately the image tags are not referring directly to
> the image but rather seem to have to round trip to the
> database for the image information. When I 'Save Image As'
> within the browser the image is called 'image' and appears
> to be a .jpg file.
>
> Here is an example of the image tags.
> <img
> src='http://shop.magicmomentsgiftware.com.au:8080/catalogue/category117/product838/image/?size=300x300&helper=1049354555.01'
> width='283' height='278' border='0'>
>
> So how can I get the images to be copied correctly?
Hmm. First problem: the URL path ends with "/", which is
generally used for the index page of a folder. httrack therefore
assumes that the link is an HTML file.
You have to disable this behaviour by setting
'Set options' / 'Spider' / 'Check document type' to 'If unknown'.
But this is a rather strange idea.
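To illustrate why the trailing "/" matters, here is a minimal sketch of that kind of naming guess. The function guess_is_html is hypothetical and written only for this example; it is not httrack's actual code, just an assumption about what such a heuristic could look like.

```python
from urllib.parse import urlsplit


def guess_is_html(url: str) -> bool:
    """Hypothetical heuristic: a path ending in "/" is taken for an HTML index page."""
    path = urlsplit(url).path  # the query string (?size=...) is not part of the path
    return path.endswith("/") or path.lower().endswith((".html", ".htm"))


# The image URL from the question: its path ends with "/image/".
image_url = (
    "http://shop.magicmomentsgiftware.com.au:8080/catalogue/"
    "category117/product838/image/?size=300x300&helper=1049354555.01"
)

print(guess_is_html(image_url))
```

With a guess like this, the image link is classified as HTML before any data is fetched, which is why the document type has to be checked explicitly.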
The second problem is that the CGI behind it is dumb and is
not able to answer "HEAD" requests properly, which is
totally incompatible with RFC 2616. httrack uses HEAD
requests to detect the document type before naming the file, so
this is a problem:
HEAD /catalogue/category117/product838/image/?size=300x300&helper=1049354555.01 HTTP/1.0
Host: shop.magicmomentsgiftware.com.au:8080

HTTP/1.0 200 OK
Server: Zope/(Zope 2.4.3 (source release, python 2.1, linux2), python 2.1.1, linux2) ZServer/1.1b1
Date: Thu, 01 May 2003 15:25:06 GMT
Ms-Author-Via: DAV
Content-Type: application/octet-stream
Accept-Ranges: none
Connection: close
Etag: ts51802647.5
Content-Length: 151013
Last-Modified: Thu, 01 May 2003 15:24:07 GMT
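
As a rough illustration of what a client can learn from such a reply, the sketch below parses a few of the headers shown above and reads the declared type. The helper names content_type_of and names_an_image are made up for this example. It only shows that the declared type is the generic application/octet-stream rather than an image type; whether that is the exact defect meant above is an assumption.

```python
from email.parser import Parser

# A subset of the headers from the HEAD reply shown above.
RAW_RESPONSE = """\
HTTP/1.0 200 OK
Content-Type: application/octet-stream
Accept-Ranges: none
Connection: close
Content-Length: 151013
"""


def content_type_of(raw: str) -> str:
    """Return the Content-Type declared in a raw HTTP response head."""
    # Drop the status line; the rest is an RFC 822 style header block.
    _status_line, _, header_block = raw.partition("\n")
    headers = Parser().parsestr(header_block, headersonly=True)
    return headers.get_content_type()


def names_an_image(raw: str) -> bool:
    """True only if the declared type is image/*."""
    return content_type_of(raw).startswith("image/")


print(content_type_of(RAW_RESPONSE))
print(names_an_image(RAW_RESPONSE))
```

A client that names files from the declared type alone has nothing here telling it that the body is a JPEG.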
Fortunately, there is a hack for that:
'Set options' / 'Spider' / 'Force old HTTP/1.0 requests'
With these two settings adjusted, it should now work
properly.
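
For command-line use, the sketch below assembles a possible equivalent invocation. It assumes that the switches -u2 (check document type, including for "/" links) and -%h (force HTTP/1.0 requests) correspond to the two GUI settings above; please verify them against `httrack --help` for your version. The function build_httrack_command and the output directory are hypothetical.

```python
from typing import List


def build_httrack_command(url: str, output_dir: str) -> List[str]:
    """Assemble an httrack argument list; the switch mapping is an assumption."""
    return [
        "httrack",
        url,
        "-O", output_dir,  # where the mirror is written
        "-u2",             # assumed: 'Check document type' = 'If unknown'
        "-%h",             # assumed: 'Force old HTTP/1.0 requests'
    ]


cmd = build_httrack_command(
    "http://shop.magicmomentsgiftware.com.au:8080/catalogue/",
    "./mirror",  # hypothetical output directory
)
print(" ".join(cmd))
```

The list is only printed here, not executed; it could be passed to subprocess.run once the switches have been confirmed.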