HTTrack Website Copier
Free software offline browser - FORUM
Subject: PDFs mime type wrong when using cache
Author: Stephan
Date: 08/31/2013 08:40
 
I have a problem that PDFs are saved as .html when I continue or update a
previous mirror. They are saved correctly when I run the mirror for the first
time (or delete the cache).

Details: I'm creating an offline mirror of a literature database running on my
localhost, using wikindx (http://www.wikindx.com/). It serves html pages
through an index.php script with various parameters, and also PDF files; they
have URLs like the following:
<http://localhost/wikindx/index.php?action=attachments_ATTACHMENTS_CORE&method=downloadAttachment&id=1824&filename=c080b386a440e0927e7ce5eee4c0eeb6debf589d>

Now when I mirror the site with httrack, on the first run the PDF files are
saved correctly with names like index9876.pdf. The HTML files are saved as
.html of course.

But when I then continue an interrupted mirror, or update it, the PDF files
are downloaded again and saved as index9876.html, so I end up with identical
copies as both .pdf and .html.
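
To see how many files are affected, the mirror directory can be scanned for
.html files whose content is really a PDF. The following is only a minimal
sketch: the function name find_misnamed_pdfs and the mirror path are
hypothetical, and it assumes that every valid PDF starts with the bytes
"%PDF-". It is not part of httrack.

```python
from pathlib import Path

# Signature found at the start of PDF files.
PDF_MAGIC = b"%PDF-"


def find_misnamed_pdfs(mirror_root):
    """Return .html files under mirror_root whose content starts like a PDF.

    Hypothetical helper for inspecting a mirror; not an httrack feature.
    """
    misnamed = []
    for path in sorted(Path(mirror_root).rglob("*.html")):
        if not path.is_file():
            continue
        with path.open("rb") as handle:
            if handle.read(len(PDF_MAGIC)) == PDF_MAGIC:
                misnamed.append(path)
    return misnamed


if __name__ == "__main__":
    import sys

    # The mirror directory is passed as the first argument (assumed layout).
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for path in find_misnamed_pdfs(root):
        print(path)
```

Running it against the mirror after the first run and again after an update
should show whether the .html copies only appear once the cache is used.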

As far as I can see, the server sends the correct mime type application/pdf,
and httrack doesn't seem to have problems the first time (wget and firefox
also recognise them as pdfs), so I don't think it's a problem in the server.
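
For reference, the Content-Type the server sends can be checked independently
of httrack. This is a rough sketch using Python's standard urllib; the helper
names media_type and fetch_media_type are made up for illustration, and the
URL is the example attachment link from above.

```python
import urllib.request


def media_type(content_type_header):
    """Drop parameters such as charset and normalise the case."""
    return content_type_header.split(";", 1)[0].strip().lower()


def fetch_media_type(url):
    """Request the URL and return the media type from its Content-Type header."""
    with urllib.request.urlopen(url) as response:
        return media_type(response.headers.get("Content-Type", ""))


if __name__ == "__main__":
    # Example attachment URL from this post; adjust the id and filename as needed.
    url = (
        "http://localhost/wikindx/index.php"
        "?action=attachments_ATTACHMENTS_CORE&method=downloadAttachment"
        "&id=1824&filename=c080b386a440e0927e7ce5eee4c0eeb6debf589d"
    )
    print(fetch_media_type(url))
```

If this prints application/pdf for the attachment links, the server side is
behaving as expected and the wrong extension comes from somewhere else.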

I've tried different httrack options, in particular %D0 and %N0, but they
make no difference. With C0 the result is correct, but then it downloads all
3 GB of data every time.

Any ideas what I could try, or is this a bug in the caching?
The links in the downloaded HTML files also change (i.e. they point to the
.pdf at first, then to the .html after updates).

Using httrack 3.46 from Ubuntu repos.

 