I have a problem: PDFs are saved as .html when I continue or update a
previous mirror. They are saved correctly when I run the mirror for the first
time (or delete the cache).
Details: I'm creating an offline mirror of a literature database running on my
localhost, using wikindx (http://www.wikindx.com/). It serves HTML pages
through an index.php script with various parameters, and also PDF files; they
have URLs like the following:
<http://localhost/wikindx/index.php?action=attachments_ATTACHMENTS_CORE&method=downloadAttachment&id=1824&filename=c080b386a440e0927e7ce5eee4c0eeb6debf589d>
Now when I mirror the site with httrack, on the first run the PDF files are
saved correctly with names like index9876.pdf. The HTML files are saved as
.html of course.
But when I then continue an interrupted mirror, or update it, the PDF files
are downloaded again and saved as index9876.html, so I end up with identical
copies of each file, one as .pdf and one as .html.
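To see how many files are affected, the duplicates can be listed with a small script. This is only a rough sketch: the function name `find_duplicate_pairs` is made up for illustration, and it assumes the duplicate always sits next to the PDF with the same base name (as with index9876.pdf and index9876.html above).

```python
import filecmp
from pathlib import Path


def find_duplicate_pairs(mirror_root):
    """Return (pdf, html) path pairs that share a base name and have identical bytes.

    Assumption: the mis-named copy lives in the same directory as the PDF.
    """
    pairs = []
    for pdf in sorted(Path(mirror_root).rglob("*.pdf")):
        html = pdf.with_suffix(".html")
        # shallow=False forces a byte-by-byte comparison instead of
        # trusting file size and modification time.
        if html.is_file() and filecmp.cmp(pdf, html, shallow=False):
            pairs.append((pdf, html))
    return pairs
```

Calling `find_duplicate_pairs("path/to/mirror")` (the path is a placeholder) returns a list that can be printed or counted; an .html file with the same name but different content is not reported.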
As far as I can see, the server sends the correct MIME type, application/pdf,
and httrack doesn't seem to have problems the first time (wget and Firefox
also recognise the files as PDFs), so I don't think it's a problem in the server.
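The MIME type can also be double-checked outside of httrack with a few lines of Python. This is a minimal sketch using only the standard library; the helper names `fetch_content_type` and `is_pdf_content_type` are made up for illustration, and it assumes the attachment URL is reachable without a login.

```python
import urllib.request


def is_pdf_content_type(header_value):
    """True if a Content-Type header value names application/pdf.

    Parameters after ';' (e.g. a charset) and letter case are ignored.
    """
    if not header_value:
        return False
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type == "application/pdf"


def fetch_content_type(url, timeout=10):
    """Request the URL and return the raw Content-Type header, or None."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.headers.get("Content-Type")
```

For example, `is_pdf_content_type(fetch_content_type(url))` with `url` set to one of the downloadAttachment URLs above should be `True` if the server really sends application/pdf on every request, including repeated ones.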
I've tried different httrack options, in particular %D0 and %N0, but they
make no difference. With C0 the files are saved correctly, but then httrack
downloads all 3 GB of data every time.
Any ideas what I could try, or is this a bug in the caching?
The links in the downloaded HTML files also change (i.e. they point to the
.pdf at first, then to the .html after updates).
Using httrack 3.46 from Ubuntu repos.