I have been using HTTrack since September 2006 to mirror a web site that is then indexed with MS Index Server.
I found it very helpful, and it replaced the spider I used before.
Everything worked well, but now PDF files are saved in the mirror, and linked from it, under
the document's original file name (e.g. "Regolamento ICI.pdf") instead of the file name
used on the web server (e.g. "325.pdf").
This problem does not seem to have occurred before version 3.40-2.
Here is a row from \hts-cache\new.txt that shows the problem:
    17:08:37 172630/172630 U----- 200 untouched ('OK') application/pdf date:Fri,%2020%20Oct%202006%2011:05:20%20GMT http://www.comune.aosta.it/download/file/325.pdf c:/websites/inva.comuneaosta.index/ComuneAosta/www.comune.aosta.it/download/file/Regolamento%20ICI.pdf (from http://www.comune.aosta.it/it/comune/atti_ufficiali/regolamenti_comunali/)
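To list which downloads are affected, the mismatch can be spotted mechanically. This is only a sketch: it assumes, from the sample row, that the source URL is the first token starting with http:// (possibly wrapped in <...>) and that the local path is the next whitespace-separated token. The functions parse_row and name_mismatch are hypothetical helpers, not part of HTTrack.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit

# Assumption: the source URL is the first http(s) token (optionally wrapped in
# <...>) and the local path is the token right after it, as in the sample row.
ROW_RE = re.compile(r"<?(https?://\S+?)>?\s+(\S+)")


def parse_row(row):
    """Return (url, local_path) from one hts-cache/new.txt row, or None."""
    m = ROW_RE.search(row)
    if m is None:
        return None
    return m.group(1), m.group(2)


def name_mismatch(url, local_path):
    """True when the saved file name differs from the last segment of the URL path."""
    url_name = PurePosixPath(urlsplit(url).path).name
    # Accept Windows-style separators too, then take the file name.
    local_name = PurePosixPath(local_path.replace("\\", "/")).name
    # Compare decoded names so "%20" and " " count as the same character.
    return unquote(url_name) != unquote(local_name)


if __name__ == "__main__":
    row = ("17:08:37 172630/172630 U----- 200 untouched ('OK') application/pdf "
           "date:Fri,%2020%20Oct%202006%2011:05:20%20GMT "
           "http://www.comune.aosta.it/download/file/325.pdf "
           "c:/websites/inva.comuneaosta.index/ComuneAosta/www.comune.aosta.it/"
           "download/file/Regolamento%20ICI.pdf "
           "(from http://www.comune.aosta.it/it/comune/atti_ufficiali/regolamenti_comunali/)")
    url, local_path = parse_row(row)
    print(url, "->", local_path, "mismatch:", name_mismatch(url, local_path))
```

Running it over every row of new.txt would give the list of files saved under a name different from the one in their URL.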
As a result, the link to the file is wrong: the search engine returns
    http://www.comune.aosta.it/download/file/Regolamento%20ICI.pdf
while it should be
    http://www.comune.aosta.it/download/file/325.pdf
The same problem occurs with other MIME types (doc, xls, ...).
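Until there is a proper fix, one possible stopgap on the search side is to translate the local path returned by Index Server back to the URL recorded in new.txt. A minimal sketch, assuming the same row layout as the sample row, that local paths in new.txt are %-escaped as shown there, and that new.txt lists the files of interest (untouched files do appear, as in the sample). The script and its functions are hypothetical.

```python
import re
import sys
from pathlib import Path
from urllib.parse import unquote

# Assumption: URL is the first http(s) token, local path is the next token.
ROW_RE = re.compile(r"<?(https?://\S+?)>?\s+(\S+)")


def normalise(path):
    """Decode %xx escapes, use forward slashes and lower-case (Windows paths
    are case-insensitive) so new.txt paths and indexed paths compare equal."""
    return unquote(path).replace("\\", "/").lower()


def build_url_map(lines):
    """Map each normalised local path to the URL it was downloaded from."""
    url_map = {}
    for line in lines:
        m = ROW_RE.search(line)
        if m:
            url, local_path = m.groups()
            url_map[normalise(local_path)] = url
    return url_map


def load_url_map(new_txt):
    """Read an hts-cache/new.txt file and build the map (encoding is a guess)."""
    text = Path(new_txt).read_text(encoding="latin-1")
    return build_url_map(text.splitlines())


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python url_map.py PATH_TO_NEW_TXT INDEXED_LOCAL_PATH
    url_map = load_url_map(sys.argv[1])
    print(url_map.get(normalise(sys.argv[2]), "not found"))
```

Renaming the files on disk instead would break the links HTTrack has already rewritten in the mirrored pages, which is why this sketch only builds a lookup table.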
Could you suggest a workaround or a fix?
Thanks a lot.