> I'm using httrack for gathering files for a search
> engine, but for part of the process I encounter a warning:
> Cache: damaged cache, trying to repair
Is httrack repairing "most" of the data?
> I don't know if this problem is caused by the fact that the
> job is starting up after 24 hours and is still busy (I
> don't think so, because httrack generates a lock file).
Humm, the error is raised when unzOpen() is not able to open a ZIP file -
probably because the central directory was corrupted or missing. But there
might be a bug (such as a problem handling very large files?).
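If you want to check whether the cache ZIP is still readable, here is a rough
sketch using the minizip unzip API (the same unzOpen() call); it assumes the
cache sits at hts-cache/new.zip inside the project directory, so adjust the
path to your setup:

    /* Sketch only: open the cache ZIP and read its central directory.
     * The cache path "hts-cache/new.zip" is an assumption about the
     * project layout. Build against minizip (zlib contrib). */
    #include <stdio.h>
    #include "unzip.h"

    int main(void) {
        const char *cache = "hts-cache/new.zip";   /* assumed cache location */
        unzFile zf = unzOpen(cache);
        if (zf == NULL) {
            /* This is the failure behind "damaged cache, trying to repair":
             * the central directory could not be read. */
            fprintf(stderr, "unzOpen() failed on %s\n", cache);
            return 1;
        }
        unz_global_info gi;
        if (unzGetGlobalInfo(zf, &gi) == UNZ_OK)
            printf("central directory OK, %lu entries\n", gi.number_entry);
        unzClose(zf);
        return 0;
    }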
> The command parameters for httrack are:
Seems OK AFAICS.
> - How can I prevent the cache file from becoming corrupt?
Err, normally httrack always closes the cache cleanly when the mirror ends -
make sure the program did not crash (check the last log entries; they should
end with something like "Thanks for using HTTrack!").
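If you want to automate that check, something like the following works; it is
only a sketch, and the log file name hts-log.txt is an assumption about where
your project writes its log:

    /* Sketch: read the tail of the mirror log and look for the normal
     * end-of-mirror message, to confirm httrack did not crash mid-run. */
    #include <stdio.h>
    #include <string.h>

    int main(void) {
        const char *logfile = "hts-log.txt";       /* assumed log location */
        char tail[4096 + 1] = {0};
        FILE *f = fopen(logfile, "rb");
        if (!f) { perror(logfile); return 1; }

        fseek(f, 0, SEEK_END);
        long size = ftell(f);
        long start = size > 4096 ? size - 4096 : 0;
        fseek(f, start, SEEK_SET);
        size_t n = fread(tail, 1, 4096, f);
        tail[n] = '\0';
        fclose(f);

        if (strstr(tail, "Thanks for using HTTrack!"))
            puts("mirror ended cleanly");
        else
            puts("no clean-shutdown message found - the cache may not have been closed properly");
        return 0;
    }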
> - Is there a limit on the number of documents or on the size of the zip file?
The maximum size is probably 2GB, but httrack should be able to handle larger
sizes using "dummy central directory entries" (though this has never been tested).
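If you suspect you are near that limit, a quick (untested) check of the cache
file size is easy enough - again assuming the hts-cache/new.zip location:

    /* Sketch: report how close the cache file is to the ~2 GB figure
     * mentioned above. The path is an assumption about the project layout. */
    #include <stdio.h>
    #include <sys/stat.h>

    int main(void) {
        const char *cache = "hts-cache/new.zip";   /* assumed cache location */
        struct stat st;
        if (stat(cache, &st) != 0) { perror(cache); return 1; }

        const long long limit = 2147483647LL;      /* 2 GB threshold */
        printf("%s is %lld bytes (%.1f%% of the 2GB limit)\n",
               cache, (long long)st.st_size, 100.0 * st.st_size / limit);
        return 0;
    }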
> - What is better for httrack's performance: more, smaller caches or
> fewer, larger cache files?
Err, there is only one cache per project. Generally, projects should be "per
site", which is easier to maintain.