Hi,
for about a week I have been trying to capture a site ( www.diedeutscheindustrie.de ) and I am running into a strange issue (the site is a purchasing database of German companies).
An example of such a page:
www.diedeutscheindustrie.de/fd__-drei-d-konstruktionen_ralf_schmidt/42717.html
Locally, while the scan is running, about 19,000 fd_* directories are created, each
containing a *.tmp file, in this example 42717.html.tmp.
When I open such a temp file, it contains the HTML of the corresponding page at the
end of the file, i.e. a German company address.
After about 18 hours the capture finishes with roughly 600 errors. At that point the
tmp files are gone, no html files have been written, and the fd_* (and most other)
directories are empty.
I have tried various settings (ignore robots.txt, do not delete files), but this did
not change the behavior.
Limiting the download to a maximum of 8 connections did reduce the number of errors
drastically (see the rough command line below).
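
If it helps, this is roughly the command-line equivalent of the settings I ended up
with. The output path is just a placeholder, and I am assuming the WinHTTrack GUI
options map onto these engine flags (-s0 = never follow robots.txt, -c8 = at most 8
simultaneous connections); the "do not delete files" option I only set via the GUI
checkbox:

  httrack http://www.diedeutscheindustrie.de/ -O "C:\My Web Sites\diedeutscheindustrie" -s0 -c8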
When I change the scan to save the files/directories in ISO format, the program
crashes after about 2 GB have been scanned (in that case the tmp files in the fd_*
directories still exist).
1) Is there a setting that prevents these files from being deleted after the
scan?
2) Is there a way to convert the temp files into the html files in the
ISO-compatible scan (roughly what the sketch after these questions tries to do)?
When doing a rescan, this might prevent a re-download of the respective files.
3) After the scan I can review the error log. Where is this saved locally (so I can
review it with a different utility)? Viewing it inside HTTrack is very slow because
the file is extremely large.
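
To illustrate what I mean in question 2, something along these lines is what I would
try myself, assuming (as described above) that the complete HTML really sits at the
end of each .tmp file; the mirror path and the HTML markers are only guesses:

import pathlib

mirror_root = pathlib.Path(r"C:\My Web Sites\diedeutscheindustrie")  # placeholder path

for tmp in mirror_root.rglob("*.html.tmp"):
    data = tmp.read_bytes()
    # the HTML appears to sit at the end of the .tmp file, so look for its start
    start = data.find(b"<!DOCTYPE")
    if start == -1:
        start = data.find(b"<html")
    if start == -1:
        continue  # no HTML found, leave this file alone
    out = tmp.with_suffix("")  # 42717.html.tmp -> 42717.html
    out.write_bytes(data[start:])

I would prefer a built-in HTTrack way, if one exists, so that the cache and rescan
logic still work afterwards.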
I have HTTrack running on Windows 7 x64.
/j