HTTrack Website Copier
Free software offline browser - FORUM
Subject: Scanned local site deleted after scan
Author: justin
Date: 01/13/2012 00:13
 
Hi,

For about a week I have been trying to capture the site www.diedeutscheindustrie.de (a purchasing database of German companies) and keep running into a strange issue.

For example, the link
www.diedeutscheindustrie.de/fd__-drei-d-konstruktionen_ralf_schmidt/42717.html

Locally, while the scan is running, about 19,000 fd_* directories are created, each containing a *.tmp file, in this example 42717.html.tmp.
When I open such a temp file, the end of it contains the contents of the HTML page from the site, i.e. a German company address.

After about 18 hours the capture finishes with roughly 600 errors. At that point the .tmp files are gone, no .html files have been written, and the fd_* (and most other) directories are empty.

I have tried various settings (ignore robots.txt, do not delete files), but this has not changed the behavior.
Limiting the download to a maximum of 8 sessions did reduce the number of errors drastically.

When I change the scan to save the files/directories in ISO format, the program crashes after scanning roughly 2 GB (in that case the .tmp files in the fd_* directories still exist).

1) Is there a setting that prevents these files from being deleted after the scan?

2) Is there a way to convert the temp files into the final .html files in the ISO-compatible scan? When doing a rescan, this would possibly avoid re-downloading the respective files. (If there is no built-in way, I would try to salvage them with a small script, see the sketch below.)

3) After the scan I can review the error log. Where is it saved locally, so I can review it with a different utility? Viewing it inside HTTrack is very slow because the file is extremely large.
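
Regarding 2), this is the kind of Python sketch I have in mind to salvage the temp files by copying each *.html.tmp next to itself as *.html. It assumes the .tmp files already contain the complete page (links would presumably still point at the live site rather than the local copy), and the mirror path below is just a placeholder for my project directory.

import os
import shutil

# Placeholder for the HTTrack project directory - adjust to the real path.
mirror_root = r"C:\My Web Sites\diedeutscheindustrie"

# Walk the mirror and copy every *.html.tmp to the same name without ".tmp",
# leaving the original .tmp files untouched so HTTrack's own state is not disturbed.
for dirpath, dirnames, filenames in os.walk(mirror_root):
    for name in filenames:
        if name.endswith(".html.tmp"):
            src = os.path.join(dirpath, name)
            dst = os.path.join(dirpath, name[:-len(".tmp")])
            if not os.path.exists(dst):
                shutil.copyfile(src, dst)
                print("recovered", dst)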

I have HTTrack running on Win7 x64.

/j
 