I used HTTrack with a website that uses CGI. The internal
CGI mechanism that generates HTML files from the database
(located on the server) sometimes fails. As a result, you
sometimes get a page with the text "the dataserver is
temporarily busy" instead of the contents you expected.
It would be fantastic if there were a simple mechanism to
re-download such files without updating the whole HTTrack
project. "Re-download" means "download a file again within
an existing mirror", never "download a single page". If you
download a single page, its links do not work. If the page
is part of a mirror, the links are OK and lead to other
pages mirrored on my computer.
Note that every downloaded file is stored both in the cache
and in the project. My idea is as follows:
- instead of restarting the whole project, the user just
deletes the bad file that contains "the dataserver is
temporarily busy" (HTTrack considers the file good, but it
is bad from the user's point of view),
- so the file is now only in the cache and is no longer
present in the mirrored project (which is made by HTTrack),
- the user restarts HTTrack with a special option (which
does not exist now, it is only my idea),
- the program checks only the files that are missing from
the mirror but present in the cache,
- and it re-downloads them instead of just rewriting them
from the cache.
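
The selection step in the list above can be sketched in a few lines of Python. This is only an illustration of the idea, not something HTTrack does: `files_to_redownload` and `cached_paths` are hypothetical names, and the sketch assumes the list of paths known to the cache is already available (reading the HTTrack cache itself is not shown).

```python
from pathlib import Path


def files_to_redownload(cached_paths, mirror_dir):
    """Return the cached paths that no longer exist in the mirror.

    cached_paths: relative paths the cache knows about (assumed to be
                  obtained elsewhere; hypothetical input).
    mirror_dir:   folder where the mirrored project is saved.
    """
    mirror = Path(mirror_dir)
    # A file deleted by the user is still listed in the cache
    # but is missing on disk, so it is a re-download candidate.
    return [rel for rel in cached_paths if not (mirror / rel).is_file()]
```

Only the files returned here would need to be fetched again; everything else in the mirror stays untouched.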
The main aim of this idea is to save time. Updating a
project may take many hours or even days (read my other
messages)! Updating a single file (which is not easy to
find because of its complicated name with CGI commands)
would take seconds, not hours.
This is my idea, and I would like to ask the programmers to
add such an option in a future release of the program.
Note that you can easily find such bad files (for instance,
I use a text search tool in the folder where the mirror is
saved, and I search the mirrored files for the "database
server" text in their contents) and you can easily delete
them. But it is hard to guess what their original names
were (since it is a CGI-based server, HTTrack changed the
names into "responseaf23.html" or something like this). In
other words, I would expect the option "re-download only
files deleted by user, not using the cache".
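
For readers without a text search tool, the search described above can be approximated with a short Python script. It is a minimal sketch under assumptions: `find_bad_files` is a hypothetical helper, the mirrored pages are assumed to end in .html or .htm, and the marker text is whatever error message the server returns.

```python
from pathlib import Path


def find_bad_files(mirror_dir, marker, suffixes=(".html", ".htm")):
    """Return mirrored files whose contents include the marker text."""
    bad = []
    for path in sorted(Path(mirror_dir).rglob("*")):
        if path.is_file() and path.suffix.lower() in suffixes:
            # errors="replace" keeps the scan going on odd encodings.
            text = path.read_text(encoding="utf-8", errors="replace")
            if marker in text:
                bad.append(path)
    return bad
```

Calling `find_bad_files("path/to/mirror", "temporarily busy")` lists the candidates; deleting them is left to the user.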
Grzegorz Jagodziński