HTTrack Website Copier
Free software offline browser - FORUM
Subject: Bad CGI-based pages
Author: Grzegorz
Date: 12/04/2003 10:37
 
I used HTTrack on a website that is driven by CGI. The 
server-side CGI mechanism that generates the HTML pages 
from a database sometimes fails. When that happens, you get 
a page with the text "the dataserver is temporarily busy" 
instead of the content you expected.
 
It would be fantastic if there were a simple mechanism to 
re-download such files without updating the whole HTTrack 
project. By "re-download" I mean "download a file again 
within an existing mirror", not "download a single page". 
If you download a single page on its own, its links do not 
work. If the page is part of a mirror, the links are fine 
and lead to the other pages mirrored on my computer.

Note that every downloaded file exists both in the cache 
and in the project. My idea is as follows:
- instead of restarting the whole project, the user just 
deletes each bad file containing the "the dataserver is 
temporarily busy" text (HTTrack considers the file good, 
but it is bad from the user's point of view),
- the file is now only in the cache and no longer present 
in the mirrored project (which is created by HTTrack),
- the user restarts HTTrack with a special option (which 
does not exist yet, it is only my idea),
- the program checks only for files that are present in the 
cache but missing from the mirror,
- and it re-downloads them instead of just rewriting them 
from the cache.
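
To make the check in the last steps concrete, here is a minimal sketch in Python. It is only an illustration of the idea, not something HTTrack does today. The `cache_index` mapping (URL to local file name) is a hypothetical stand-in for whatever the cache records; it is not HTTrack's real cache format, and the function name is made up as well.

```python
from pathlib import Path


def urls_missing_from_mirror(cache_index, mirror_root):
    """Return URLs that the cache knows about but whose local file is gone.

    cache_index is a hypothetical mapping of URL -> path relative to the
    mirror root. It is an assumption for this sketch, not HTTrack's
    actual cache format.
    """
    root = Path(mirror_root)
    missing = []
    for url, relative_path in sorted(cache_index.items()):
        # A file the user deleted by hand is still listed in the cache
        # but no longer exists in the mirror folder.
        if not (root / relative_path).is_file():
            missing.append(url)
    return missing
```

Only the URLs returned by such a check would need to be fetched from the server again; everything else could stay untouched.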
 
The main aim of this idea is to save time. Updating a 
project may take many hours or even days (read my other 
messages)! Updating each file individually (its original 
address is not easy to find because of its complicated name 
full of CGI commands) would take seconds rather than hours.
 
This is my idea, and I would like to ask the programmers to 
add such an option in a future release of the program. 
Note that you can easily find such bad files (for instance, 
I use a text search tool on the folder where the mirror is 
saved and search the mirrored files for the "database 
server" text) and you can easily delete them. But it is 
hard to guess what their original names were (since it is a 
CGI-based server, HTTrack changed the names to 
"responseaf23.html" or something like that). In other 
words, I would like an option to "re-download only files 
deleted by the user, without using the cache".
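
For anyone without a text search tool at hand, the search step can be done with a short Python script. This is a rough sketch under my own assumptions: the function name, the "*.html" pattern and the case-insensitive match are choices made for illustration, and the marker text has to be adapted to whatever the failing server actually prints.

```python
from pathlib import Path


def find_bad_pages(mirror_root, marker, pattern="*.html"):
    """Return sorted paths of mirrored files whose text contains marker."""
    bad = []
    needle = marker.lower()
    for path in sorted(Path(mirror_root).rglob(pattern)):
        if not path.is_file():
            continue
        # errors="replace" keeps the scan going on pages with odd encodings.
        text = path.read_text(encoding="utf-8", errors="replace")
        if needle in text.lower():
            bad.append(path)
    return bad
```

Calling `find_bad_pages("path/to/mirror", "temporarily busy")` would list the candidate files, which can then be reviewed and deleted by hand. The mirror path here is a placeholder.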

Grzegorz Jagodziński
 