HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Save website to Database
Author: Abel Deuring
Date: 10/09/2004 14:40
> My question is this - has anyone ever thought about what it
> would take to have httrack save text based documents to a
> database, rather than as files on a hard drive? Does anyone
> have a methodology about how we might go about this or has
> anyone done it?
Well, the simplest way would be to mirror the sites as
usual and then copy the files into a database in a
separate step :)) Another option would be to use the
postprocess-html callback; in this callback you could issue
an SQL INSERT or UPDATE statement. But the postprocess
callback is not called for images, for example, so you must
traverse the directories of the mirror anyway if you
really want to store everything in a database.
Alternatively, you can at least record the names of all
saved files via the save-name callback.
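
To illustrate the first option (mirror as usual, then copy the
files in a separate step), here is a minimal sketch in Python.
It assumes SQLite as the database, since nothing above names
one; the table name "files" and the function load_mirror are
made up for this example and are not part of HTTrack.

```python
import os
import sqlite3

def load_mirror(mirror_dir, db_path):
    """Copy every file under mirror_dir into a SQLite table.

    Hypothetical helper: rows are keyed by the path relative to
    the mirror root, so re-running after a new mirror pass
    replaces existing rows instead of duplicating them.
    """
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS files ("
        "path TEXT PRIMARY KEY, data BLOB)"
    )
    count = 0
    # Traverse the whole mirror so images and other non-HTML
    # files are stored as well.
    for root, _dirs, names in os.walk(mirror_dir):
        for name in names:
            full = os.path.join(root, name)
            rel = os.path.relpath(full, mirror_dir)
            with open(full, "rb") as fh:
                data = fh.read()
            conn.execute(
                "INSERT OR REPLACE INTO files (path, data) VALUES (?, ?)",
                (rel, data),
            )
            count += 1
    conn.commit()
    conn.close()
    return count
```

Keep the database file outside the mirror directory, otherwise
the traversal would try to store the database in itself.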

The main question is what you want to do with the database.

If you want to build a search engine which returns the
original URLs, the save-name callback is again your friend.
It allows you to record how the original URLs are mapped
to file system paths. In the simplest case, you can use
a table with three columns: document data, file path,
original URL. Use the file path column as the key for
updates of the database; return the original URLs in
search results.
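
The three-column table could look roughly like this. This is
only a sketch, again assuming SQLite; the table, column and
function names are invented for the example, and the
URL-to-path pairs are assumed to have been recorded already
(the callback itself is not shown). The search is a naive
substring match, not a real full-text index.

```python
import sqlite3

def open_index(db_path=":memory:"):
    """Open (or create) the hypothetical three-column table."""
    conn = sqlite3.connect(db_path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS documents ("
        "file_path TEXT PRIMARY KEY, "
        "original_url TEXT NOT NULL, "
        "document_data TEXT NOT NULL)"
    )
    return conn

def store(conn, file_path, original_url, document_data):
    """Insert a document, or update it if file_path already exists."""
    conn.execute(
        "INSERT OR REPLACE INTO documents "
        "(file_path, original_url, document_data) VALUES (?, ?, ?)",
        (file_path, original_url, document_data),
    )
    conn.commit()

def search(conn, term):
    """Return the original URLs of documents containing term."""
    rows = conn.execute(
        "SELECT original_url FROM documents "
        "WHERE document_data LIKE ? ORDER BY original_url",
        ("%" + term + "%",),
    )
    return [row[0] for row in rows]
```

The file path is the primary key because that is what a later
mirror run knows about a saved file; the URL is only carried
along so that it can be handed back to the user.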