Re: Sitemap of Archive - HTTrack Website Copier Forum

Subject: Re: Sitemap of Archive

Author: Xavier Roche

Date: 04/30/2003 21:24

> I wonder if anybody knew a tool for creating a 3D or 2D
> website tree out of an HTTrack archive. I want to make a
> visual representation of a website I've downloaded for my
> doctoral thesis to study the linking between the pages.

Well... there is no such feature. The best way would be 
to use the httrack library (see the lib/ example, and/or 
the httrack.c command-line version, which is also a good 
example of how to use the library) together with callbacks:

* The hts_htmlcheck callback ("check-html" callback) is 
called each time a file has to be parsed: you can then 
record the name of this document in a static variable.

typedef int (* t_hts_htmlcheck)(char* html, int len, char* url_adresse, char* url_fichier);

The "url_adresse" and "url_fichier" parameters are the 
address and filename of the document being processed 
(example: url_adresse=www.foo.com and url_fichier=/index.html).
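A minimal sketch of such a check-html callback, assuming only the typedef quoted above. The names my_check_html and current_page are hypothetical, the return convention (1 = go on parsing) is an assumption, and registering the callback with the library (as done in the lib/ example) is not shown:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Callback type as quoted in the post. */
typedef int (*t_hts_htmlcheck)(char *html, int len, char *url_adresse,
                               char *url_fichier);

/* Hypothetical global: the document currently being parsed,
   e.g. "www.foo.com/index.html". */
char current_page[1024];

/* Hypothetical check-html callback: remembers which document the
   following link-detected calls belong to. */
int my_check_html(char *html, int len, char *url_adresse,
                  char *url_fichier) {
  (void)html; /* the page body is not needed for a link map */
  (void)len;
  assert(url_adresse != NULL && url_fichier != NULL);
  /* snprintf truncates safely if the URL is longer than the buffer */
  snprintf(current_page, sizeof current_page, "%s%s", url_adresse,
           url_fichier);
  return 1; /* assumption: a nonzero return lets parsing go on */
}
```

With url_adresse=www.foo.com and url_fichier=/index.html, current_page becomes "www.foo.com/index.html".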

* The hts_htmlcheck_linkdetected callback ("link-detected" 
callback) will be called each time a link is detected in a 
page/document.

typedef int   (* t_hts_htmlcheck_linkdetected)(char* link);

In this callback, you can note somewhere that there is a 
link (char* link) in the document whose name was recorded 
earlier by hts_htmlcheck.
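A minimal sketch of the link-detected side, assuming the typedef quoted above and a current_page string filled in beforehand by the check-html callback. The fixed-size edge table, my_link_detected, and the return convention (1 = keep the link) are hypothetical; the link is stored exactly as received, so it may be relative and need resolving against the parent page:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Callback type as quoted in the post. */
typedef int (*t_hts_htmlcheck_linkdetected)(char *link);

#define MAX_URL 512    /* hypothetical per-URL buffer size */
#define MAX_EDGES 1024 /* hypothetical edge capacity */

/* Set by the check-html callback (not shown) before the links of
   a document are reported. */
char current_page[MAX_URL];

/* One Document -> DocumentChild relation. */
struct edge {
  char parent[MAX_URL];
  char child[MAX_URL];
};

struct edge edges[MAX_EDGES];
int edge_count = 0;

/* Hypothetical link-detected callback: pairs the link with the
   document recorded by the check-html callback. */
int my_link_detected(char *link) {
  assert(link != NULL);
  if (edge_count < MAX_EDGES) { /* edges are dropped once full */
    struct edge *e = &edges[edge_count++];
    snprintf(e->parent, sizeof e->parent, "%s", current_page);
    snprintf(e->child, sizeof e->child, "%s", link);
  }
  return 1; /* assumption: a nonzero return keeps the link */
}
```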

This will require some code, but it should not be too hard 
to build a database of Document -> DocumentChild relations.
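One possible shape for that "database", sketched as a plain in-memory table rather than a real database. struct relation_db and the db_* functions are hypothetical names; the rows would be fed from the link-detected callback, with duplicate links from the same page ignored:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* One Document -> DocumentChild row. */
struct relation {
  char *doc;
  char *child;
};

/* Hypothetical in-memory "database": a growable array of rows. */
struct relation_db {
  struct relation *rows;
  size_t count;
  size_t capacity;
};

static char *copy_string(const char *s) {
  size_t n = strlen(s) + 1;
  char *p = malloc(n);
  if (p != NULL)
    memcpy(p, s, n);
  return p;
}

/* Adds doc -> child unless it is already present.
   Returns 1 if added, 0 if duplicate, -1 if out of memory. */
int db_add(struct relation_db *db, const char *doc, const char *child) {
  assert(db != NULL && doc != NULL && child != NULL);
  for (size_t i = 0; i < db->count; i++)
    if (strcmp(db->rows[i].doc, doc) == 0 &&
        strcmp(db->rows[i].child, child) == 0)
      return 0;
  if (db->count == db->capacity) {
    size_t cap = db->capacity ? db->capacity * 2 : 16;
    struct relation *r = realloc(db->rows, cap * sizeof *r);
    if (r == NULL)
      return -1;
    db->rows = r;
    db->capacity = cap;
  }
  char *d = copy_string(doc);
  char *c = copy_string(child);
  if (d == NULL || c == NULL) {
    free(d);
    free(c);
    return -1;
  }
  db->rows[db->count].doc = d;
  db->rows[db->count].child = c;
  db->count++;
  return 1;
}

/* Number of distinct children recorded for one document. */
size_t db_children(const struct relation_db *db, const char *doc) {
  size_t n = 0;
  for (size_t i = 0; i < db->count; i++)
    if (strcmp(db->rows[i].doc, doc) == 0)
      n++;
  return n;
}

/* Writes one "doc -> child" line per row. */
void db_dump(const struct relation_db *db, FILE *out) {
  for (size_t i = 0; i < db->count; i++)
    fprintf(out, "%s -> %s\n", db->rows[i].doc, db->rows[i].child);
}

/* Releases every row and resets the table to empty. */
void db_free(struct relation_db *db) {
  for (size_t i = 0; i < db->count; i++) {
    free(db->rows[i].doc);
    free(db->rows[i].child);
  }
  free(db->rows);
  db->rows = NULL;
  db->count = db->capacity = 0;
}
```

The duplicate check is a linear scan, which keeps the sketch short but makes loading quadratic overall; for a large mirror, a hash table or an actual database would scale better. db_dump writes plain text in the same "Document -> DocumentChild" notation, which can then be fed to whatever drawing tool is chosen.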

The hard part will be the graphical representation, IMHO, 
and for that I really don't have any idea!

> P.S.: HTTrack is great - I recommend it in my thesis!

Thanks :)
