HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Words Database algorithm
Author: Xavier Roche
Date: 04/01/2007 10:07
> Could someone tell me how the keywords are selected
> from the HTML page? E.g. are words within script or href
> HTML tags ignored? I would like to know the
> details of the parsing algorithm used.

The algorithm is really basic; i.e. it only selects words, then merges and
sorts them in the final phase. It should probably be improved one day.
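
To illustrate what "select words, then merge and sort" can look like, here is a minimal sketch in Python. It is not HTTrack's actual code: the function names `select_words` and `build_index`, the 3-letter minimum word length, and the sort order (by count, then alphabetically) are all assumptions made for the example. The sketch treats its input as plain text and makes no attempt to skip markup.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of 3 or more ASCII letters.
WORD_RE = re.compile(r"[A-Za-z]{3,}")


def select_words(page_text):
    """Return every word found in one page, lowercased, in order of appearance."""
    return [w.lower() for w in WORD_RE.findall(page_text)]


def build_index(pages):
    """Merge the words of all pages, then sort them in a final pass."""
    counts = Counter()
    for text in pages:
        # Merge step: identical words from different pages share one entry.
        counts.update(select_words(text))
    # Final phase: sort by descending count, then alphabetically.
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
```

Calling `build_index(["The cat sat", "the cat ran"])` returns `[("cat", 2), ("the", 2), ("ran", 1), ("sat", 1)]`. Anything smarter, such as ignoring text inside script blocks or attribute values, would have to be added in `select_words`.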
