HTTrack Website Copier
Free software offline browser - FORUM
Subject: Index.txt
Author: Duncan Branley
Date: 02/05/2004 13:15
Dear Xavier

I'm using the latest version (HTTrack '3.30' (3.30.01)) 
and have been looking at the index.txt correspondence in 
the forum. It looks to be potentially a very useful 
feature - thanks.

Your message introducing the new feature (19/10/2001) 
describes the text file as follows:

> The result is a text file, listing all 'relevant' 
> words, the number of hits, the %1000 value, and their 
> position. Example: (after crawling
> able
> 	2
> 	1
> 	1
> 	1
> 	=5
> 	(0)

It works well and I can follow most of the output (e.g. in 
the above, two occurrences of 'able' in the first file and 
a single one in each of the other three, giving a total of 
five in the mirror), but may I ask:
- what is the "%1000 value"?
- what does the "(0)" signify?
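
To make my reading of the entry concrete, here is a minimal Python sketch that parses one entry laid out exactly like the quoted example. The function name parse_index_entry and the field name "extra" are my own, not anything from HTTrack, and I am assuming the layout is: the word on its own line, then tab-indented per-file hit counts, then "=total", then the bracketed value whose meaning I am asking about.

```python
def parse_index_entry(text):
    """Parse one index.txt entry laid out like the quoted example.

    Assumed layout (taken only from the forum excerpt): the word, then
    one hit count per file, then "=<total>", then "(<value>)".
    """
    lines = [ln for ln in text.splitlines() if ln.strip()]
    word = lines[0].strip()
    hits, total, extra = [], None, None
    for ln in lines[1:]:
        field = ln.strip()
        if field.startswith("="):
            total = int(field[1:])        # total hits in the mirror
        elif field.startswith("(") and field.endswith(")"):
            extra = int(field[1:-1])      # meaning not established here
        else:
            hits.append(int(field))       # hits in one file
    return {"word": word, "hits": hits, "total": total, "extra": extra}


entry = parse_index_entry("able\n\t2\n\t1\n\t1\n\t1\n\t=5\n\t(0)\n")
# The per-file counts should add up to the "=" line.
assert sum(entry["hits"]) == entry["total"]
```

If the real file carries more on each line than the excerpt shows, the int() conversions above would need adjusting.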

A slight and probably quite easy modification to the 
generation of this file (or an alternative) would make it 
dovetail nicely with HTTrack: could there be an option 
within HTTrack to create an HTML version of index.txt for 
a project so that one could quickly jump to the pages 
containing the words? Perhaps even in two frames? Seeing 
the word in context could be very useful. Seeing it 
highlighted in each page as you scan through the listing 
would be even better!
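
As a rough illustration of the kind of HTML listing I have in mind, here is a small Python sketch. It is not an existing HTTrack feature: the function render_index_html, the input shape (word mapped to a list of page path and hit count pairs) and the example paths are all hypothetical, and the excerpt quoted above does not itself show page paths, so I am assuming they could be obtained somehow.

```python
from html import escape
from urllib.parse import quote


def render_index_html(index):
    """Render {word: [(page_path, hits), ...]} as a linked HTML listing."""
    parts = ["<html><body><dl>"]
    for word in sorted(index):
        parts.append(f"<dt>{escape(word)}</dt>")
        for page, hits in index[word]:
            href = quote(page)  # percent-encode the relative path
            parts.append(
                f'<dd><a href="{href}">{escape(page)}</a> ({hits})</dd>'
            )
    parts.append("</dl></body></html>")
    return "\n".join(parts)


# Hypothetical data: two mirrored pages containing the word "able".
page = render_index_html({"able": [("docs/a.html", 2), ("docs/b.html", 1)]})
```

Writing the returned string to a file in the project folder would give a clickable word list; the two-frame view and in-page highlighting would need more than this.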

Any news on a Unicode words.txt (discussed in August '03)?
