Dear Xavier,
I'm using the latest version (HTTrack '3.30' (3.30.01))
and have been looking at the index.txt correspondence in
the forum. It looks to be potentially a very useful
feature - thanks.
Your message introducing the new feature (19/10/2001)
describes the text file as follows:
> The result is a text file, listing all 'relevant'
> words, the number of hits, the %1000 value, and their
> position. Example: (after crawling www.httrack.com)
>
> able
> 2 www.httrack.com/HelpHtml/fcguide.html
> 1 www.httrack.com/HelpHtml/abuse.html
> 1 www.httrack.com/HelpHtml/dev.html
> 1 www.httrack.com/HelpHtml/step9_opt9.html
> =5
> (0)
It works well and I can follow most of the output (e.g. in
the above, two occurrences of 'able' in the first file and
a single one in each of the other three, giving a total of
five in the mirror), but may I ask:
- what is the "%1000 value"?
- what does the "(0)" signify?
A slight and probably quite easy modification to the
generation of this file (or an alternative to it) would make
it dovetail nicely with HTTrack: could there be an option
within HTTrack to create an HTML version of index.txt for
a project, so that one could quickly jump to the pages
containing the words? Perhaps even in two frames? Seeing
the word in context could be very useful. Seeing it
highlighted in each page as you scan through the listing
would be even better!
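
To illustrate the sort of thing I mean, here is a rough Python
sketch (not HTTrack code) that turns index.txt into a simple
linked HTML page. It assumes the layout is exactly as in the
example quoted above: a word on its own line, then one
"hits path" line per page, then the "=total" and "(n)" lines.
The names parse_index and to_html are my own, and the links are
assumed to be relative to the root of the mirror.

```python
import html


def parse_index(text):
    """Parse index.txt-style text into {word: [(hits, path), ...]}.

    Assumed layout (inferred from the forum example, not from a spec):
    a word line, then "<hits> <path>" lines, then "=<total>" and "(<n>)".
    """
    entries = {}
    word = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        # The "=5" total line and the "(0)" line are skipped in this sketch.
        if line.startswith("=") or line.startswith("("):
            continue
        parts = line.split(None, 1)
        if len(parts) == 2 and parts[0].isdigit():
            # A "<hits> <path>" line belonging to the current word.
            if word is not None:
                entries[word].append((int(parts[0]), parts[1]))
        else:
            # Anything else is treated as the start of a new word.
            word = line
            entries.setdefault(word, [])
    return entries


def to_html(entries):
    """Render the parsed entries as one HTML page with a link per hit."""
    out = ["<html><body>"]
    for word in sorted(entries):
        out.append("<h3>%s</h3><ul>" % html.escape(word))
        for hits, path in entries[word]:
            href = html.escape(path, quote=True)
            out.append('<li><a href="%s">%s</a> (%d)</li>' % (href, href, hits))
        out.append("</ul>")
    out.append("</body></html>")
    return "\n".join(out)
```

Running to_html(parse_index(open("index.txt").read())) and saving
the result in the mirror's top folder would give a clickable word
list. Showing the word in context or highlighting it would need
the pages themselves to be read as well, which this does not
attempt.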
Any news on a Unicode words.txt (discussed in August '03)?
Thanks,
Duncan