I want to use HTTrack to spider a Web site to flat files, then use a search
indexer to consume that set of files.
However, I need a way for the indexer to know the corresponding URL of the
flat file it consumes. So, when the indexer sucks up "a_page.html", it needs
to know that the actual URL to that page on the Internet is
<http://mysite.com/directory/a_page.php?id=7>.
First, is there an easy way to do this? Does HTTrack log the URL of the page
anywhere in the file it produces?
Can HTTrack embed META tags when it spiders? If I could have it drop the URL
it fetched into a META tag in the resulting file, I could then use that tag in
the search results to link back to the original page URL.
Possible?
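
To make the idea concrete, here is a rough sketch of what I imagine on the indexer side. It assumes each mirrored file carries a META tag holding the original URL. The tag name `source-url` and the helper `extract_source_url` are names I made up for illustration; I don't know whether HTTrack can write such a tag, which is exactly the question.

```python
from html.parser import HTMLParser


class SourceUrlParser(HTMLParser):
    """Collects the content of the first <meta name="source-url"> tag.

    "source-url" is a hypothetical tag name, not something HTTrack is
    known to emit.
    """

    def __init__(self):
        super().__init__()
        self.source_url = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta" or self.source_url is not None:
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "source-url":
            self.source_url = attr.get("content")


def extract_source_url(html_text):
    """Return the original URL stored in the page, or None if absent."""
    parser = SourceUrlParser()
    parser.feed(html_text)
    return parser.source_url


if __name__ == "__main__":
    sample = (
        "<html><head>"
        '<meta name="source-url" '
        'content="http://mysite.com/directory/a_page.php?id=7">'
        "</head><body>Hello</body></html>"
    )
    print(extract_source_url(sample))
```

The indexer would call `extract_source_url` on each flat file and store the result alongside the indexed text, so search results can link to the live page instead of the local copy.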