I am completing a Master's degree in Linguistics and wish to create a corpus
from all the text on a company's website. The website runs to over 300 pages,
so I need to scrape the text from every page (anything between the
<body></body> tags is fine) and dump it all into a single .txt file or
similar.
Can HTTrack do this and, if so, how could I achieve this?
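To make the goal concrete, here is a rough sketch of the extraction step I have in mind. It assumes the site has already been mirrored to a local folder of .html files (by HTTrack or any other tool), and it uses only Python's standard library. The names `BodyTextExtractor`, `extract_body_text` and `build_corpus` are my own placeholders, and I have not checked how it copes with badly formed pages.

```python
from html.parser import HTMLParser
from pathlib import Path


class BodyTextExtractor(HTMLParser):
    """Collect text found inside <body>, ignoring <script> and <style> content."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.in_body = False
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in self.SKIP and self.skip_depth > 0:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits inside <body> and outside script/style.
        if self.in_body and self.skip_depth == 0:
            text = data.strip()
            if text:
                self.chunks.append(text)


def extract_body_text(html):
    """Return the body text of one HTML document, one text fragment per line."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


def build_corpus(mirror_dir, out_path):
    """Append the body text of every .html/.htm file under mirror_dir to out_path."""
    pages = sorted(
        p for p in Path(mirror_dir).rglob("*")
        if p.suffix.lower() in {".html", ".htm"}
    )
    with open(out_path, "w", encoding="utf-8") as out:
        for page in pages:
            # Assumes UTF-8 pages; undecodable bytes are replaced rather than failing.
            html = page.read_text(encoding="utf-8", errors="replace")
            out.write(extract_body_text(html) + "\n\n")
    return len(pages)
```

Calling something like `build_corpus("mirror", "corpus.txt")` would then write one combined text file, where `mirror` is a hypothetical name for the folder holding the downloaded pages.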
Your help is greatly appreciated.
Thanks
Jay