Hey all!
I am currently scraping websites to a depth of three, but the pages are
available in several languages, so the crawl returns every page in every
language.
I would like to keep only the English pages. Is there an easy way to filter
out the other languages when using the httrack command line?
The difficulty is that the websites I want to scrape aren't alike, so I can't
use '- */l=fr' because other sites might write the parameter as 'lang=en'
instead of 'l=en', or as something else entirely.
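
To show why a single filter doesn't scale, here is a minimal Python sketch of
the brute-force alternative: generating one exclusion pattern per combination
of parameter name and language code. The parameter names and language codes
are guesses for illustration, not taken from any particular site, and the
'-*name=code*' wildcard form is an assumption that I haven't checked against
httrack itself.

```python
from itertools import product

# Hypothetical guesses at how different sites name their language parameter.
PARAM_NAMES = ["l", "lang", "language"]
# Hypothetical list of non-English language codes to exclude.
EXCLUDED_LANGS = ["fr", "de", "es"]


def build_exclusion_filters(param_names, langs):
    """Return one '-*name=code*' exclusion pattern per name/code combination."""
    return [f"-*{name}={code}*" for name, code in product(param_names, langs)]


filters = build_exclusion_filters(PARAM_NAMES, EXCLUDED_LANGS)
# Print the patterns space-separated, ready to paste after the URL on a command line.
print(" ".join(filters))
```

Even with three names and three languages this already produces nine
patterns, and it still misses sites that put the language in the path or use
a parameter name I didn't think of.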
Thanks