HTTrack Website Copier
Free software offline browser - FORUM
Subject: Filtering languages
Author: El
Date: 05/21/2013 17:01
 
Hey all!

I am currently scraping websites with a depth of three, but the pages are
available in several languages, so HTTrack returns every page in every
language.

I would like to keep only the English pages. Is there an easy way to filter out
the other languages from the httrack command line?
The difficulty is that the websites I want to scrape aren't alike, so I can't
rely on '- */l=fr': other sites might use 'lang=en' instead of 'l=en', or
something else entirely.

Thanks
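
One possible approach to the problem described above is to generate one exclusion filter per combination of parameter name and unwanted language, and pass them all to a single httrack call. The sketch below is only an illustration: the parameter names ("l", "lang"), the language codes, the example URL, and the output directory are assumptions, not values from any real site, and the helper build_exclusion_filters is hypothetical. It assumes HTTrack's usual filter style, where a leading '-' excludes and '*' matches any characters.

```python
import shlex
from itertools import product


def build_exclusion_filters(param_names, unwanted_langs):
    """Return exclusion patterns such as '-*lang=fr*', one per (name, language) pair."""
    return [f"-*{name}={lang}*" for name, lang in product(param_names, unwanted_langs)]


# Assumed values for illustration; adjust them to the sites being mirrored.
PARAM_NAMES = ["l", "lang"]
UNWANTED_LANGS = ["fr", "de", "es"]

filters = build_exclusion_filters(PARAM_NAMES, UNWANTED_LANGS)

# -O sets the output path and -r3 sets the mirror depth to three.
command = ["httrack", "http://www.example.com/", "-O", "./mirror", "-r3", *filters]

# Print a shell-safe command line instead of running it.
print(" ".join(shlex.quote(arg) for arg in command))
```

Note that a pattern as loose as '-*l=fr*' also excludes unrelated URLs that happen to contain "l=fr" (for example a query string ending in "model=free"), so the generated list should be reviewed before use. Sites that encode the language in the path (such as "/fr/") rather than in a query parameter would need additional patterns.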
 





