HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: problems with yahoo
Author: Hollow.Quincy
Date: 12/09/2011 21:20
 
> You didn't use a filter. You used TWO URLs (www.yahoo.com and *yahoo.com/*)
> The latter is not a valid URL.
> Try "+*.yahoo.com/*"

I tried your suggestion, but I still have problems. My command is:

httrack http://www.yahoo.com/index.html -O "/home/HTTRACK/yahoo"
+*.yahoo.com/* -* +mime:text/html -s0 -r10 -M100000000 -E600 -%l en -F
"Mozilla/5.0 (Windows NT 6.0; WOW64; rv:8.0.1) Gecko/20100101 Firefox/8.0.1"

Options:
http://www.yahoo.com/index.html   - this is the first problem: when I use just
http://www.yahoo.com/ I get only one page and that is the end... (but not
every site has an index.html... or should I add it to every domain I want to
crawl?)
+*.yahoo.com/*   - filter: only pages from the yahoo website,
-* +mime:text/html   - I want to download only HTML pages (not images and
other files),
-s0  -don't worry about robots.txt
-r10  -crawl very deep
-M100000000   - download at most 100 megabytes,
-E600   - httrack can run for only 600 seconds (10 minutes),
-%l en   - I would like to download English pages first (should I use quotation
marks here, i.e. "en"?),
-F "Mozilla/5.0 (Windows NT 6.0; WOW64; rv:8.0.1) Gecko/20100101
Firefox/8.0.1"   - tell the website that the browser is Firefox.
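About the quotation marks: as far as I understand, the quotes are for the shell, not for httrack. Without them the shell may expand a pattern like +*.yahoo.com/* against files in the current directory before httrack ever sees it. A quick way to check what httrack would actually receive (printf stands in for httrack here, just to print each argument on its own line):

```shell
# Each argument is printed on its own line, exactly as httrack would
# receive it. Quoting keeps the shell from expanding the * wildcards.
printf '%s\n' "+*.yahoo.com/*" "-*" "+mime:text/html" -s0 -r10
```

If the quoted patterns come out unchanged, the same quoting should be safe to use in the real httrack command line.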

What should my httrack command look like?

Thank you for your help!
 