Re: How do we download all links even links with no ex

Subject: Re: How do we download all links even links with no ex

Author: William Roeder

Date: 09/22/2009 01:57

> I can't download links with no extension. Do we have
> an argument to download absolutely all links with
> any kind of extension for a given domain?

Extensions are irrelevant on web sites. All that matters is the URL and the
returned MIME type.
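To illustrate the point about MIME types, here is a minimal Python sketch (not anything httrack itself runs). It shows that an extension-less URL gives no hint about its type, while a Content-Type header does. The header value used here is a hypothetical response, not one captured from the site.

```python
import mimetypes
from email.message import Message


def guess_from_extension(url):
    """Return the MIME type implied by the URL's extension, or None."""
    mime, _encoding = mimetypes.guess_type(url)
    return mime


def media_type(content_type_header):
    """Extract the bare media type from a Content-Type header value."""
    msg = Message()
    msg["Content-Type"] = content_type_header
    return msg.get_content_type()


# URL taken from the question; it has no file extension.
url = "http://www.premierleague.com/page/blackburn-rovers"
print(guess_from_extension(url))

# Hypothetical header a server might send for that page.
print(media_type("text/html; charset=UTF-8"))
```

The extension lookup has nothing to work with, so a crawler has to rely on what the server reports.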

> I am using this command to download all links from
> this domain fantasy.premierleague.com/:
> httrack -%M -q -r3 -N4 -s0 -p*3
> <http://fantasy.premierleague.com/>

-p*3 is invalid. -p3 is already the default, so the option can be dropped
(in the help text the * only marks the default value).

robots.txt says:
User-agent: *
Disallow: /M/
So most of the links won't be followed.
-sN follow robots.txt and meta robots tags (0=never,1=sometimes,* 2=always)
(--robots[=N])
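As a rough check of what those two robots.txt lines exclude, the rules can be fed to Python's standard robots.txt parser. This is only a sketch of the rule semantics, not of httrack's own handling, and the /M/example path is a made-up URL for illustration.

```python
from urllib.robotparser import RobotFileParser

# The two rules quoted from the site's robots.txt.
rules = ["User-agent: *", "Disallow: /M/"]

parser = RobotFileParser()
parser.parse(rules)


def allowed(url, agent="*"):
    """Return True if the parsed rules let `agent` fetch `url`."""
    return parser.can_fetch(agent, url)


# Hypothetical URL under /M/ and the site root.
print(allowed("http://fantasy.premierleague.com/M/example"))
print(allowed("http://fantasy.premierleague.com/"))
```

Anything under /M/ is refused for a crawler that honours the rules; everything else on the host stays reachable.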
 

> but it does not download links such as :
> <http://www.premierleague.com/page/blackburn-rovers>
> This is not a folder but a link !
The link is external to fantasy.premierleague.com, and by default httrack
stays on the starting site.
Either add an external depth limit:
-%eN set the external links depth to N (* %e0) (--ext-depth[=N])
or override with a filter: +*.premierleague.com/*
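To see why a wildcard on the host covers both sites, the pattern can be tried against the two URLs from this thread. The sketch below uses Python's fnmatch as a stand-in for wildcard matching; that is an assumption for illustration, and httrack's own filter matcher may differ in details. The example.org URL is hypothetical.

```python
from fnmatch import fnmatchcase
from urllib.parse import urlsplit


def host_and_path(url):
    """Return host plus path, e.g. 'www.example.com/page'."""
    parts = urlsplit(url)
    return parts.netloc + (parts.path or "/")


def matches_filter(url, pattern="*.premierleague.com/*"):
    """Shell-style wildcard test; only approximates httrack's filter rules."""
    return fnmatchcase(host_and_path(url), pattern)


for url in (
    "http://fantasy.premierleague.com/",
    "http://www.premierleague.com/page/blackburn-rovers",
    "http://example.org/page",  # hypothetical unrelated site
):
    print(url, matches_filter(url))
```

Both premierleague.com hosts match the pattern, while the unrelated host does not, which is the effect the filter is meant to have.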
