> question1 : what is a robots.txt?
See <http://www.robotstxt.org/wc/robots.html>
It is a file designed to tell spiders NOT to visit certain
areas of a site (CPU-heavy pages, big files...).
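For illustration only, a minimal robots.txt (placed at the
site root) might look like this; the paths are hypothetical:

  User-agent: *
  Disallow: /cgi-bin/
  Disallow: /big-files/

A compliant spider reading this would skip those two
directories.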
> question2 : why use great care when disabling the
> robots.txt rules option?
Because webmasters generally set restrictions on spiders
for good reasons (CPU-heavy pages, big files...).
> question3 : what's the filter for downloading files in
> the images and gdx directories?
In 'Spider'/'Spider', select 'no robots.txt rules',
but also set 'Flow Control'/'Maximum number of
connections' to 1 or 2 (a command-line sketch follows
the note below).
Note that certain sites will blacklist you anyway (see
Haudy Kazemi's remark in older posts)
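As a sketch only (the URL and filter patterns are
hypothetical, and this assumes the command-line version of
HTTrack), the settings above map onto options like these:

  httrack "http://www.example.com/" -s0 -c2 \
    "+www.example.com/images/*" "+www.example.com/gdx/*"

Here -s0 means never follow robots.txt rules, -c2 limits
the mirror to 2 simultaneous connections, and the
'+' patterns are scan filters that allow files under the
images and gdx directories.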
> 2. 'Unable to get server's address' (-5) after 2 retries
> at link www.example.com (from primary/primary)
This error is generally caused by a mistyped URL, or by a
missing proxy configuration.
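A quick way to check which of the two it is (the proxy
address below is hypothetical): first verify that the name
resolves at all, then tell HTTrack about your proxy if you
need one:

  nslookup www.example.com
  httrack "http://www.example.com/" -P proxy.example.com:8080

HTTrack's -P option takes proxy:port, or
user:pass@proxy:port if the proxy requires authentication.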