HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: jpg and gif files not downloaded
Author: Xavier Roche
Date: 11/05/2002 19:07
 
> question1 : what's a robots.txt

See <http://www.robotstxt.org/wc/robots.html>

It is a file that tells spiders which areas of a site they 
should NOT visit (CPU-heavy pages, big files, etc.)
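
To illustrate how a spider reads such a file, here is a minimal sketch using Python's standard urllib.robotparser module. The sample rules, the "MySpider" user-agent name and the www.example.com URLs are made up for the example; this is not how HTTrack itself is implemented.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, for illustration only.
SAMPLE_ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /downloads/big/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(SAMPLE_ROBOTS_TXT)

# A well-behaved spider asks before fetching each URL.
for url in (
    "http://www.example.com/index.html",
    "http://www.example.com/cgi-bin/search",
):
    allowed = parser.can_fetch("MySpider", url)
    print(url, "->", "allowed" if allowed else "disallowed")
```

With these sample rules, the index page is allowed and the /cgi-bin/ page is disallowed.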

> question2 : why using great care by disabling the 
> robots.txt rule option 

Because webmasters generally set restrictions for spiders 
for good reasons (CPU-heavy pages, big files, etc.)

> question3 : what's the filter for downloading files in 
> the images and gdx directory

In the options, set 'Spider'/'Spider' to 'no robots.txt rules', 
but also set 'Flow Control'/'Maximum number of connections' 
to 1 or 2

Note that certain sites will blacklist you anyway (see 
Haudy Kazemi's remark in older posts)

> 2. Unable to get server's address' (-5) after 2 retries 
> at link www.example.com  (from primary/primary)

This error is generally due to a mistyped URL, or to a 
missing proxy setting
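
One way to check the first cause outside of HTTrack is to test whether the host name resolves at all. The sketch below uses Python's standard socket module; the can_resolve helper is a hypothetical name for this example, and it only tests local name resolution. If your network requires a proxy, a correct host name may still fail this check.

```python
import socket


def can_resolve(hostname: str) -> bool:
    """Return True if the local resolver can find an address for hostname."""
    try:
        socket.getaddrinfo(hostname, 80)
        return True
    except socket.gaierror:
        # Name resolution failed: mistyped host name, or no direct DNS access.
        return False
```

For example, can_resolve("www.example.com") tells you whether that name can be resolved from your machine.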
 