Re: jpg and gif files not downloaded - HTTrack Website Copier Forum

Subject: Re: jpg and gif files not downloaded

Author: Xavier Roche

Date: 11/05/2002 19:07

> question1 : what's a robots.txt

See <http://www.robotstxt.org/wc/robots.html>

It is a file designed to tell spiders NOT to visit certain
areas of a site (CPU-heavy pages, big files, etc.)
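
As a rough illustration, here is a made-up robots.txt and a small
Python check using the standard library's urllib.robotparser. The
rules, the "HTTrack" user-agent string and the example.com URLs are
assumptions for this sketch, not taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: keep all spiders out of two directories.
ROBOTS_TXT = """\
User-agent: *
Disallow: /images/
Disallow: /cgi-bin/
"""


def is_allowed(url, user_agent="HTTrack"):
    """Return True if the sample rules let user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("http://www.example.com/index.html",
                "http://www.example.com/images/logo.gif"):
        print(url, "->", "allowed" if is_allowed(url) else "blocked")
```

A spider that honours these rules would skip everything under
/images/, which is one way jpg and gif files can end up missing
from a mirror.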

> question2 : why using great care by disabling the 
> robots.txt rule option 

Because webmasters generally set restrictions for spiders
for good reasons (CPU-heavy pages, big files, etc.)

> question3 : what's the filter for downloading files in the
> images and gdx directory

Set 'Spider'/'Spider' to "no robots.txt rules",
but also set 'Flow Control'/'Maximum number of
connections' to 1 or 2, to keep the load on the server low

Note that certain sites will blacklist you anyway (see 
Haudy Kazemi's remark in older posts)
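
For command-line users, the two settings above can be sketched as
follows. This assumes that -sN controls robots.txt handling (0 =
never follow) and -cN sets the number of connections, as listed by
"httrack --help"; please check your own version before relying on
it. The helper name, URL and output directory are hypothetical, and
the command is only built, not run.

```python
import shlex


def build_httrack_command(url, out_dir, connections=2):
    """Build (but do not run) an httrack command line as a list."""
    return [
        "httrack", url,
        "-O", out_dir,        # output directory for the mirror
        "-s0",                # assumed: never follow robots.txt rules
        f"-c{connections}",   # assumed: max simultaneous connections
    ]


if __name__ == "__main__":
    cmd = build_httrack_command("http://www.example.com/", "./mirror")
    # Print the command for inspection instead of executing it.
    print(shlex.join(cmd))
```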

> 2. Unable to get server's address' (-5) after 2 retries at
> link www.example.com  (from primary/primary)

This error means the server's host name could not be
resolved. It is generally due to a mistyped URL, or to a
missing proxy setting
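
A quick way to tell the two causes apart is to test whether the
host name resolves at all from your machine. The sketch below uses
Python's socket.getaddrinfo; the can_resolve helper and the
deliberately wrong second host name are made up for the example.

```python
import socket


def can_resolve(host, resolver=socket.getaddrinfo):
    """Return True if host resolves to at least one address."""
    try:
        return len(resolver(host, None)) > 0
    except socket.gaierror:
        # Name lookup failed: unknown host or no usable DNS.
        return False


if __name__ == "__main__":
    # www.example.com is the host from the error message above;
    # the second name is intentionally invalid.
    for host in ("www.example.com", "www.exmaple.invalid"):
        status = "resolves" if can_resolve(host) else "does NOT resolve"
        print(host, status)
```

If a correctly typed name still fails here, a required proxy is the
more likely cause, since behind some proxies only the proxy itself
can look up outside names.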
