HTTrack Website Copier
Free software offline browser - FORUM
Subject: Download with Disallow in Robots.txt file
Author: Jairo Bernal
Date: 07/12/2004 16:17
 
I am having trouble downloading the site
<http://www.w3schools.com/>
This site has a robots.txt file:
==================
User-agent: *
Disallow: /quiztest
Disallow: /banners
Disallow: /images
Disallow: /ado/demo_db_edit.asp
Disallow: /html/tryit.asp
Disallow: /css/tryit.asp
=================
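
To see which paths these rules actually cover, here is a minimal sketch using Python's standard urllib.robotparser module. It is only an illustration, not how HTTrack itself evaluates the rules. The helper name check_paths is hypothetical, as are the sample paths /images/logo.gif and /html/default.asp, which are assumed just to exercise the prefix matching.

```python
from urllib.robotparser import RobotFileParser

# The rules exactly as quoted in the post above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /quiztest
Disallow: /banners
Disallow: /images
Disallow: /ado/demo_db_edit.asp
Disallow: /html/tryit.asp
Disallow: /css/tryit.asp
"""

BASE_URL = "http://www.w3schools.com"


def check_paths(paths, user_agent="*"):
    """Return {path: True/False} telling whether each path may be fetched."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(ROBOTS_TXT.splitlines())
    return {p: parser.can_fetch(user_agent, BASE_URL + p) for p in paths}


if __name__ == "__main__":
    # Sample paths other than "/" and the tryit.asp rule are assumptions.
    samples = ["/", "/images/logo.gif", "/html/tryit.asp", "/html/default.asp"]
    for path, ok in check_paths(samples).items():
        print(f"{path}: {'allowed' if ok else 'disallowed'}")
```

Under plain prefix matching, the site root "/" is not covered by any of the Disallow lines; only URLs beginning with the listed paths are.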

HTTrack reports the following log:

HTTrack3.32-2+swf launched on Mon, 12 Jul 2004 09:06:46 at
<http://www.w3schools.com/> +*.css +*.js -ad.doubleclick.net/*
+*.gif +*.jpg +*.png +*.tif +*.bmp +*.zip +*.tar +*.tgz
+*.gz +*.rar +*.z +*.exe +*.mov +*.mpg +*.mpeg +*.avi +*.asf
+*.mp3 +*.mp2 +*.rm +*.wav +*.vob +*.qt +*.vid +*.ac3 +*.wma
+*.wmv
(winhttrack -qiC1t%Ps1%s%I0p7c32T1200R6H0%kf2A25000%f0#f -F
"Mozilla/4.5 (compatible; HTTrack 3.0x; Windows 98)" -%F 
-%l "es, en, *" <http://www.w3schools.com/> -O
C:\Webs\w3schools,C:\Webs\w3schools +*.css +*.js
-ad.doubleclick.net/* +*.gif +*.jpg +*.png +*.tif +*.bmp
+*.zip +*.tar +*.tgz +*.gz +*.rar +*.z +*.exe +*.mov +*.mpg
+*.mpeg +*.avi +*.asf +*.mp3 +*.mp2 +*.rm +*.wav +*.vob
+*.qt +*.vid +*.ac3 +*.wma +*.wmv -%A
php3,php,php2,asp,jsp,pl,cfm,nsf=text/html )
Information, Warnings and Errors reported for this mirror:
note: the hts-log.txt file, and hts-cache folder, may
contain sensitive information,
 such as username/password authentication for websites
mirrored in this project
 do not share these files/folders if you want these
information to remain private
09:06:47 Info:  engine: transfer-status: link added:
www.w3schools.com/robots.txt -> 
09:06:47 Info:  Note: due to www.w3schools.com remote
robots.txt rules, links begining with these path will be
forbidden: /quiztest, /banners, /images,
/ado/demo_db_edit.asp, /html/tryit.asp, /css/tryit.asp (see
in the options to disable this)
09:06:47 Info:  engine: transfer-status: link added:
www.w3schools.com/ ->
C:/Webs/w3schools/www.w3schools.com/index.html
09:06:47 Info:  Purging
C:/Webs/w3schools/www.w3schools.com/default.html
HTTrack Website Copier/3.32-2 mirror complete in 1 seconds :
2 links scanned, 1 files written (0 bytes overall), no files
updated [539 bytes received at 539 bytes/sec]
(No errors, 0 warnings, 4 messages)

How can I configure HTTrack to download this site? Thanks,
Jairo Bernal
 