HTTrack Website Copier
Free software offline browser - FORUM
Subject: problems with yahoo
Author: hollow.quincy
Date: 12/04/2011 23:31
Hi, I would like to crawl the Yahoo portal, so I use this command:

httrack <> -O "/home/user/HTTRACK/yahoo" "**" -s0 -r10
-s0 - means do not respect robots.txt
-r10 - depth 10
After a few seconds I get a log like this:

launched on Sun, 06 Nov 2011 16:15:53 at
<> **
(httrack <> -O /home/marek/HTTRACK/yahoo ** -s0
Information, Warnings and Errors reported for this mirror:
note:    the hts-log.txt file, and hts-cache folder, may contain sensitive
    such as username/password authentication for websites mirrored in this
    do not share these files/folders if you want these information to remain

16:15:54    Error:     "Unable to get server's address: No such file or
directory" (-5) after 2 retries at link ** (from primary/primary)

HTTrack Website Copier/3.43-9 mirror complete in 1 seconds : 4 links scanned,
1 files written (78 bytes overall) [686 bytes received at 686 bytes/sec], 78
bytes transfered using HTTP compression in 1 files, ratio 132%
(1 errors, 0 warnings, 0 messages)

I think this is because of redirections. What should I do to crawl _only_ the
Yahoo web pages? (I can't simply use the filter "*yahoo*", because the word
"yahoo" can also appear elsewhere in a URL, for example in a GET parameter.)
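
To illustrate the concern about the "*yahoo*" filter, here is a minimal shell sketch. It is not HTTrack's own matching code. The helper names (glob_match, host_of, host_match), the domain yahoo.com and the sample URLs are assumptions made up for the example, and it assumes absolute URLs that start with a scheme. It compares a substring-style glob on the whole URL with a check on the host part only:

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; not part of HTTrack.

# Naive check: the glob *yahoo* matches anywhere in the URL string,
# including the path and the query string.
glob_match() {
  case "$1" in
    *yahoo*) echo yes ;;
    *)       echo no ;;
  esac
}

# Extract the host from an absolute URL (assumes a leading scheme://).
host_of() {
  rest=${1#*://}       # drop the scheme
  rest=${rest%%/*}     # keep everything before the first slash
  rest=${rest%%\?*}    # drop a query string that follows the host directly
  rest=${rest##*@}     # drop any user:password@ prefix
  echo "${rest%%:*}"   # drop any :port suffix
}

# Host-only check: accept yahoo.com and its subdomains (assumed domain).
host_match() {
  host=$(host_of "$1")
  case "$host" in
    yahoo.com|*.yahoo.com) echo yes ;;
    *)                     echo no ;;
  esac
}

glob_match "http://other.example/search?q=yahoo"   # yes (false positive)
host_match "http://other.example/search?q=yahoo"   # no
host_match "http://www.yahoo.com/news"             # yes
```

The point of the sketch is only that a rule tied to the host name avoids the false positive that a bare "*yahoo*" pattern produces.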

Thank you for your help.
