HTTrack Website Copier
Free software offline browser - FORUM
Subject: NoIndex, NoFollow
Author: Oliver
Date: 04/09/2009 15:28
 
I have some trouble getting httrack to recognize the robots meta tag. 

From what I understand from the documentation, the command line tool by
default reads robots.txt and the appropriate tags.

But sites are still indexed.

I've taken a look at the source code and searched for occurrences of nofollow
and noindex. The only hit was in htsparse.c around line 1245 (3.43.4 source),
and only nofollow was mentioned.

I'm not good at C, but since I can't find any mention of noindex, my tests
lead me to conclude that httrack still indexes sites even if NOINDEX is set
in the robots meta tag, and only ignores the links in a site tagged with
nofollow.
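
To make the distinction concrete, here is a minimal Python sketch of the
behaviour I would expect from a crawler that honors both directives. This is
not httrack's code: the names RobotsMetaParser and robots_policy are made up
for illustration, and it assumes the usual meaning of the tag (noindex = do
not keep the page, nofollow = do not follow its links, none = both).

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags.

    Hypothetical helper for illustration; not part of httrack.
    """

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None (e.g. <meta name>), so normalize them.
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").strip().lower() != "robots":
            return
        # The content attribute is a comma-separated, case-insensitive list.
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_policy(html):
    """Return (may_index, may_follow) according to the robots meta tag."""
    parser = RobotsMetaParser()
    parser.feed(html)
    found = parser.directives
    may_index = not ({"noindex", "none"} & found)
    may_follow = not ({"nofollow", "none"} & found)
    return may_index, may_follow


page = '<html><head><meta name="robots" content="NOINDEX, FOLLOW"></head></html>'
may_index, may_follow = robots_policy(page)
print(may_index, may_follow)
```

With a check like this, a page tagged NOINDEX would be skipped while its links
are still followed. From my tests, httrack seems to apply only the nofollow
half.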


Please tell me that I'm wrong or that I somehow misunderstand the feature.

Best regards
Oliver
 