Subject: (creating a) problematic sites list
Author: Haudy Kazemi
Date: 05/13/2002 10:14
This is another comment/FYI thing...

I've noticed that there are some problematic/broken 
websites out there that confuse HTTrack or 
simply make 'clean mirroring' difficult.  Here are a few:

'Bad' servers like: according to Xavier, "This server is just 
crap, with "200" HTML responses to .gif requests, and 
therefore accepting all "gif" files is not a good 
idea :("
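
To make that mismatch concrete, here's a rough Python sketch 
of the kind of check I have in mind.  It is my own illustration 
and not anything HTTrack does internally; the function names 
content_type_mismatch and probe are made up for this example, 
and only the Python standard library is assumed.

```python
import mimetypes
import urllib.request
from urllib.parse import urlparse


def content_type_mismatch(url, content_type):
    """Return True when the Content-Type header disagrees with the URL's extension."""
    # Guess the MIME type the extension implies, e.g. ".gif" -> "image/gif".
    expected, _ = mimetypes.guess_type(urlparse(url).path)
    if expected is None or not content_type:
        return False  # no extension or no header: nothing to compare
    # Drop parameters such as "; charset=..." before comparing.
    actual = content_type.split(";", 1)[0].strip().lower()
    return actual != expected.lower()


def probe(url):
    """Send a HEAD request and return (status, content_type).

    Needs network access; urlopen raises HTTPError for 4xx/5xx replies.
    """
    request = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status, response.headers.get("Content-Type", "")
```

Calling probe() on a suspect image URL and passing the returned 
Content-Type to content_type_mismatch() should flag a server 
that answers a .gif request with an HTML page, assuming the 
server treats HEAD the same way it treats GET.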

SourceForge and open source software links: beware of 
the CVS sections, and try to keep HTTrack out of the 
webCVS systems.  If you don't, you'll end up copying 
thousands of links you probably don't want.  (The 
source is usually available without going through the 

Message boards: avoid them with HTTrack...they create 
thousands of links too, many recursive, so a max link 
depth is very important here.

BTW, Xavier, how can I check what type of 
response a given server gives to a request, as in 
the case of  Can you mention a tool 
(a command-line tool on Windows/DOS is fine, or a Linux tool if 
that's all you know of)?

