HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Extra files ( indexXXXX.html )
Author: Michel Jullian
Date: 12/11/2004 20:59
 
> > A question though (sorry if it's been asked before), I am
> > getting a lot of 
> > extra files with names such as indexcaa7.html in the
> > downloaded directories, 
> 
> These files come from query-string URLs, such as 
> <http://www.example.com/index.php?id=1234>. HTTrack always 
> handles collisions between URLs having different query 
> string values, and generates as many files as there are 
> different URLs.

Thanks for your answer, but I don't understand it. What is a
query string, and why should there be more files in the
mirror than in the original?

> If multiple URLs are generating the same content (such as 
> <http://www.example.com/index.php?foo=<random-number>>), you 
> will end up with multiple copies of these files (there is 
> no way to detect such cases before downloading the files 
> remotely)

This is perfectly normal, but it has nothing to do with my
problem, which seems to be related to the spider encountering
the _same url_ several times.

To be clearer, the url which gave me the problem is this
directory index url:
<http://fcpe.lattescollege.free.fr/IMG/>
If you run httrack (latest beta) on it, you find 9 spurious
index****.html files in the mirrors of IMG and of its subdirs.
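
For anyone who wants to check whether such extra files are really
copies of the same page, here is a minimal sketch in Python. It is not
part of HTTrack and says nothing about how HTTrack names its files; it
only groups files in a finished mirror by a hash of their bytes. The
function name group_identical_files and the default pattern
"index*.html" are my own assumptions for illustration.

```python
import hashlib
import sys
from collections import defaultdict
from pathlib import Path


def group_identical_files(mirror_dir, pattern="index*.html"):
    """Return lists of files under mirror_dir whose bytes are identical.

    Hypothetical helper, not an HTTrack feature: it only inspects
    files that have already been downloaded.
    """
    groups = defaultdict(list)
    # rglob walks mirror_dir and all of its subdirectories
    for path in sorted(Path(mirror_dir).rglob(pattern)):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            groups[digest].append(path)
    # Keep only the digests shared by two or more files
    return [paths for paths in groups.values() if len(paths) > 1]


if __name__ == "__main__":
    # Usage: python find_dupes.py path/to/mirror
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    for paths in group_identical_files(root):
        print("identical:", ", ".join(str(p) for p in paths))
```

Files reported in the same group have byte-identical content. Pages
that differ only slightly (for example in an embedded link) will not
be grouped, so an empty result does not prove the pages are unrelated.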
 