> > A question though (sorry if it's been asked before), I am
> > getting a lot of
> > extra files with names such as indexcaa7.html in the
> > downloaded directories,
>
> These files correspond to query-string URLs, such as
> <http://www.example.com/index.php?id=1234>. HTTrack always
> handles collisions between URLs that have different query
> string values, and generates as many files as there are
> different URLs.
Thanks for your answer, but I don't understand it. What is a
query string, and why should there be more files in the
mirror than in the original?
> If multiple URLs generate the same content (such as
> <http://www.example.com/index.php?foo=<random-number>>), you
> will end up with multiple copies of these files (there is
> no way to detect such cases before downloading the files
> remotely).
This is perfectly normal, but it has nothing to do with my
problem, which seems to be related to the spider encountering
the _same URL_ several times.
To be clearer, the URL that gave me the problem is this
directory index URL:
<http://fcpe.lattescollege.free.fr/IMG/>
If you run httrack (latest beta) on it, you will find 9 spurious
index****.html files in the mirrors of IMG and of its subdirs.
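
For reference, the query-string behaviour described in the quoted reply can be sketched roughly as follows. This is only an illustration under stated assumptions: `local_name` is a hypothetical helper, and the "base name plus four hex digits of an MD5 of the query string" rule is a guess chosen to resemble names like indexcaa7.html, not HTTrack's actual naming code.

```python
import hashlib
from urllib.parse import urlsplit


def local_name(url: str) -> str:
    """Hypothetical rule: base name, plus a short hash of the query string.

    Assumption for illustration only; this is not HTTrack's real algorithm.
    """
    parts = urlsplit(url)
    # Last path segment without its extension; a directory URL falls back to "index".
    stem = parts.path.rsplit("/", 1)[-1].rsplit(".", 1)[0] or "index"
    if not parts.query:
        return f"{stem}.html"
    # The part after "?" is the query string; each distinct value gets its own tag.
    tag = hashlib.md5(parts.query.encode()).hexdigest()[:4]
    return f"{stem}{tag}.html"


urls = [
    "http://www.example.com/index.php",
    "http://www.example.com/index.php?id=1234",
    "http://www.example.com/index.php?id=5678",
    "http://www.example.com/index.php?id=1234",  # same URL seen a second time
]
names = [local_name(u) for u in urls]
for url, name in zip(urls, names):
    print(f"{url} -> {name}")
```

Under this sketch, one page on the server can yield several local files when it is linked with different query strings, while the same URL seen twice always maps to the same local name.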