I previously reported that HTTrack insisted on
downloading and treating binary files as HTML.
This included parsing the file (looking for links),
changing the extensions, changing the local index.html
to match the changed filename, and who knows what
else.
This was in spite of the original web site's pages
being correct: they showed the links as .DLL, .CAB,
etc., and I was able to download the files manually
with no problems.
And HTTrack correctly showed the link name in its
display. That correct name just didn't make it to the
rest of HTTrack.
I tried several 3.x versions of HTTrack and even an
old v2.3.
All of them had the same problems with the web site.
(Plus, all of them insisted on 'hammering' the web site
with a massive amount of activity and useless link
checking that often consumed more bandwidth than the
downloads themselves, and often got me blocked by the
web site's firewall for several hours.)
I looked around for a better web sucker and found
several. Some I didn't like, others didn't work, etc.
But I ended up trying GetLeft (on SourceForge), and I
can definitely say it *IS* able to download the web
site correctly, including all those files that
HTTrack screwed up.
Maybe it's because it's smarter internally and can
handle a wider array of web sites. Or maybe it's
because it's stupid and just blindly accepts the web
site and data as they are, without trying to
second-guess them. It doesn't try to be clever.
Whatever the reason, the fact that it actually *WORKS*
makes it a better web sucker than WinHTTrack.
Plus, as a bonus, it doesn't "hammer" the site with
constant activity. It easily and quickly parses the
index.html, calmly downloads the files, and goes on to
the next one. None of the constant hammering that HTTrack
does. None of the useless link 'pre-checking', either.