HTTrack Website Copier
Free software offline browser - FORUM
Subject: HTTrack and Alternatives to wget
Author: Ryan
Date: 04/07/2006 23:33
 
Hello,
  My name is Ryan Miller, and I am currently working for the University of
Illinois.  We are building a site-wide web accessibility evaluator:
fae.cita.uiuc.edu.  We have been using wget to download web sites, but keep
running into problems with it.  We have finally started to explore other
possibilities, and have found HTTrack to be among the more powerful and
flexible tools out there.  I am posting here because people on this forum are
probably familiar with some of the issues we have dealt with and know a lot
about HTTrack, wget, and some of the other options out there.

What we need is a tool that supports the following features:

Recursive Download (HTTrack can do)
Preserve Directory Structure (HTTrack can do)
Preserve Filenames (HTTrack can do*)
Is intelligent about MIME Types (HTTrack is pretty good at this)
Can be embedded in a browser (I don't think HTTrack can do this)
Can download pages/sites to strings in memory instead of files on hard disk (I
don't think HTTrack can do this)
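For the last requirement (keeping pages in memory instead of writing them to disk), one option outside HTTrack is a small script. The following is a minimal sketch using only Python's standard library, not an HTTrack feature. The names `fetch_to_string` and `crawl_to_memory` are hypothetical. Assumptions: the crawl stays on the starting host, stops after `max_pages`, ignores robots.txt and rate limiting, and decodes every response as text (falling back to UTF-8 when no charset is declared), so binary files such as images would need to be kept as bytes in a real tool. The MIME type reported by the server decides whether a page is parsed for further links.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


def fetch_to_string(url, timeout=30.0):
    """Fetch one URL into memory and return (mime_type, decoded_text)."""
    with urlopen(url, timeout=timeout) as resp:
        body = resp.read()
        mime = resp.headers.get_content_type()
        # Assumption: fall back to UTF-8 when the server declares no charset.
        charset = resp.headers.get_content_charset() or "utf-8"
    return mime, body.decode(charset, errors="replace")


class _LinkCollector(HTMLParser):
    """Collect href and src attribute values from an HTML document."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def crawl_to_memory(start_url, fetch=fetch_to_string, max_pages=100):
    """Recursively fetch same-host pages; return {url: (mime, text)}."""
    host = urlparse(start_url).netloc
    pages = {}
    queue = deque([start_url])
    while queue and len(pages) < max_pages:
        # Drop "#fragment" parts so page.html#top and page.html are one page.
        url = urldefrag(queue.popleft()).url
        if url in pages:
            continue
        mime, text = fetch(url)
        pages[url] = (mime, text)
        # Only HTML is scanned for links; other MIME types are stored as-is.
        if mime == "text/html":
            parser = _LinkCollector()
            parser.feed(text)
            for link in parser.links:
                absolute = urljoin(url, link)
                if urlparse(absolute).netloc == host:
                    queue.append(absolute)
    return pages
```

Because `fetch` is a parameter, the crawler can be driven by a fake fetcher in tests, or called from another program (such as an evaluator) that inspects the returned dictionary directly without touching the hard disk.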

Again, HTTrack does most of this pretty well, and I suspect there isn't a tool
out there that does everything we want, but I felt it was worth asking for
your suggestions.

* A somewhat unrelated issue: it seems that the problem of the same page being
downloaded more than once under different names (index.html, index-2.html)
cannot be solved.  Does anyone know an elegant way to deal with this?  We are
developing a script to filter these duplicates out, but it is getting messy
quickly.
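One common way to approach such a filter script (not something HTTrack itself provides) is to hash every downloaded file and group byte-identical copies. The following is a minimal sketch, assuming the mirror sits in a local directory; `find_duplicates` is a hypothetical name. It only catches exact duplicates: two copies of a page that differ in, say, an embedded timestamp or session ID will hash differently and would need a normalization step first.

```python
import hashlib
from collections import defaultdict
from pathlib import Path


def find_duplicates(root: str) -> list[list[Path]]:
    """Group files under root whose contents are byte-for-byte identical."""
    by_hash: dict[str, list[Path]] = defaultdict(list)
    for path in sorted(Path(root).rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            by_hash[digest].append(path)
    # Within each group, put the shortest name first (index.html before
    # index-2.html); that copy is treated as the one to keep.
    return [
        sorted(paths, key=lambda p: (len(p.name), str(p)))
        for paths in by_hash.values()
        if len(paths) > 1
    ]
```

Keeping the first path of each group and deleting the rest is then straightforward, but any links that point at the deleted names (e.g. index-2.html) would still need to be rewritten, which this sketch does not do.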

Thanks,
   Ryan