HTTrack Website Copier
Free software offline browser - FORUM
Subject: website aliases workaround
Author: Michael
Date: 11/09/2004 11:57
 
dear folks,

recently i tried to mirror a site that serves its pages under
several aliases like www1.site.com, www2.site.com, etc.
as you probably know, this causes trouble when you try
to update the mirror later on: yesterday httrack downloaded
www1.site.com/file.gif, today the url is
www4.site.com/file.gif, and suddenly you have the file twice.
this is of course not what you want.
as far as i can see, support for sites like that is not
yet implemented in httrack.

so, please let me introduce a workaround for that. probably
this was mentioned before. if so, please excuse me wasting
your time :-). the workaround uses a programme called
proxomitron, which is basically a local filtering proxy that
gets rid of all the evil things in webpages: evil
javascript, evil html etc.

1. step: get it and install it (there is no real
installation, just extract the archive)
2. step: run the programme
3. step: deactivate all the filters for now (unless you want
some stuff filtered. beware: sometimes proxomitron messes up
websites that heavily depend on javascript to work).
4. step: add your own webpage-filter:
matching expression:
www[0-9].foobar.com
replacement text:
www.foobar.com
5. step: now run httrack on the site as before but tell it
to use proxomitron as its proxy.
6. step: enjoy.
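
for step 5, here is a small python sketch of what the httrack call could look like. it only builds and prints the command line. the assumptions: proxomitron is listening on localhost port 8080 (check the port in your own proxomitron config), the start url and the output folder "mirror" are just placeholders, and httrack_command is a made-up helper name. -P (proxy) and -O (output path) are regular httrack options.

```python
import shlex


def httrack_command(start_url, proxy_host="localhost", proxy_port=8080,
                    out_dir="mirror"):
    """Build an httrack command line that fetches through a local proxy.

    proxy_host/proxy_port are assumptions: adjust them to wherever
    proxomitron is actually listening on your machine.
    """
    return [
        "httrack", start_url,
        "-O", out_dir,                        # where the mirror is stored
        "-P", f"{proxy_host}:{proxy_port}",   # send all requests via the proxy
    ]


cmd = httrack_command("http://www.foobar.com/")
print(shlex.join(cmd))
```

you can paste the printed line into a shell, or hand the list to subprocess.run(cmd) if you want to start httrack from a script.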

using this, all occurrences of wwwX.foobar.com (X is a single
digit from 0 to 9) will be replaced by just www.foobar.com before
httrack sees them.
if you know a little about regular expressions and read some
of the proxomitron docs you can (of course) do more
sophisticated stuff with it.
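
to show what the replacement does to a page, here is a rough python equivalent. note this is python's re syntax, not proxomitron's own matching language, so the dots are escaped here. normalize_aliases and the sample page are made up for illustration, and foobar.com is the same placeholder domain as above.

```python
import re

# one digit after "www", dots escaped so they only match a literal "."
ALIAS = re.compile(r"\bwww[0-9]\.foobar\.com\b")


def normalize_aliases(text):
    """Rewrite every wwwX.foobar.com (X = one digit) to www.foobar.com."""
    return ALIAS.sub("www.foobar.com", text)


page = ('<img src="http://www4.foobar.com/file.gif"> '
        '<a href="http://www1.foobar.com/">home</a>')
print(normalize_aliases(page))
```

with this, www1 and www4 end up as the same host, so file.gif gets only one url. aliases with two digits (www12) are left alone, because [0-9] matches exactly one digit.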

if you have any questions feel free to email me.
 

