I tried HTTrack; it is a great tool for mirroring a web site.
Now I want to use HTTrack to set up a test web server that
keeps copies of some URLs. The requirement is that when the
tester visits a test URL (e.g. www.msn.com) through the
test web server, the test web server should provide all
components (www.msn.com/index.html and all embedded
components associated with www.msn.com/index.html), just
as if he visited the same URL through the original web
server. It is not necessary to download related links
(e.g. Autos, Careers) because we are only interested in
the performance of downloading this one page. The reason
for setting up this test bed is to get a controlled
environment for testing compression servers. Because the
contents of the test web server are static and the
connection between the compression servers and the test
server is perfect, we can be sure that any difference in
test results is caused only by the different compression
servers, not by changed contents or network conditions.
I tried using httrack -w URL. It works fine for some sites
like www.netzero.net, but it does not work well for some
popular sites like sports.yahoo.com, www.ebay.com and
www.amazon.com. For some sites, the contents of the test
web server downloaded by HTTrack do not provide all the
required components (I used tcpdump to monitor all download
traffic), so the browser has to visit the original
server(s) to download some embedded components, which
violates the design of the test bed.
For www.msn.com, it seems that I can download all
components, but the page looks different in the browser
when visited through the test server than when visited
through the original www.msn.com server, which suggests
that the browser gets different objects from the test
server and from the original web server.
I run HTTrack on Linux 2.4.24; the web server is Apache
2.0.40. The client machine is Windows XP Home Edition with
IE 6.0.
Could you please give me some suggestions on using your
HTTrack to set up the test bed?
Thanks
Fan