HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Mirroring real estate sites
Author: William Roeder
Date: 11/30/2009 19:42
 
> These sites are huge - tens of thousands of house
> and condo listings.
> 
> Question 1) Can HTTrack realistically mirror such a
> huge site?

Yes. My mirror update (mostly HTML files only):
HTTrack Website Copier/3.43-7 mirror complete in 7 hours 23 minutes 6 seconds:
48393 links scanned, 48220 files written (29231434473 bytes overall), 14712
files updated [308456528 bytes received at 11602 bytes/sec], 56196 bytes
transfered using HTTP compression in 4 files, ratio 99%, 1.0 requests per
connection.

If the site has over 100K files, you need to set Options -> Limits -> Maximum number of links.

> Question 2) How do I estimate how big the mirrored
> site will be?  Will I need to get a TB-size external
> drive?

The size of the mirrored site = size of the site + 10%.
It depends on what the site is. YouTube is videos, so assume megabytes per file. A BBS is
mostly text, so assume 50 KB per file.
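
As a rough sketch of that estimate in Python: multiply the file count by an assumed average file size, then add the 10% on top. The function name and the sample numbers (10,000 files at 50 KB each) are made up for illustration; plug in your own guesses for the site you are looking at.

```python
def estimate_mirror_bytes(file_count: int, avg_file_bytes: int,
                          overhead_percent: int = 10) -> int:
    """Estimate mirror size: raw site size plus a percentage on top."""
    site_bytes = file_count * avg_file_bytes
    # Integer math keeps the result exact for whole-byte inputs.
    return site_bytes + site_bytes * overhead_percent // 100


# Hypothetical text-heavy site: 10,000 pages at about 50 KB each.
estimate = estimate_mirror_bytes(10_000, 50 * 1024)
print(f"{estimate / 1024**3:.2f} GiB")
```

Even at several hundred KB per file, tens of thousands of files come to tens of GB, not a TB.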

> Question 3) I have a mobile broadband connection
> that averages 6 Mbits/s.  How long will uploading
> one of these sites take?

You mean downloading. 6 Mbit/s is roughly 600 KB/s (750 KB/s at the very best), or about 2 GB/hour,
ASSUMING the site can sustain that. Most can't. Also, the security limits
prevent more than 4 connections and 100 KB/s (that can be overridden).
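
The arithmetic can be sketched in Python. The helper names are my own; the byte counts and the 11602 bytes/sec rate are taken from the mirror log quoted above, and the 100 KB/s figure is the security limit mentioned in this reply.

```python
def mbit_to_bytes_per_sec(mbit_per_sec: float) -> float:
    """Convert a line rate in Mbit/s to bytes/s (8 bits per byte, no overhead)."""
    return mbit_per_sec * 1_000_000 / 8


def transfer_seconds(total_bytes: int, bytes_per_sec: float) -> float:
    """Time to move total_bytes at a sustained rate."""
    return total_bytes / bytes_per_sec


def format_hms(seconds: float) -> str:
    """Render whole seconds as hours, minutes, seconds."""
    hours, rem = divmod(int(seconds), 3600)
    minutes, secs = divmod(rem, 60)
    return f"{hours}h {minutes:02d}m {secs:02d}s"


# Sanity check against the log: 308456528 bytes received at 11602 bytes/sec.
print(format_hms(transfer_seconds(308_456_528, 11_602)))  # 7h 23m 06s

# Full 29231434473-byte mirror at the 100 KB/s security limit.
print(format_hms(transfer_seconds(29_231_434_473, 100_000)))

# Raw ceiling of a 6 Mbit/s link, before any protocol overhead.
print(mbit_to_bytes_per_sec(6))  # 750000.0
```

The first line reproduces the 7 hours 23 minutes from the log, which shows the real bottleneck there was the 11602 bytes/sec the transfer actually achieved, not the link speed.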

> Question 4) How do I check if robot.txt will stop me
> from mirroring?

Type SITE/robots.txt into your browser.
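
If you would rather check it with a script than read the file by eye, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the example.com URLs, and the "HTTrack" user-agent token are all assumptions for illustration; paste in the real file from SITE/robots.txt and use whatever agent name your mirror actually sends.

```python
from urllib.robotparser import RobotFileParser


def can_mirror(robots_txt: str, url: str, user_agent: str = "HTTrack") -> bool:
    """Return True if robots_txt allows user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that blocks the listings directory for everyone.
sample = """User-agent: *
Disallow: /listings/
"""

print(can_mirror(sample, "http://example.com/listings/1.html"))
print(can_mirror(sample, "http://example.com/about.html"))
```

To test a live site instead of pasted text, `RobotFileParser` also has `set_url()` and `read()` to download the file itself.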

 