> Glad to hear that your tool will enforce a hard limit for
> resources used. I hope you will keep your word.
I always keep my word. It will be merged for the next release (4 simultaneous
connections, 5 connections/second, and a maximum of 100 KB/s).
BUT an option will still be available for experts (such as researchers
building web archives, administrators who want to run load tests, and other
authorized people) to bypass these limits. This option will not be available
through the GUI, so that regular users (who don't need it) will not be
tempted to "click it". The option will be explicitly documented as "extremely
dangerous", so that users are aware of what they are doing.
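
To make the idea concrete, here is a minimal sketch (in Python, not
HTTrack's actual code) of what such a throttle could look like; the class
and function names are invented for this illustration:

    import threading
    import time

    class PoliteLimiter:
        # Caps taken from the figures above: 4 simultaneous connections,
        # 5 new connections per second. Sketch only; HTTrack's real
        # implementation is different.
        def __init__(self, max_parallel=4, max_per_second=5):
            self._slots = threading.Semaphore(max_parallel)  # concurrency cap
            self._interval = 1.0 / max_per_second            # min gap between starts
            self._lock = threading.Lock()
            self._last_start = 0.0

        def acquire(self):
            self._slots.acquire()            # wait for a free connection slot
            with self._lock:                 # space out connection starts
                wait = self._last_start + self._interval - time.monotonic()
                if wait > 0:
                    time.sleep(wait)
                self._last_start = time.monotonic()

        def release(self):
            self._slots.release()

    limiter = PoliteLimiter()

    def fetch(url):
        limiter.acquire()
        try:
            pass  # do the actual request here (a bandwidth cap would apply while reading)
        finally:
            limiter.release()
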
> I have downloaded HTTrack and tried it. I was alarmed to see
> that respecting robots.txt rules can be disabled
Yes, because offline browsers are not robots; they are software sitting
between browsers and robots, depending on how they are used. For large-scale
mirrors, they can be considered crawlers. For a small number of pages, an
offline browser is nothing more than a browser.
> setting the User-Agent string
Also mandatory, as many servers deliver different content according to the
User-Agent (IE, Mozilla... some servers won't deliver anything if the
User-Agent string doesn't match a known Internet Explorer version).
But, again, the default User-Agent clearly identifies the client as HTTrack.
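
For illustration only, here is how a client can send an explicit, honest
User-Agent (using Python's standard urllib; the string shown is a
placeholder, not HTTrack's real identifier):

    import urllib.request

    # The User-Agent below is a made-up placeholder for this sketch.
    req = urllib.request.Request(
        "http://example.com/",
        headers={"User-Agent": "ExampleMirror/1.0 (offline browser)"},
    )
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
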
> It doesn't work for "Disallow: /" (disallow everything)
That will also be fixed ("Disallow: /" will be followed by default).
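
As a sketch of how a crawler can honour robots.txt by default (using
Python's standard urllib.robotparser, not HTTrack's implementation): a
"Disallow: /" rule makes can_fetch() refuse every path, so the whole site
is skipped.

    from urllib import robotparser

    rp = robotparser.RobotFileParser()
    rp.set_url("http://example.com/robots.txt")
    rp.read()                              # fetch and parse robots.txt

    # With "Disallow: /" in robots.txt, can_fetch() returns False for
    # every path on the site.
    if rp.can_fetch("ExampleMirror/1.0", "http://example.com/some/page.html"):
        print("allowed by robots.txt")
    else:
        print("blocked by robots.txt")
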
> Those are not polite for a spider, either.
HTTrack is not a spider.
> Not all content is GPL or GDL.
But you can still copy it for your own use.
> No, there is no option for simultaneous connections.
But there is no default bandwidth limit, either; HTTrack already has one. And
there is no delay (even a small one) between connections, as far as I can see.