> Hi All,
> I've executed my crawler script, which contains this
> httrack 3.33.16 command line:
>
> nohup httrack http://$PORTAL_HOSTPORT/ -w -%P -%q0 -X -b1 -u1 -s0
> -%k -p3 -B -a -%H -N0 -%u -F "$HTTP_USERAGENT" --http-10 &
>
> After about 19 hours, with 11644 links added to the
> engine reported in hts-log.txt and 33619
> request/response pairs in hts-ioinfo.txt, the httrack
> process was stale, having requested about 16,500
> contents.
> Do you have any idea why this happens?
> I can think of 3 possible reasons:
> - maybe there's some data structure that gets
> saturated after so much work;
> - since I used persistent connections, maybe some
> network element (firewall, load balancer, etc.)
> drops the connections and the httrack process is
> unable to recreate them;
> - maybe I must define some parameters for:
> simultaneous connections, bandwidth limits,
> connection limits, size limits, time limits or
> mirror transfer rate/size?
> Thanks
>
>
> Silvio
Random thoughts:
1. Try using the latest beta:
<http://download.httrack.com/cserv.php3?File=httrack-beta.exe>
<http://www.httrack.com/page/2/en/index.html>
2. Define minimum values for:
simultaneous connections, bandwidth limits,
connection limits, size limits, time limits or
mirror transfer rate/size (see the example
command after this list).
3. Ensure more (x3) free memory & disk space than required.
4. Run a surface scan of your disks (OS, swap, mirror).
5. Install patches for your OS.
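
For point 2, here is a sketch of how such limits might be set, keeping the
original options and adding explicit connection, rate, timeout, retry, time
and size limits. The numeric values are placeholder assumptions, not
recommendations; check each option against the httrack documentation for your
version before relying on it:

# Assumed placeholder values, for illustration only:
#   -c4          : at most 4 simultaneous connections
#   -%c2         : at most 2 new connections per second
#   -A25000      : limit transfer rate to roughly 25 KB/s
#   -T30 -R3     : 30-second timeout, 3 retries per link
#   -E72000      : abandon the mirror after 20 hours
#   -M5000000000 : overall mirror size limit (about 5 GB)
nohup httrack http://$PORTAL_HOSTPORT/ \
  -w -%P -%q0 -X -b1 -u1 -s0 -%k -p3 -B -a -%H -N0 -%u \
  -c4 -%c2 -A25000 -T30 -R3 -E72000 -M5000000000 \
  -F "$HTTP_USERAGENT" --http-10 &

If a firewall or load balancer is silently dropping the keep-alive
connections (the second hypothesis in the question), the -T/-R settings are
the most likely to matter, since they control how long httrack waits on a
non-responding link and how many times it retries before giving up.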