Hi All,
I've executed my crawler script, which contains this httrack 3.33.16 command
line:
nohup httrack \
http://$PORTAL_HOSTPORT/ \
-w -%P -%q0 -X -b1 -u1 -s0 -%k -p3 -B -a -%H -N0 -%u \
-F "$HTTP_USERAGENT" --http-10 &
After about 19 hours, with 11644 links added to the engine (as reported in
hts-log.txt) and 33619 request/response pairs in hts-ioinfo.txt, the httrack
process went stale; by then it had requested about 16,500 contents.
Do you have any idea why this happens?
I can think of 3 possible reasons:
- maybe there's some data structure that gets saturated after that much work;
- since I used persistent connections, maybe some network element (firewall,
load balancer, etc.) drops the connections and the httrack process is unable
to re-establish them;
- maybe I need to define some parameters for simultaneous connections,
bandwidth limits, connection limits, size limits, time limits, or mirror
transfer rate/size (see the sketch after this list)?
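
For the third point, this is the kind of variant I had in mind. It is only a
sketch based on my reading of the httrack man page, using -cN (simultaneous
connections), -TN (timeout in seconds), -RN (retries) and -AN (maximum
transfer rate in bytes/second); the values below are guesses, not tested
settings, and $PORTAL_HOSTPORT / $HTTP_USERAGENT come from my script as above:

# added limits (guessed values): -c8 = 8 simultaneous connections,
# -T60 = 60 s timeout, -R3 = 3 retries, -A250000 = max ~250 KB/s transfer rate
nohup httrack \
http://$PORTAL_HOSTPORT/ \
-w -%P -%q0 -X -b1 -u1 -s0 -%k -p3 -B -a -%H -N0 -%u \
-c8 -T60 -R3 -A250000 \
-F "$HTTP_USERAGENT" --http-10 &

Would adding limits like these help keep the process from stalling, or is the
problem more likely elsewhere?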
Thanks
Silvio