> Thanks very much for
> your patience, advice, and programming.
Thanks :p
> We remain vexed, however, at managing the HTTrack
> jobs for hundreds (soon to be over 1,000) sites that
> have been contributed to the eGranary.
Three options, IMHO:
1. On Linux/Unix, a frontend script which will do something like:
#!/bin/sh
#
# (some SQL action, e.g. record that the job started)
httrack "$@"
RETURNCODE=$?
# (some SQL action, e.g. record that the job ended, with $RETURNCODE)
2. On all platforms, use of the library through plugins (the plugin
library has been greatly simplified) to track the start/end of mirrors and/or
their completeness
3. On all platforms too, modifying src/httrack.c (which is actually only a
frontend to the httrack core library!) to fit your needs
Note that the current 3.41 beta release **should** be thread-safe, and hence
you **should** be able to spawn multiple mirrors in multiple threads.
> Here's our biggest problem: keeping tabs on which
> jobs are running on which machines (we have 20
> dedicated to scraping and updating our mirrors) and
> knowing when the jobs are done.
Wouldn't a simple script frontend do the trick?
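Something along these lines, for instance. This is only a sketch: the run_job name, the STATUS_FILE path and the record format are made up for illustration, and the echo lines stand in for whatever SQL action reports to your central handler.

```shell
#!/bin/sh
# Hypothetical job wrapper: records the start and end of a command,
# together with the machine name, in a status file.
# STATUS_FILE and the line format are assumptions, not HTTrack features.
STATUS_FILE="${STATUS_FILE:-/tmp/httrack-jobs.log}"

run_job() {
    job="$1"; shift
    host=$(uname -n)    # name of the machine running the job
    echo "START $job host=$host time=$(date -u +%Y-%m-%dT%H:%M:%SZ)" >> "$STATUS_FILE"
    "$@"                # run the mirror command itself
    rc=$?
    echo "END $job host=$host time=$(date -u +%Y-%m-%dT%H:%M:%SZ) rc=$rc" >> "$STATUS_FILE"
    return $rc
}

# Usage (example URL and path only):
# run_job example-site httrack http://www.example.com/ -O /mirrors/example-site
```

A job that has a START line but no END line is still running (or died), which answers "which jobs are running where" with a plain grep.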
> It would be much better if we had some option inside
> HTTrack that would "send a signal" to some central
> handler that could then process the data.
Might also be done using the 3.41 library (in the following example, just
modify end_of_mirror):
/usr/share/httrack/libtest/callbacks-example-simple.c:
----------------------------------------
/* system includes */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
/* standard httrack module includes */
#include "httrack-library.h"
#include "htsopt.h"
#include "htsdefines.h"
/* external functions */
EXTERNAL_FUNCTION int hts_plug(httrackp *opt, const char* argv);
EXTERNAL_FUNCTION int hts_unplug(httrackp *opt);
/* local function called as "check_html" callback */
static int process_file(t_hts_callbackarg *carg, /* the carg structure, holding various information */
                        httrackp *opt,           /* the option settings */
                        /* other parameters are callback-specific */
                        char* html, int len,
                        const char* url_address, const char* url_file) {
  void *ourDummyArg = (void*) CALLBACKARG_USERDEF(carg); /* optional user-defined arg */
  /* call parent functions if multiple callbacks are chained. you can skip
     this part, if you don't want previous callbacks to be called. */
  if (CALLBACKARG_PREV_FUN(carg, check_html) != NULL) {
    if (!CALLBACKARG_PREV_FUN(carg, check_html)(CALLBACKARG_PREV_CARG(carg),
                                                opt,
                                                html, len, url_address,
                                                url_file)) {
      return 0; /* abort */
    }
  }
  printf("file %s%s content: %s\n", url_address, url_file, html);
  return 1; /* success */
}
/* local function called as "end" callback */
static int end_of_mirror(t_hts_callbackarg *carg, /* the carg structure, holding various information */
                         httrackp *opt) {         /* the option settings */
  void *ourDummyArg = (void*) CALLBACKARG_USERDEF(carg); /* optional user-defined arg */
  /* processing */
  fprintf(stderr, "That's all, folks!\n");
  /* call parent functions if multiple callbacks are chained. you can skip
     this part, if you don't want previous callbacks to be called. */
  if (CALLBACKARG_PREV_FUN(carg, end) != NULL) {
    /* status is ok on our side, return other callback's status */
    return CALLBACKARG_PREV_FUN(carg, end)(CALLBACKARG_PREV_CARG(carg), opt);
  }
  return 1; /* success */
}
/*
module entry point
the function name and prototype MUST match this prototype
*/
EXTERNAL_FUNCTION int hts_plug(httrackp *opt, const char* argv) {
  /* optional argument passed in the commandline we won't be using here */
  const char *arg = strchr(argv, ',');
  if (arg != NULL)
    arg++;
  /* plug callback functions */
  CHAIN_FUNCTION(opt, check_html, process_file, /* optional user-defined arg */ NULL);
  CHAIN_FUNCTION(opt, end, end_of_mirror, /* optional user-defined arg */ NULL);
  return 1; /* success */
}
/*
module exit point
the function name and prototype MUST match this prototype
*/
EXTERNAL_FUNCTION int hts_unplug(httrackp *opt) {
  fprintf(stderr, "Module unplugged");
  return 1; /* success */
}
----------------------------------------
> -- an email containing either the data or the
> location of the hts-log sent to a user-configurable
> address
Or a digest of hts-log.txt?
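A digest could be as simple as the sketch below. The log_digest name is made up, and it assumes that problem lines in hts-log.txt contain "Error:" or "Warning:", so check that against your own logs first. Mailing it relies on a working mail(1) on the machine.

```shell
#!/bin/sh
# Hypothetical helper: prints a short digest of an HTTrack log file.
# Assumption: problem lines contain "Error:" or "Warning:".
log_digest() {
    log="$1"
    errors=$(grep -c 'Error:' "$log")      # number of lines mentioning an error
    warnings=$(grep -c 'Warning:' "$log")  # number of lines mentioning a warning
    echo "log: $log"
    echo "errors: $errors, warnings: $warnings"
    echo "first errors:"
    grep 'Error:' "$log" | head -n 5       # at most five error lines
}

# Usage (example path and address only):
# log_digest /mirrors/example-site/hts-log.txt | mail -s "HTTrack digest" admin@example.org
```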
> -- the capacity to post the data or the location of
> the hts-log into an ODBC database
With the mysql command-line client?
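Roughly like this, as a sketch only: the mirror_jobs table and its columns are invented for the example, as are the function names, the database name and the credentials. The script just builds the INSERT statement; piping it to the mysql client is left to the caller.

```shell
#!/bin/sh
# Hypothetical: the table "mirror_jobs" and its columns are invented here.

sql_quote() {
    # Escape backslashes and single quotes so the value can sit
    # inside a '...' SQL string literal.
    printf '%s\n' "$1" | sed -e 's/\\/\\\\/g' -e "s/'/''/g"
}

job_insert_sql() {
    # Arguments: site name, host name, path to hts-log.txt, exit code
    printf "INSERT INTO mirror_jobs (site, host, log_path, exit_code) VALUES ('%s', '%s', '%s', %d);\n" \
        "$(sql_quote "$1")" "$(sql_quote "$2")" "$(sql_quote "$3")" "$4"
}

# Usage (host, user and database names are placeholders):
# job_insert_sql example-site "$(uname -n)" /mirrors/example-site/hts-log.txt 0 \
#     | mysql -h dbhost -u scraper -p egranary
```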