Hi,
> Anyway, here is a list of the callback functions. I've
> tried to trace them a bit to see what they do. I'm not
> sure about these descriptions, perhaps Xavier Roche or
> someone else more familiar with HTTrack could check
> them when time allows. For what they are worth, here
> they are:
Seems to be well documented :)) - here are some
remarks:
First, you can look at httrack.c and httrack.h
for a working example (httrack.c is only
useful to launch the mirror and to display
some funny information).
The steps to follow to launch the engine are quite
simple. You can do:
hts_init();
return hts_main(argc,argv);
filling argc and argv with proper C-style arguments,
as if you were calling httrack from the command line.
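As a rough illustration of the "filling argc and argv" step, here is a sketch that builds the argument vector the way a shell would for "httrack <url> -O <dir>". The helper name fill_args, the URL and the output directory are made up for the example. Only hts_init(), hts_main() and the -O option come from this thread, and the engine calls are left in a comment so that the snippet compiles without the HTTrack headers.

```c
#include <stddef.h>

/* Hypothetical helper (not part of HTTrack): fills argv[] as if the user had
   typed "httrack <url> -O <outdir>". Returns argc, or -1 if argv[] is too
   small to hold the 4 arguments plus the terminating NULL. */
int fill_args(char* url, char* outdir, char* argv[], int max) {
  int argc = 0;
  if (max < 5) return -1;
  argv[argc++] = "httrack";   /* argv[0]: program name */
  argv[argc++] = url;         /* URL to mirror */
  argv[argc++] = "-O";        /* output path option */
  argv[argc++] = outdir;
  argv[argc] = NULL;          /* C-style terminator */
  return argc;
}

/* Intended use, assuming the HTTrack library headers are available:
     char* args[8];
     int n = fill_args("http://www.example.com/", "mirror", args, 8);
     hts_init();
     return hts_main(n, args);
*/
```

Any other command-line option could be appended in the same way, one argv entry per word.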
You can also define wrappers, so that you can control
and add features when mirroring:
hts_init();
htswrap_add("init",mywrapper_init);
htswrap_add("free",mywrapper_uninit);
htswrap_add("start",mywrapper_start);
htswrap_add("change-options",mywrapper_chopt);
htswrap_add("end",mywrapper_end);
htswrap_add("check-html",mywrapper_checkhtml);
htswrap_add("loop",mywrapper_loop);
htswrap_add("query",mywrapper_query);
htswrap_add("query2",mywrapper_query2);
htswrap_add("query3",mywrapper_query3);
htswrap_add("check-link",mywrapper_check);
htswrap_add("pause",mywrapper_pause);
htswrap_add("save-file",mywrapper_filesave);
htswrap_add("link-detected",mywrapper_linkdetected);
return hts_main(argc,argv);
In this case, that's right: you'll have to define all
of the mywrapper_* functions yourself.
Here are some inline comments:
> List of callback functions
> --------------------------
>
> Their 'name' is the special identifier which must be
> passed to htswrap_add()
> ---------------------------------------------------------------------------
>
> Name: 'change-options'
> Purpose: Called when options have been changed by HTTrack
> Params: httrackp* opt
>   (all the options for this session, see 'htsopt.h')
> Valid Return: int 1 (ignored as far as I can tell, but to be
>   safe use 1)
Some options can NOT be changed, however, such as the
path or the proxy.
> Name: 'link-detected'
> Purpose: Called when a link is detected
> Params: char* link (the text of the 'href=' attribute)
>   links are usually relative, so this text will likely
>   be something like 'filename.ext' or 'subdir/filename.ext'
The link is given 'as is': it was extracted from any
tag (a, img, object...) or from javascript, and
it can be relative or absolute (http://).
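To make the relative/absolute distinction concrete, here is a small sketch of a 'link-detected' callback that just counts both kinds. The counters are made up for the example. The int return type and the value 1 are an assumption too, since the list above gives no "Valid Return" for this callback.

```c
#include <string.h>

/* Counters kept for illustration only (hypothetical, not part of HTTrack). */
static int n_absolute = 0;
static int n_relative = 0;

/* Sketch of a 'link-detected' callback.
   ASSUMPTION: the return type (int) and the returned value (1) are guesses;
   check the HTTrack headers before relying on them. */
int mywrapper_linkdetected(char* link) {
  /* The link is given 'as is': treat anything starting with "http://"
     as absolute; everything else is counted as relative here. */
  if (strncmp(link, "http://", 7) == 0)
    n_absolute++;
  else
    n_relative++;
  return 1;
}
```

Note that a link coming from javascript may not look like a clean path at all, so a real callback should not assume too much about the text it gets.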
> Name: 'check-html'
> Purpose: Called to check if an HTML file should be parsed
>   after download
> Params: char* buffer_html (address of the HTML buffer)
>   int buffer_html_size (size of this buffer in bytes)
>   char* host_name (eg: www.foo.com)
>   char* filename (eg: /index.html)
> Valid Return: int
>   0 = do not parse this file
>   1 = parse this file (default behaviour)
> Notes: This function is also called for the primary URL,
>   before downloading. In this case, it is passed the URL
>   in 'buffer_html', the word 'primary' in 'host_name'
>   and '/primary' in 'filename'.
In this function, you can add some code for linguistic
analysis, search features, and so on.
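As an example of the "search features" idea, here is a sketch of a 'check-html' callback that counts the pages containing a given word. The keyword, the counter and the buffer_contains helper are made up for the example. The parameters and return values are the ones listed in the quoted description, which I have not changed.

```c
#include <string.h>

/* Bounded search: the buffer comes with an explicit size, so we do not
   rely on it being NUL-terminated. Returns 1 if 'word' occurs in it. */
static int buffer_contains(const char* buf, int size, const char* word) {
  int len = (int) strlen(word);
  int i;
  if (len == 0) return 1;
  for (i = 0; i + len <= size; i++) {
    if (memcmp(buf + i, word, (size_t) len) == 0) return 1;
  }
  return 0;
}

/* Number of pages in which the keyword was found (illustration only). */
static int n_matches = 0;

int mywrapper_checkhtml(char* buffer_html, int buffer_html_size,
                        char* host_name, char* filename) {
  /* Special call for the primary URL: the buffer holds the URL,
     not a page, so there is nothing to search. */
  if (strcmp(host_name, "primary") == 0 && strcmp(filename, "/primary") == 0)
    return 1;
  if (buffer_contains(buffer_html, buffer_html_size, "httrack"))
    n_matches++;
  return 1;   /* 1 = parse this file (default behaviour) */
}
```

Returning 0 instead would tell the engine not to parse the page, so a search-only callback like this one should always return 1.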
> Name: 'save-file'
> Purpose: Called when a file is about to be saved
> Params: char* filename
>   (path to local file, starting with the prefix given
>   with the -O command line option; if the given prefix
>   was relative then this name will be relative also)
> Valid Return: int
>   0 = don't save the file
>   1 = save the file (default behavior)
Note that you can NOT change the filename in this
routine. I may add a wrapper to build specific target
names.
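For instance, a 'save-file' callback can be used as a simple filter. The sketch below refuses to save anything ending in ".zip". The extension and the ends_with helper are arbitrary choices for the example, and the 0/1 return values are the ones from the quoted description.

```c
#include <string.h>

/* Returns 1 if 's' ends with 'suffix', 0 otherwise. */
static int ends_with(const char* s, const char* suffix) {
  size_t ls = strlen(s);
  size_t lx = strlen(suffix);
  return ls >= lx && strcmp(s + ls - lx, suffix) == 0;
}

/* Sketch of a 'save-file' callback: the name can only be inspected,
   not changed. */
int mywrapper_filesave(char* filename) {
  if (ends_with(filename, ".zip"))
    return 0;   /* 0 = don't save the file */
  return 1;     /* 1 = save the file (default behavior) */
}
```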
> Name: 'loop'
> Purpose: Called during a download loop, after every chunk
>   of bytes
> Params: many, many
> Valid Return: int
>   0 = HTTrack should end
>   1 = HTTrack should continue
Here are the parameters for int __cdecl
mywrapper_loop:
lien_back* back,int back_max,int back_index
The 'back' structure (see htscorer.h) is an array
(back_max elements) of lien_back entries, and can be
used to show sharp stats on current downloads.
The "back_index", if non-negative,
is the index of the current file being processed
(parsed, or in 'wait' state).
Here are the two main structures used:
- lien_back:
typedef struct {
  char url_adr[HTS_URLMAXSIZE*2];  // address of http document
  char url_fil[HTS_URLMAXSIZE*2];  // filename of http document
  char url_sav[HTS_URLMAXSIZE*2];  // local filename of http document (empty if file not saved)
  char referer_adr[HTS_URLMAXSIZE*2];  // address of HTTP-REFERER, if any
  char referer_fil[HTS_URLMAXSIZE*2];  // filename of HTTP-REFERER, if any
  char location_buffer[HTS_URLMAXSIZE*2];  // this http document sent a redirect to us
  char send_too[1024];  // internal - data to send
  int status;  // status (-1=not used, 0: ready, >0: operation/download in progress)
  int testmode;  // test mode
  int timeout;  // timeout, in seconds
  TStamp timeout_refresh;  // internal
  int rateout;  // minimum transfer rate
  TStamp rateout_time;  // internal
  LLint maxfile_nonhtml;  // maximum size of non html file
  LLint maxfile_html;  // maximum size of html file
  htsblk r;  // current link object, see htsblk structure in htslib.h
  short int is_update;  // has been updated
  int head_request;  // head request
  LLint range_req_size;  // internal
  //
  int http11;  // must use HTTP/1.1
  int is_chunk;  // internal
  char* chunk_adr;  // internal
  LLint chunk_size;  // internal
  //
  short int* pass2_ptr;  // internal
  //
  char info[256];  // internal
  int stop_ftp;  // internal
} lien_back;
- htsblk:
typedef struct {
  int statuscode;  // status-code, -1=error, 200=OK, 201=..etc (see RFC1945)
  short int notmodified;  // not modified?
  short int is_write;  // direct-to-disc
  short int is_chunk;  // internal
  char* adr;  // address if in memory (!is_write)
  FILE* out;  // internal
  LLint size;  // current downloaded size
  char msg[80];  // error message if any
  char contenttype[64];  // content-type ("text/html" for example)
  char* location;  // internal (redirect)
  LLint totalsize;  // total size
  short int is_file;  // this link is a file://
  T_SOC soc;  // internal
  FILE* fp;  // internal
  char lastmodified[64];  // Last-Modified
  char etag[64];  // Etag
  char cdispo[256];  // Content-Disposition (truncated)
  /* */
  htsrequest req;  // internal
} htsblk;
The other parameters are:
int lien_n,int lien_tot
  Links scanned, total number of links
LLint stat_bytes,LLint stat_bytes_recv
  Bytes received, bytes received (raw)
int stat_time
  Time in seconds
int stat_nsocket
  Number of connections
LLint stat_written
  Bytes written
int stat_updated, int stat_errors
  Files updated, number of errors
int irate
  Current transfer rate, estimated every second
int nbk
  Links successfully anticipated
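To give an idea of how the 'back' array can be used for stats, here is a sketch that counts the slots with an operation in progress, using the status convention from the lien_back comments (-1 = not used, 0 = ready, >0 = in progress). The demo_lien_back type is a reduced stand-in with only two fields, so that the snippet compiles on its own. A real callback would include the HTTrack headers and receive the full lien_back array.

```c
/* Reduced stand-in for lien_back (ASSUMPTION: only the fields needed
   here, with a shortened buffer; the real structure is shown above). */
typedef struct {
  char url_fil[64];   /* filename of http document */
  int status;         /* -1=not used, 0: ready, >0: in progress */
} demo_lien_back;

/* Counts the entries of back[0..back_max-1] that are being downloaded
   or otherwise processed (status > 0). */
int count_in_progress(const demo_lien_back* back, int back_max) {
  int i;
  int n = 0;
  for (i = 0; i < back_max; i++) {
    if (back[i].status > 0) n++;
  }
  return n;
}
```

Inside mywrapper_loop, the same loop over back and back_max could feed a progress display, and the callback would then return 1 to let the mirror continue.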
> Name: 'pause'
> Purpose: Called to wait for the lock file to be deleted
> Params: char* lockfile
> Valid Return: Nothing
See option G
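A 'pause' callback can be as simple as polling until the lock file disappears. This is only a sketch of that idea, not what the engine does internally. The one-second polling interval and the file_exists helper are choices made for the example, and sleep() is POSIX, so a Windows build would need a different call.

```c
#include <stdio.h>
#include <unistd.h>   /* sleep(): POSIX only, an assumption of this sketch */

/* Returns 1 if the file can be opened for reading, 0 otherwise. */
static int file_exists(const char* path) {
  FILE* fp = fopen(path, "rb");
  if (fp == NULL) return 0;
  fclose(fp);
  return 1;
}

/* Sketch of a 'pause' callback: returns once the lock file is gone. */
void mywrapper_pause(char* lockfile) {
  while (file_exists(lockfile)) {
    sleep(1);   /* check again in one second */
  }
}
```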