HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Writing to ARC format
Author: Lars Clausen
Date: 11/18/2003 17:21
 
> There are many other useful things in the structures - ask 
> me if necessary.

There's some weirdness about the statuscode field in htsblk
as passed to receive-headers.  It doesn't seem to reflect
the status code found in the headers (and seen by other
parts of HTTrack).  I have as the first statement of my
receive-headers plugin the following:

get_header(char* buff, char* adr, char* fil, char* referer_adr,
           char* referer_fil, htsblk* incoming) {

  printf("\nget_header %s%s: (%d) %s\n",
         adr, fil, incoming->statuscode, incoming->location);

Yet when I point this at <http://shasta.cs.uiuc.edu/~lrclause>
with -r1, I get these printouts:

get_header shasta.cs.uiuc.edu/~lrclause: (-5) 
* shasta.cs.uiuc.edu/~lrclause (270 bytes) - 301
get_header shasta.cs.uiuc.edu/~lrclause/: (0) 
* shasta.cs.uiuc.edu/~lrclause/ (319 bytes) - OK
get_header shasta.cs.uiuc.edu/robots.txt: (-5) 
1/3: shasta.cs.uiuc.edu/robots.txt (279 bytes) - 404
get_header shasta.cs.uiuc.edu/~lrclause/: (-5) 
Done.shasta.cs.uiuc.edu/~lrclause/ (8623 bytes) - OK
Thanks for using HTTrack!

Where are these status codes defined, and shouldn't they be
the code returned by the server?  It's good that I get the
redirects, but I wish I could recognize them in there.
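
To make concrete what I'd like to do inside the callback, here is a
rough sketch of the check. It is only an illustration: the enum and
both function names are hypothetical and not part of HTTrack, and it
assumes that zero or negative values (like the -5 and 0 above) are
internal values rather than codes sent by the server.

```c
#include <stdio.h>

/* Hypothetical classification for illustration; not an HTTrack API. */
typedef enum {
    STATUS_INTERNAL,      /* zero or negative: assumed not sent by the server */
    STATUS_HTTP_OK,       /* 2xx */
    STATUS_HTTP_REDIRECT, /* 3xx */
    STATUS_HTTP_ERROR,    /* 4xx and 5xx */
    STATUS_HTTP_OTHER     /* 1xx, or any other positive value */
} status_class;

/* Sort a status value into one of the classes above. */
status_class classify_status(int statuscode) {
    if (statuscode <= 0)
        return STATUS_INTERNAL;
    if (statuscode >= 200 && statuscode < 300)
        return STATUS_HTTP_OK;
    if (statuscode >= 300 && statuscode < 400)
        return STATUS_HTTP_REDIRECT;
    if (statuscode >= 400 && statuscode < 600)
        return STATUS_HTTP_ERROR;
    return STATUS_HTTP_OTHER;
}

/* Human-readable name for a class, for debug printouts. */
const char *status_class_name(status_class c) {
    switch (c) {
    case STATUS_INTERNAL:      return "internal";
    case STATUS_HTTP_OK:       return "ok";
    case STATUS_HTTP_REDIRECT: return "redirect";
    case STATUS_HTTP_ERROR:    return "error";
    default:                   return "other";
    }
}
```

In the plugin I would then print
status_class_name(classify_status(incoming->statuscode)) next to the
raw number. With the values I currently get, every call would come
out as "internal", which is exactly the problem.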

For the curious, I've placed the plugin at
<http://shasta.cs.uiuc.edu/~lrclause/tmp/arc.c>
Note that it depends on GLib, since I had HTTrack's own hash
table crash on me.

-Lars
 