HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Writing to ARC format
Author: Xavier Roche
Date: 11/22/2003 16:10
 
> I didn't know that was a legal response code :)

Well, there are several hacks inside httrack that allow it to 
crawl even "broken" servers, that is, servers that do not 
send any headers (direct HTML content); in such cases 
you'll have to bypass them (but such cases are fortunately 
rare).
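
To illustrate what a header-less reply looks like to a client, here is a minimal sketch in Python. It is not httrack's code: the name split_response and the choice to report such replies as status 200 with empty headers are assumptions made for the example.

```python
import re

# Matches a status line such as "HTTP/1.1 200 OK" (reason phrase optional).
STATUS_LINE = re.compile(rb"^HTTP/\d\.\d (\d{3})(?: [^\r\n]*)?\r?\n")


def split_response(raw: bytes):
    """Return (status, header_bytes, body) for a raw server reply.

    Assumption for this sketch: a reply that does not start with a status
    line is bare content, reported as status 200 with no headers.
    """
    m = STATUS_LINE.match(raw)
    if m is None:
        return 200, b"", raw
    status = int(m.group(1))
    # Headers end at the first blank line; accept CRLF or bare LF endings.
    ends = [(raw.find(sep), len(sep)) for sep in (b"\r\n\r\n", b"\n\n")]
    ends = [(i, n) for i, n in ends if i != -1]
    if not ends:
        # Status line seen but no blank line yet: everything is header data.
        return status, raw, b""
    i, n = min(ends)
    return status, raw[:i], raw[i + n:]
```

A caller writing archive records could use the empty header block as the signal that there are no server headers to store for that document.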

> But you don't do 200 => 30* for 200 responses with a
> Location: header?
No. This is apparently not very common, even though it is 
not forbidden by the RFC (but 200 + Location has no defined 
meaning anyway).
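
The rule described here can be sketched as follows. This is only an illustration in Python, not httrack's logic: the helper redirect_target, the lowercase header keys and the exact set of redirect codes are assumptions.

```python
# Status codes treated as redirects in this sketch (an assumption; the
# thread only speaks of "30*").
REDIRECT_CODES = {301, 302, 303, 307, 308}


def redirect_target(status: int, headers: dict):
    """Return the Location value only when the status itself is a redirect.

    A 200 reply carrying a Location header is left alone: the status is
    not rewritten to a 30x code.
    """
    location = headers.get("location")
    if status in REDIRECT_CODES and location:
        return location
    return None
```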

> Aha!  I thought that might be the case.  Is there anything
> in the htsblk that's reliable at that point?  And is there
> something I can do to have the headers processed (short of
> doing it myself)?
I have changed the callback position; headers should now be 
parsed before the callback is called (see 3.31-test-1).

> Yup.  Don't have the time to go in and understand it, so I
> just picked the known GLib implementation.  Also, I'm not
> sure what to do to remove an entry from the htsinthash.

This isn't possible yet :)
Hash tables in httrack are generally static, or only grow 
(the link table, for example).
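
When a table offers no removal, one general workaround is to overwrite an entry with a marker instead of deleting it. The sketch below shows the idea in Python; GrowOnlyTable and its methods are hypothetical names, and whether htsinthash lets you overwrite an existing key is an assumption to check against the source.

```python
class GrowOnlyTable:
    """Hypothetical insert-only table: entries are added or overwritten,
    never removed."""

    _DEAD = object()  # sentinel marking an entry the caller no longer wants

    def __init__(self):
        self._slots = {}

    def put(self, key, value):
        self._slots[key] = value

    def retire(self, key):
        # No deletion: overwrite with the sentinel so the slot stays allocated.
        if key in self._slots:
            self._slots[key] = self._DEAD

    def get(self, key, default=None):
        value = self._slots.get(key, self._DEAD)
        return default if value is self._DEAD else value

    def __len__(self):
        # Counts slots, including retired ones, since nothing is ever freed.
        return len(self._slots)
```

The cost is that retired entries still occupy memory, which is acceptable only if few entries are retired during a crawl.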

The 3.31-test-1 is now available in beta release at 
www.httrack.com
 