> I didn't know that was a legal response code :)
Well, there are several hacks inside httrack that allow it to
crawl even "broken" servers, that is, servers that do not
send any headers (direct HTML content); in such cases
you'll have to bypass them (but such cases are fortunately
rare).
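
For illustration only, here is a minimal sketch of how a client could notice such a headerless response: it checks whether the first bytes received look like an HTTP status line. This is not httrack's actual code; the function name looks_like_http_status_line is hypothetical, and the check assumes the response was read into a plain buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, NOT part of httrack.
 * Returns 1 if the buffer starts with "HTTP/" (i.e. it looks like a
 * normal status line), 0 if it does not (e.g. the server sent HTML
 * content directly, without any headers). */
static int looks_like_http_status_line(const char *buf, size_t len) {
    static const char prefix[] = "HTTP/";
    const size_t plen = sizeof(prefix) - 1;  /* length without the '\0' */

    if (buf == NULL || len < plen)
        return 0;
    return memcmp(buf, prefix, plen) == 0;
}
```

A caller that gets 0 back could then treat the whole buffer as body content instead of trying to parse headers from it.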
> But you don't do 200 => 30* for 200 responses with a
> Location: header?
No - this is apparently not very common,
even if it is not forbidden by the RFC (but 200 + Location
has no defined meaning anyway).
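
To make the distinction concrete, here is a small sketch of the rule described above: a Location: header only counts as a redirect when the status is in the 3xx range, and a 200 with Location is left alone. The function is_redirect is hypothetical and not taken from httrack, and treating the whole 300-399 range as redirects is a simplification.

```c
#include <stddef.h>

/* Hypothetical helper, NOT part of httrack.
 * Returns 1 only when the status is 3xx AND a non-empty Location
 * value is present; a 200 response carrying Location returns 0.
 * Assumption: the whole 300-399 range is treated alike here. */
static int is_redirect(int status, const char *location) {
    if (location == NULL || location[0] == '\0')
        return 0;
    return status >= 300 && status < 400;
}
```

A callback that does want 200 + Location to behave like a redirect would have to handle that case itself.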
> Aha! I thought that might be the case. Is there anything
> in the htsblk that's reliable at that point? And is there
> something I can do to have the headers processed (short of
> doing it myself)?
I have changed the callback position, so headers
should now be parsed before the callback is called (see 3.31-test-1).
> Yup. Don't have the time to go in and understand it, so I
> just picked the known GLib implementation. Also, I'm not
> sure what to do to remove an entry from the htsinthash.
This isn't possible yet :)
Hashtables in httrack are generally static, or only ever
grow (the link table, for example).
3.31-test-1 is now available as a beta release at
www.httrack.com