HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: plugging function for HEAD requests
Author: Allen Day
Date: 05/05/2012 12:03
 
Oh, you're right, it isn't a HEAD.  Rusty me :)

You're right that for a single-site crawl/recrawl the MD5 check is no better
or worse than a URL+size comparison.  The use case I have in mind is
different though -- I expect to encounter the same image across many sites
(such as many news sites using the same graphic in a story), and I want to
detect the image quickly rather than waste bandwidth downloading it many
times.
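
To make the use case concrete, here is a minimal sketch in Python of the kind of cross-site lookup I mean. It is not HTTrack code; `SeenIndex` and the example URLs are made up for illustration, and the index is just an in-memory dict.

```python
import hashlib


class SeenIndex:
    """Hypothetical in-memory index mapping content digests to first-seen URLs."""

    def __init__(self):
        self._by_digest = {}

    def record(self, url, body):
        """Return the URL that first supplied identical bytes, or None if new."""
        digest = hashlib.md5(body).hexdigest()
        if digest in self._by_digest:
            return self._by_digest[digest]
        self._by_digest[digest] = url
        return None


index = SeenIndex()
graphic = b"\x89PNG placeholder bytes standing in for a shared image"

# First sighting: nothing to compare against yet.
first = index.record("http://news-a.example/story/graphic.png", graphic)

# Same bytes under a different URL on another site: reported as a duplicate.
second = index.record("http://news-b.example/img/1234.png", graphic)
```

Note that a full MD5 like this only identifies the duplicate after the whole body has arrived, so on its own it saves storage but not bandwidth.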

As for the approach you suggest, downloading some bytes and then deciding
whether to disconnect -- can this be implemented as a plugin, and if so, at
which phase?
Thanks!
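
For reference, this is roughly the logic I have in mind for the "download some bytes, then decide" approach, sketched in Python over a generic file-like stream. Everything here is an assumption for illustration: the 4096-byte cutoff, the names `read_exactly` and `fetch_unless_known`, and the use of a plain set for known prefixes. It says nothing about whether or where HTTrack's plugin interface allows this.

```python
import hashlib
import io

PREFIX_LEN = 4096  # assumed cutoff; would need tuning against real data


def read_exactly(stream, n):
    """Read up to n bytes, looping because read() may return short chunks."""
    chunks = []
    remaining = n
    while remaining > 0:
        chunk = stream.read(remaining)
        if not chunk:  # end of stream
            break
        chunks.append(chunk)
        remaining -= len(chunk)
    return b"".join(chunks)


def fetch_unless_known(stream, known_prefixes, prefix_len=PREFIX_LEN):
    """Return the full body, or None if the prefix digest was already seen.

    When None is returned, the rest of the stream is left unread and the
    caller is expected to close the connection.
    """
    head = read_exactly(stream, prefix_len)
    digest = hashlib.md5(head).hexdigest()
    if digest in known_prefixes:
        return None
    known_prefixes.add(digest)
    return head + stream.read()


known = set()
image = bytes(range(256)) * 40  # 10240 bytes standing in for an image

# io.BytesIO stands in for a network response body.
first_fetch = fetch_unless_known(io.BytesIO(image), known)

repeat = io.BytesIO(image)
second_fetch = fetch_unless_known(repeat, known)
```

A matching prefix is only a hint, since two different files can share their first few kilobytes, so combining it with the reported size would be safer than trusting the prefix digest alone.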
 