> I'm getting tens of thousands of duplicate images
> because the website has chosen for some reason (log
> analysis maybe?) to include a query string on their
> images, i.e.:
>
> example.com/foo.gif?x=000000
> example.com/foo.gif?x=000001
>
> Is it possible to generically say that if an image
> file has a parameter that you only download one? It
> seems like this would need to be a heuristic
> optimization that would need to be built into
> HTTrack...
The problem is that there is no way to tell whether foo.gif?x=0 is or is not
the same as foo.gif?x=1 without downloading them.
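Because the downloaded bytes are the only reliable test, one option is to drop duplicates after the fact. The following is a minimal Python sketch of that approach. It is not something HTTrack does itself. It assumes the images have already been fetched into a dict that maps each URL to its body, and the function name `redundant_urls` is hypothetical. Files with identical SHA-256 digests are treated as the same image, and only the first URL seen for each digest is kept.

```python
import hashlib


def redundant_urls(downloads: dict[str, bytes]) -> list[str]:
    """Return URLs whose body duplicates an earlier download.

    Hypothetical post-processing helper: the first URL seen for each
    distinct body is kept, and later URLs with byte-identical bodies
    are reported as redundant. Dict insertion order is used as "first".
    """
    seen: dict[str, str] = {}  # digest -> first URL with that body
    redundant: list[str] = []
    for url, body in downloads.items():
        digest = hashlib.sha256(body).hexdigest()
        if digest in seen:
            redundant.append(url)
        else:
            seen[digest] = url
    return redundant
```

Note that this still requires fetching every `?x=...` variant. It only saves disk space afterwards. Two URLs that share a path but return different bytes are correctly kept apart, and that is exactly the case a "one download per image path" rule would get wrong.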