HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: Faulty relative URLs when spidering archive.org
Author: Xavier Roche
Date: 06/04/2004 13:20
 
> I'm having problems downloading sites on archive.org.

archive.org sites cannot be downloaded properly, because
the archive site includes JavaScript code that
automatically (during onLoad) patches all links embedded in
the pages.

The only solution is to use the Linux/Unix version, together
with some callbacks based on the given example:
src/libtest/callbacks-example-baselinks.c

This plugin will allow you to discard archive.org's
BASE HREF tags and ensure that the links are properly
formatted.
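
To make the idea concrete, here is a rough sketch of what "discarding
the BASE HREF tags" means for a page's HTML. This is not the HTTrack
callback API and not the code from the example file above; it is a
standalone Python illustration, and the names strip_base_tags and
BASE_TAG are made up for this sketch. It assumes the tag's attribute
values do not contain a literal ">" character.

```python
import re

# Hypothetical helper for illustration only; not part of HTTrack.
# Matches a whole <base ...> tag, case-insensitively. The \b keeps
# other tags such as <basefont> from being matched.
BASE_TAG = re.compile(r"<base\b[^>]*>", re.IGNORECASE)


def strip_base_tags(html: str) -> str:
    """Return html with every <base ...> tag removed."""
    return BASE_TAG.sub("", html)
```

Once the BASE tag is gone, relative links in the page are resolved
against the page's own location instead of the address given in the tag.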

See my previous post here:
<http://forum.httrack.com/readmsg/7940/index.html?pid=7938&days=10000&js=1&lang=en>

... but this is not an obvious thing to do.

 