> > It looks like HTTrack followed all of the asp links
> > correctly, downloading all of the files (mirror depth=4),
> > but did not replace the asp links on most of the pages with
> > links pointing to the new html files. Furthermore, it
> > added the server in front of each pointer, whether for
> > style sheets or images
> > (e.g., <http://server.name/css/style.css> instead of
> > leaving the original "css/style.css"; same for images).
>
> It means that HTTrack considered these links as "outside
> the default mirror scope", that is, unsuitable to be
> downloaded by default.
>
> Check whether you have robots.txt limits, and/or use scan
> rules (Set Options / Scan rules) to widen the default mirror
> scope (using, for example, +www.example.com/*).
>
The files were downloaded, but the links to those files
were not altered: e.g., the href link to "index.asp?p1=x&p2=y"
remained, although the file it actually pointed to on
the web had been saved locally as, say, "index3c86.html?p1=x&p2=y".
I always had "no robots.txt" explicitly set, and had the
server in the scan rules (e.g.,
-*
+www.server.tld*
etc.).
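To make the scope question concrete, here is a rough Python model of how a list of +/- scan rules like the two above might be evaluated. It rests on assumptions, not on HTTrack's source: that `*` matches any run of characters, that rules are compared against the URL with its scheme removed, and that the last matching rule decides. The names `rule_matches` and `in_scope` are hypothetical. Under those assumptions, a link spelled with a different host name (for example without the "www." prefix) falls outside the rules.

```python
import re


def rule_matches(pattern: str, address: str) -> bool:
    # Assumption: '*' means "any run of characters"; everything else is literal.
    regex = re.escape(pattern).replace(r"\*", ".*")
    return re.fullmatch(regex, address) is not None


def in_scope(url: str, rules: list[str], default: bool = False) -> bool:
    """Hypothetical model of +/- scan-rule evaluation (not HTTrack's code)."""
    # Assumption: rules are matched against the URL without "http://" etc.
    address = re.sub(r"^[a-zA-Z][a-zA-Z0-9+.-]*://", "", url)
    decision = default
    for rule in rules:
        sign, pattern = rule[:1], rule[1:]
        if sign not in ("+", "-"):
            raise ValueError(f"rule must start with + or -: {rule!r}")
        if rule_matches(pattern, address):
            # Assumption: a later matching rule overrides earlier ones.
            decision = sign == "+"
    return decision
```

With the rules quoted above, `in_scope("http://www.server.tld/index.asp?p1=x&p2=y", ["-*", "+www.server.tld*"])` is true, while a link to `http://server.name/css/style.css` is not.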
Redoing all those links by hand, even with a shell script, would
take forever. Any ideas on how to avoid that would be very
much appreciated.
TIA.
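If the mirror cannot simply be redone with wider rules, the clean-up can at least be scripted rather than done link by link. The sketch below is a minimal Python version. It assumes you already have a mapping from each original link (such as "index.asp?p1=x&p2=y") to the file that was saved locally. How to build that mapping is left open, and `rewrite_links`, `rewrite_tree` and the mapping itself are hypothetical. It also strips a given absolute prefix such as "http://server.name/" so that style sheet and image links become relative again. That is only correct if the pages sit at the top level of the mirror.

```python
import html
import re
from pathlib import Path

# href="..." or src='...' attributes; the closing quote must match the opening one.
ATTR_RE = re.compile(
    r"""(?P<attr>\b(?:href|src)\s*=\s*)(?P<q>["'])(?P<url>.*?)(?P=q)""",
    re.IGNORECASE,
)


def rewrite_links(page: str, mapping: dict[str, str], site_prefix: str) -> str:
    """Replace mapped links and strip an absolute site prefix (hypothetical helper)."""

    def fix(m: re.Match) -> str:
        url = m.group("url")
        # Assumption: pages live at the mirror root, so dropping the prefix
        # gives a correct relative link.
        if url.startswith(site_prefix):
            url = url[len(site_prefix):]
        # HTML often encodes "&" as "&amp;"; try the raw and unescaped forms.
        target = mapping.get(url) or mapping.get(html.unescape(url)) or url
        return f"{m.group('attr')}{m.group('q')}{target}{m.group('q')}"

    return ATTR_RE.sub(fix, page)


def rewrite_tree(root: Path, mapping: dict[str, str], site_prefix: str) -> None:
    """Apply rewrite_links in place to every .htm/.html file under root."""
    for path in root.rglob("*.htm*"):
        if not path.is_file():
            continue
        # latin-1 maps each byte to one character, so bytes outside the
        # rewritten links (including line endings) are written back unchanged.
        text = path.read_bytes().decode("latin-1")
        path.write_bytes(rewrite_links(text, mapping, site_prefix).encode("latin-1"))
```

Run it on a copy of the mirror first, e.g. `rewrite_tree(Path("mirror-copy"), mapping, "http://server.name/")`, and spot-check a few pages before touching the original.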