We've got a large wiki that runs on a Mac server using Apple's built-in wiki
software, and I'd like to mirror it. I've been trying to use httrack and
webhttrack on a Linux box to handle this, but I'm hitting issues.
I've set the website up for public browsing (so no passwords are needed), and
I can get the front page and several random bits, but the mirror is far from
complete.
In particular, the wiki structure has a "project" ("minions" in this case) and
within that there are many "pages". These pages are not being pulled.
So, the source of the main project page (that I'm trying to mirror) will have
HTML bits like:
href=\"\/wiki\/pages\/v6c3d4z7B\/Useful_links_and_Random_bits.html\">Useful
links \/ Random bits
Now, I admit that markup is a bit odd (the escaped quotes and slashes), and
that may be why the links aren't being found or followed. The grab httrack
produces does include the page containing this link, but it never follows the
link to fetch the target page.
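One workaround I've been sketching (the path and filename below are just the stand-in from the snippet above, not the full wiki): unescape the JSON-style \/ and \" sequences in the saved page source, then pull the /wiki/pages/ paths out into a plain list that can be crawled separately.

```shell
# Stand-in for the saved project page source, with the escaped link as-is:
cat > project.html <<'EOF'
href=\"\/wiki\/pages\/v6c3d4z7B\/Useful_links_and_Random_bits.html\">Useful links \/ Random bits
EOF

# Unescape \/ and \" first, then grep out the page paths, one per line:
sed -e 's/\\\//\//g' -e 's/\\"/"/g' project.html \
  | grep -o '/wiki/pages/[^"]*\.html' > links.txt

cat links.txt
# /wiki/pages/v6c3d4z7B/Useful_links_and_Random_bits.html
```

If that list looks right, I'm guessing each path could be prefixed with the host to make full URLs and handed back to httrack via its URL-list option (`-%L <file>`, if I'm reading the man page right), so it crawls the pages directly instead of having to discover the escaped links itself.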
FWIW, here's my call:
httrack starklab.local/wiki/projects/minions/index.html -F "Mozilla/5.0 (X11;
Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/37.0.2062.120
Safari/537.36"
Any thoughts as to how to proceed?
Craig