> > First I tried using the command line options on
> > WinHTTrack
>
> Ah, this is not possible (and this should be in the
> documentation, yes, I know..), the GUI is not very clever,
> and is just a frontend to the libhttrack.dll engine.
Ah, I understand. Shame (it means we need two executables
in the distro'), but at least there's a logical reason...
> > I deliberately leave off the actual web address, hoping
> > that HTTrack will take all relevant settings from
> > the .whtt file.
> Not the .whtt file, but the doit.log file generated by
> the engine AFTER you click 'Start' using the GUI.
I see. This sounds like I should take that as a "yes, it's
quite proper to run httrack without specifying a source and
it should work as expected."
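So, to test my understanding, I'll try something like this
from inside the project directory (the path is invented; I'm
assuming --continue / --update are the right shortcuts for a
URL-less run, and that the engine then re-reads
hts-cache/doit.log):

  # Hypothetical: re-run the mirror from the CLI with no URL,
  # letting the engine pick up the options recorded by the GUI run.
  cd "C:\My Web Sites\MyProject"
  httrack --continue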
> > Unfortunately, it prints this message, and then says
> > Done! and stops. There are over 900 links in detail.asp
> Ahah. But did you tell httrack that it had to crawl ALL
> sites in this page?
No: all the links (except the few in the "menu" block)
are "local" documents. The web page links to a set of
scientific spec. and safety doc's for a range of products
from a company often used by my colleagues. The vast
majority of these 900 links are of the form
HREF="specs/x.pdf" or HREF="safety/x.pdf"
This is why I'm kind-a baffled... If I run it from the GUI,
it diligently displays all the URLs it's checking. IIRC,
the first page -- detail.asp, which I believe is
database-generated -- is the only page to be updated (since
it obviously can't have a revision date associated with
it), but it takes a couple of minutes to run through all
the other links to see that nothing else has changed.
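For the record, what I'm really after from the CLI is that
same update pass; my understanding is it should reduce to
this (path invented again):

  # Hypothetical: an update run -- re-check every link, but
  # only re-fetch pages that have actually changed.
  cd "C:\My Web Sites\MyProject"
  httrack --update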
I am using a proxy. Without specifying any options, will
the CLI HTTrack use the same proxy options as the GUI
version? I've only just thought of this as a potential
issue -- is this the obvious problem? Am I being thick?
Or does this also not matter?
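If the proxy setting doesn't carry over, I suppose I can
always pass it explicitly; the man page lists a -P option
(the proxy host, port and credentials below are made up):

  # Hypothetical: forcing the same proxy the GUI uses.
  httrack --update -P proxy.example.com:8080
  # or, with authentication:
  httrack --update -P user:pass@proxy.example.com:8080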