> Neither this solution (using --near) nor the solution
> suggested by Xavier Roche (using -* +*.gif +*.jpg ...etc)
> works.
>
> All I want is that downloading <http://www.google.com>
> should get me:
>
> index.html
> images/logo.gif
>
> That's it. But no combination of flags I use in httrack
> (on Windows or Linux!) seems to get me that.
>
> The --near seemed very promising, but doesn't seem to do
> the trick.
>
> What am I missing? Can someone try it out with the
> extremely simple requirement above in mind and tell me
> what the magic incantation is?
> Best regards,
>
> Sitaram
Well, I just tried getting www.google.com with WinHTTrack
3.23 (beta), let it go for about a minute, and it easily
crawled 40+ pages with images, including that logo.gif
mentioned above. However, I noticed these problems (some
of which look like bugs to me...):
1.) You may have needed to be more general about the
domain (use google.com instead of www.google.com); however,
the following results suggest there is more at play...
2.) Even when I included the more generic google.com in
the project's site list, the local copy had problems with
the links right above the search box (Images, Groups,
Directory, News-New!). When browsing the local copy,
clicking these links sent me to the web copy of the pages,
even though those pages had actually been saved locally and
were ready to use. The link shown on mouseover, and the one
obtained via a right-click Copy Shortcut, was correct, but
not the link followed when it was actually clicked.
Perhaps this is a JavaScript issue? The source of these
html pages still appeared to contain a lot of
www.google.com URLs that should have been rewritten to
local URLs.
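
For what it's worth, the kind of invocation I would
experiment with for the minimal result described above is
roughly the following (the depth value and the scan rule
are guesses on my part, not a tested recipe; a deeper -r2
or a broader filter may be needed to pick up the logo):

  httrack http://www.google.com/ -O ./google-mirror -r1 --near "+*.google.com/images/*"

Here -O sets the output directory, -r1 limits the mirror
depth to the start page, --near asks HTTrack to also fetch
non-html files referenced by a downloaded page (which
should cover the logo image), and the "+..." scan rule
whitelists the images path.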