I'm working on a compression benchmark; I want to grab the PNG images from the
Alexa top 100 websites, yet I can't make it work even with number 1,
google.com.
What I'm running is
httrack "google.com" -B %H -e -n -t -%e1R1 -%P "+*.png"
As you can see, I've probably tried all the brute-force options, but I get an
empty page. Weird. The log shows:
12:55:54 Warning: Redirected link is identical because of 'URL Hack' option: http://google.com/robots.txt and www.google.com/robots.txt
12:55:54 Warning: File has moved from http://google.com/robots.txt to http://www.google.com/robots.txt
12:55:55 Warning: Redirected link is identical because of 'URL Hack' option: http://google.com/ and www.google.com/
12:55:55 Warning: File has moved from http://google.com/ to http://www.google.com/
12:55:55 Error: "Unable to get server's address: Unknown error" (-5) after 2 retries at link %h/robots.txt (from primary/primary)
12:56:01 Error: "Unable to get server's address: Unknown error" (-5) after 2 retries at link %h/ (from primary/primary)
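The DNS error puzzles me, because the site obviously resolves in a browser. Just to rule out a resolver problem on my side (this is plain DNS checking, nothing httrack-specific), I can look the names up by hand:

host google.com
host www.google.com

If those resolve fine, my guess is that the error is really about the literal %h/robots.txt link shown above rather than about my network, though I don't know where that link comes from.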
If I add www:
httrack "www.google.com" -B %H -e -n -t -%e1R1 -%P "+*.png"
it grabs way too much, but still skips the main logo.
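For what it's worth, the most stripped-down variant I plan to try next is below. This is only a sketch based on my reading of the httrack manual (-O for the output path, -r for recursion depth, -n to also fetch non-HTML files "near" a page, plus a "+*.png" filter), so I may be misusing some of these switches:

httrack "www.google.com" -O ./google-mirror -r2 -n "+*.png"

The idea is to drop the brute-force options and let -n pull in the images referenced by the crawled pages, with the PNG filter on top.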
Could anybody suggest a fix, please?