Hello dear all; good evening, HTTrack folks!
At the moment I'm trying to figure out the easiest way to do a job:
I've got a list of 5500 websites and need to grab a little screenshot of each
of them to create a thumbnail. How do I do that?
Note: I only need the screenshots, nothing more. That's pretty easy; no
scraping that goes into the depths of the sites. Thank God it is that easy!
Here is a Perl solution:
#!/usr/bin/perl
use strict;
use warnings;
use WWW::Mechanize::Firefox;

my $mech = WWW::Mechanize::Firefox->new();

open my $input, '<', 'urls.txt' or die "Can't open urls.txt: $!";
while (<$input>) {
    chomp;
    $mech->get($_);
    my $png = $mech->content_as_png();   # rendered page as PNG data
}
close $input;
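One thing to watch: writing to "$_.png" only works while the list contains bare hostnames. If a URL ever includes a scheme or a path, the slashes would make the filename invalid. A small helper along these lines could map any URL to a safe name (this is my own sketch, not part of WWW::Mechanize::Firefox):

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Map a URL to a filesystem-safe filename (hypothetical helper,
# not part of WWW::Mechanize::Firefox).
sub sanitize_filename {
    my ($url) = @_;
    $url =~ s{^https?://}{};    # drop the scheme, if any
    $url =~ s{[^\w.-]}{_}g;     # replace slashes, '?', '=', ... with '_'
    return $url;
}

print sanitize_filename('news.bbc.co.uk/page?x=1'), "\n";  # news.bbc.co.uk_page_x_1
```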
From the docs: Returns the given tab or the current page rendered as PNG
image. All parameters are optional. $tab defaults to the current tab. If the
coordinates are given, that rectangle will be cut out. The coordinates should
be a hash with the four usual entries, left,top,width,height.
This is specific to WWW::Mechanize::Firefox.
Currently, the data transfer between Firefox and Perl is done Base64-encoded.
It would be beneficial to find what's necessary to make JSON handle binary
data more gracefully.
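Since only a thumbnail is wanted, the coordinates hash from the docs above can be used to grab just the top of each page instead of the full rendering. A sketch, assuming the signature quoted above (the 400x300 rectangle is just an example size I picked; this needs a running Firefox with MozRepl):

```perl
use strict;
use warnings;
use WWW::Mechanize::Firefox;

my $mech = WWW::Mechanize::Firefox->new();
$mech->get('http://www.google.com');

# Cut out only the top-left 400x300 rectangle, using the coordinates
# hash described in the docs ($tab left as undef = current tab).
my $png = $mech->content_as_png(
    undef,
    { left => 0, top => 0, width => 400, height => 300 },
);
```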
The list of URLs looks like this:
Filename: urls.txt
------------------
www.google.com
www.cnn.com
www.msnbc.com
news.bbc.co.uk
www.bing.com
www.yahoo.com
And this second snippet, placed inside the while loop, writes each screenshot to disk:
open my $out, '>:raw', "$_.png" or die "Can't open '$_.png' for output: $!";
print $out $png;
close $out;
With that second snippet inside the loop, the files are stored in a folder
under their corresponding names.
Well, friends advised me to do it with HTTrack! As I run openSUSE 11.4, I
think it would be easy to do this work with HTTrack. Can you advise me what I
have to do with HTTrack in order to get the output as thumbnails? And yes, I
need to have corresponding names (otherwise it would be a great mess).
Love to hear from you. Greetings, unleash