Collecting 5500+ screenshots from URLs and saving them

Subject: Collecting 5500+ screenshots from URLs and saving them

Author: schoko_lade

Date: 12/10/2011 22:22

Hello all, and good evening, HTTrack folks!

At the moment I am trying to figure out the easiest way to do a job.


I have a list of 5500 websites and need to grab a small screenshot of each one
to create a thumbnail. How do I do that?
Note: I only need the screenshots, nothing more. That is pretty easy: no
scraping that goes deep into the site. Thank God it is that easy!

Here is a Perl solution:

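    #!/usr/bin/perl
    use strict;
    use warnings;
    use WWW::Mechanize::Firefox;

    my $mech = WWW::Mechanize::Firefox->new();

    open(my $input, '<', 'urls.txt') or die "Can't open file: $!";

    while (my $site = <$input>) {
        $site =~ s/^\s+|\s+$//g;    # trim, including a stray "\r"
        next if $site eq '';        # skip blank lines

        # give bare host names such as www.google.com a scheme
        my $url = $site =~ m{^https?://}i ? $site : "http://$site";

        $mech->get($url);
        my $png = $mech->content_as_png();    # current page as PNG data

        # save it under the name from urls.txt, e.g. www.google.com.png
        open(my $out, '>', "$site.png") or die "could not open '$site.png' for output: $!";
        binmode $out;    # PNG is binary data
        print $out $png;
        close $out;
    }
    close($input);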

From the docs: Returns the given tab or the current page rendered as PNG
image. All parameters are optional. $tab defaults to the current tab. If the
coordinates are given, that rectangle will be cut out. The coordinates should
be a hash with the four usual entries, left,top,width,height.

    This is specific to WWW::Mechanize::Firefox.

Currently, the data transfer between Firefox and Perl is done Base64-encoded.
It would be beneficial to find what's necessary to make JSON handle binary
data more gracefully.
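The docs quoted above say content_as_png can cut out a rectangle given as a hash with left, top, width and height. A minimal sketch of building that hash for the top of a page, plus the arithmetic for a thumbnail that keeps the aspect ratio. The helper names, the 1024x768 crop and the 200 px target width are assumptions, as is passing the hash as the second argument after $tab; check the installed module's documentation before relying on that. The crop is still full resolution: actually shrinking the PNG to the computed size would need an image library such as Imager, which is not shown here.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: the coordinates hash described in the quoted docs
# (left, top, width, height), covering the top-left part of the page.
# The 1024x768 default is an assumed "first screen" size.
sub thumbnail_rect {
    my ($width, $height) = @_;
    $width  //= 1024;
    $height //= 768;
    return { left => 0, top => 0, width => $width, height => $height };
}

# Thumbnail size for a given target width, keeping the aspect ratio
# (height rounded to the nearest whole pixel).
sub scaled_size {
    my ($width, $height, $target_width) = @_;
    return ($target_width, int($height * $target_width / $width + 0.5));
}

my $rect = thumbnail_rect();
my ($tw, $th) = scaled_size($rect->{width}, $rect->{height}, 200);
print "crop $rect->{width}x$rect->{height} -> thumbnail ${tw}x$th\n";

# Assumed usage, coordinates as the second argument after $tab:
#   my $png = $mech->content_as_png(undef, $rect);
```

For the defaults this prints `crop 1024x768 -> thumbnail 200x150`.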

The source list looks like this:

    Filename: urls.txt
    ------------------
    www.google.com
    www.cnn.com
    www.msnbc.com
    news.bbc.co.uk
    www.bing.com
    www.yahoo.com
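The entries above are bare host names without a scheme, and a file saved with Windows line endings leaves a "\r" that chomp does not remove. A minimal sketch of a hypothetical normalize_url helper that trims each line, skips blank ones, and prepends http:// when no scheme is given. Whether WWW::Mechanize::Firefox strictly needs the scheme is an assumption; adding it does no harm for plain host names like these.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: clean one line read from urls.txt.
# - trims surrounding whitespace, including a "\r" left by Windows line endings
# - returns nothing (undef in scalar context) for blank lines
# - prepends "http://" when the entry has no scheme (assumed to be wanted)
sub normalize_url {
    my ($line) = @_;
    return unless defined $line;
    $line =~ s/^\s+|\s+$//g;
    return if $line eq '';
    return $line =~ m{^[a-z][a-z0-9+.-]*://}i ? $line : "http://$line";
}

for my $line ("www.google.com\n", "news.bbc.co.uk \r\n", "\n", "https://www.bing.com") {
    my $url = normalize_url($line);
    print defined $url ? "$url\n" : "(skipped)\n";
}
```

In the main loop this would become `my $url = normalize_url($_); next unless defined $url;`.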


    # inside the while loop, right after content_as_png():
    open my $out, '>', "$_.png" or die "could not open '$_.png' for output: $!";
    binmode $out;    # PNG is binary data
    print $out $png;
    close $out;




With this second snippet I can store the files in a folder under the
corresponding names.
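To keep the files in their own folder with names that match urls.txt, a minimal sketch of two hypothetical helpers, png_path_for and save_png. The folder name "shots" is an assumption. Any character other than a letter, digit, dot, underscore or hyphen becomes "_", so an entry with a path (say news.bbc.co.uk/sport) cannot create sub-directories by accident, and binmode keeps the PNG bytes intact on platforms that translate line endings.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Path qw(make_path);
use File::Spec;

# Hypothetical helper: map a URL from urls.txt to "<dir>/<name>.png".
# Drops the scheme and trailing slashes, then replaces anything outside
# [A-Za-z0-9._-] with "_".
sub png_path_for {
    my ($dir, $url) = @_;
    (my $name = $url) =~ s{^[a-z][a-z0-9+.-]*://}{}i;
    $name =~ s{/+$}{};
    $name =~ s{[^A-Za-z0-9._-]}{_}g;
    return File::Spec->catfile($dir, "$name.png");
}

# Hypothetical helper: write PNG bytes in binary mode, creating the folder
# if it does not exist yet. Returns the path that was written.
sub save_png {
    my ($dir, $url, $png) = @_;
    make_path($dir) unless -d $dir;
    my $path = png_path_for($dir, $url);
    open(my $out, '>', $path) or die "could not open '$path' for output: $!";
    binmode $out;
    print {$out} $png;
    close($out) or die "could not close '$path': $!";
    return $path;
}

my $example = png_path_for('shots', 'http://news.bbc.co.uk/sport/');
print "$example\n";
```

On Linux this prints `shots/news.bbc.co.uk_sport.png`. Inside the loop, `save_png('shots', $site, $png)` would replace the open/print/close lines. Two different entries can still map to the same name (for example a/b and a_b), so with 5500 URLs it may be worth testing `-e` on the path first, which also lets an interrupted run skip the pages it already saved.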

Well, friends advised me to do it with HTTrack! As I run openSUSE 11.4, I
think it would be easy to do this work with HTTrack. Can you advise me what I
have to do with HTTrack in order to get the output as a thumbnail? And yes, I
need to have corresponding names (otherwise it would be a great mess).
Love to hear from you. Greetings, unleash
