HTTrack Website Copier
Free software offline browser - FORUM
Subject: Re: collecting ++5500 screenshots from urls & save them
Author: schoko_lade
Date: 12/11/2011 21:14
Hello dear Xavier,

it's me again!

Hmmm - I guess there is a main difference between retrieving HTML (on the one
hand) and retrieving an image (on the other hand).

Retrieving an image with the Perl code and Firefox (see the code below, which
drives Firefox through Mechanize) seems to be much, much smarter than doing it
with httrack (the famous tool), for example. With the little Perl snippet we
get real rendering, with CSS/JS interpreted. A regular (automated) browser
such as Firefox does a good job here.
On a side note: for this fetching job the little Perl snippet is far more
powerful than httrack, since this is not something httrack can do easily.
HTTrack is only able to grab (parts of) websites; it cannot do any rendering
of any sort, nor interpret CSS/JS.


    use WWW::Mechanize::Firefox;
    my $mech = WWW::Mechanize::Firefox->new();

    open( my $input, '<', 'urls.txt' ) or die "Can't open file: $!";
    my $count = 0;
    while ( my $url = <$input> ) {
        chomp $url;
        $mech->get($url);                  # load the page in Firefox
        my $png = $mech->content_as_png(); # rendered screenshot as PNG data
        open( my $out, '>:raw', sprintf( 'shot_%05d.png', ++$count ) )
            or die "Can't write file: $!";
        print {$out} $png;
        close $out;
    }
    close $input;
Well: there is absolutely no need to fetch the HTML content.

Caching the images is done easily with the Perl snippet, and therefore HTTrack
is (absolutely) not the tool that I should take into consideration.
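On the caching point, the skip logic can be kept very simple: derive a stable
filename from each URL and skip the URL when that file already exists on disk.
A minimal sketch of the idea (the `shot_filename` helper and the MD5-based
naming are my own assumption, not part of the snippet above):

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Hypothetical helper: map a URL to a stable on-disk filename, so a
# rerun over the same urls.txt can skip screenshots already saved.
sub shot_filename {
    my ($url) = @_;
    return 'shot_' . md5_hex($url) . '.png';
}

my $url = 'http://www.example.com/';
if ( -e shot_filename($url) ) {
    print "cached, skipping: $url\n";
}
else {
    # here one would call $mech->get($url) and save content_as_png()
    print "needs fetching:   $url\n";
}
```

With ++5500 URLs this makes the run restartable after a crash, without
re-rendering everything from the start.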

What do you think!?
