
Downloading full resolution images from Wikipedia

January 3, 2014

Here’s a snippet I used to download some images from Wikipedia.

perl WikipediaFilePage LocalName

where WikipediaFilePage is a URL like «,_The_Square.jpg » pointing to the file page of an image that you want to download at full resolution, and LocalName is the name you want to give it on your hard disk. You can give as many « WikipediaFilePage LocalName » pairs as you like.

– Any error stops everything, so if there are multiple pairs on the command line, the remaining ones are skipped.
– There’s a huge memory leak (thanks to feldspath for deeply analyzing my code :D)
– This list is very much incomplete.

#!/usr/bin/perl -w
use WWW::Mechanize;
use strict;
use HTML::TreeBuilder;
binmode STDOUT, ":utf8"; # spit utf8 to terminal
use utf8; # allow for utf8 inside the code.

# Arguments come in pairs: file page URL, then local file name.
while (defined(my $url = shift)) {
    my $tempfile = shift; ## user supplied
    my $mech = WWW::Mechanize->new;
    my $tree = HTML::TreeBuilder->new;

    print "Downloading $url...";
    $mech->get($url) and print "done !\n" or die;
    # Parse the file page; without this the tree is empty and look_down finds nothing.
    $tree->parse_content($mech->content);
    # real image:
    $url = $tree->look_down(class => 'fullImageLink')->look_down(_tag => "a")->attr('href');

    print "Downloading $url to $tempfile...";
    $mech->get($url, ':content_file' => $tempfile) and print "done !\n" or die;
}
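
One possible way around the first caveat listed earlier (any error stops everything) is to wrap the work for each pair in an eval, so a failure is recorded and the loop moves on to the next pair. This is only a minimal sketch, not part of the original script: process_pairs and the stand-in downloader are hypothetical names made up for illustration, and a real downloader would perform the two $mech->get calls from the script above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): calls
# $download->($url, $file) for each pair and keeps going after a failure.
# Returns a list of [url, error message] entries for the pairs that failed.
sub process_pairs {
    my ($download, @args) = @_;
    my @failed;
    while (@args) {
        my ($url, $file) = splice(@args, 0, 2);
        # eval traps a die from this pair only; the trailing 1 marks success.
        my $ok = eval { $download->($url, $file); 1 };
        unless ($ok) {
            chomp(my $err = $@);
            push @failed, [$url, $err];
        }
    }
    return @failed;
}

# Stand-in downloader for demonstration (an assumption, it fetches nothing).
my $fake_download = sub {
    my ($url, $file) = @_;
    die "cannot fetch $url\n" if $url =~ /missing/;
    print "would save $url as $file\n";
};

my @failed = process_pairs($fake_download,
    'page-one', 'one.jpg', 'page-missing', 'two.jpg', 'page-three', 'three.jpg');
print "failed: $_->[0] ($_->[1])\n" for @failed;
```

With this shape, the failed pairs can be reported at the end instead of aborting the whole run at the first problem.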