Download photos from Flickr!

March 27, 2010

[Warning: This post was recovered from a backup of my previous WordPress blog. All content was converted automatically from a MySQL database using a Python script (details). Most posts are in Portuguese, but if you are interested I can translate them to English. If you find any problem, don't hesitate to contact me in the comments.]

I think one problem with writing a script to download photos from Flickr is that you may end up fetching copyrighted material. Suppose that you want all photos of the city of Itajubá. You can use the Flickr API or a YQL request, as follows:

select * from flickr.photos.search where text="itajubá"

My first attempt to find the license information was to look at the relevant part of the JSON output, shown below. There is no field that reports the license, but if you look at the documentation there is a "license" parameter.

"farm": "5", "id": "4458853758", "isfamily": "0", "isfriend": "0", "ispublic": "1", "owner": "76062736@N00", "secret": "9edcd7aea8", "server": "4003", "title": "Clube"

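Those fields are enough to address a photo. As a minimal sketch (written for Python 3, with a hypothetical helper name `sizes_page_url`), the record above can be parsed and turned into the "original size" page URL that the download script further down requests. It also confirms that the record carries no license field:

```python
import json

# The record shown above, wrapped in braces so it parses as one JSON object.
record = json.loads('''{
  "farm": "5", "id": "4458853758", "isfamily": "0", "isfriend": "0",
  "ispublic": "1", "owner": "76062736@N00", "secret": "9edcd7aea8",
  "server": "4003", "title": "Clube"
}''')

def sizes_page_url(photo):
    """Build the 'original size' page URL from a photo record (hypothetical helper)."""
    return "http://www.flickr.com/photos/%s/%s/sizes/o" % (photo["owner"], photo["id"])

print(sizes_page_url(record))
# http://www.flickr.com/photos/76062736@N00/4458853758/sizes/o
print("license" in record)
# False
```
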
Flickr has seven options for licensing your photos (again, I didn't find them in the documentation, so I tried each one):

None (All rights reserved) [ license="0" ]
Attribution-NonCommercial-ShareAlike Creative Commons [ license="1" ]
Attribution-NonCommercial Creative Commons [ license="2" ]
Attribution-NonCommercial-NoDerivs Creative Commons [ license="3" ]
Attribution Creative Commons [ license="4" ]
Attribution-ShareAlike Creative Commons [ license="5" ]
Attribution-NoDerivs Creative Commons [ license="6" ]
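
The list above maps naturally onto a lookup table. The following is only a sketch in Python 3: the names `LICENSES` and `license_filter` are made up for illustration, and the ids are the ones found by trial and error above. It builds the comma-separated value that the "license" parameter accepts when "All rights reserved" photos should be excluded:

```python
# License ids and names exactly as listed above.
LICENSES = {
    "0": "None (All rights reserved)",
    "1": "Attribution-NonCommercial-ShareAlike Creative Commons",
    "2": "Attribution-NonCommercial Creative Commons",
    "3": "Attribution-NonCommercial-NoDerivs Creative Commons",
    "4": "Attribution Creative Commons",
    "5": "Attribution-ShareAlike Creative Commons",
    "6": "Attribution-NoDerivs Creative Commons",
}

def license_filter(allowed_ids):
    """Join license ids into a comma-separated string (hypothetical helper)."""
    return ",".join(sorted(allowed_ids, key=int))

# Everything except "All rights reserved".
creative_commons = [lid for lid in LICENSES if lid != "0"]
print(license_filter(creative_commons))
# 1,2,3,4,5,6
```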

A friend of mine was looking for a way to download all the photos from Yahoo! Open Hack Brazil, so I decided to create a quick way to get these photos (while respecting copyrighted material). My query looks like this:

select * from flickr.photos.search(0,2000) where tags="brhackday" and license="1,2,3,4,5,6"
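
To call YQL from a program, the query has to be URL-encoded and appended to the public endpoint. The sketch below (Python 3) shows one way to do that. The helper name `yql_url` is hypothetical, and the endpoint and `format=json` parameter are taken from the script in this post rather than from any official reference:

```python
from urllib.parse import quote

YQL_ENDPOINT = "http://query.yahooapis.com/v1/public/yql"

def yql_url(start, end, tag="brhackday", licenses="1,2,3,4,5,6"):
    """Build the request URL for one slice of results (hypothetical helper)."""
    query = ('select * from flickr.photos.search(%d,%d) '
             'where tags="%s" and license="%s"' % (start, end, tag, licenses))
    # Keep "*", "(" and ")" literal; spaces, commas, "=" and quotes get percent-encoded.
    return "%s?q=%s&format=json" % (YQL_ENDPOINT, quote(query, safe="*()"))

print(yql_url(0, 100))
```

With these arguments the encoded query reads `select%20*%20from%20flickr.photos.search(0%2C100)%20where%20tags%3D%22brhackday%22...`, which is the same shape as the URL used in the script.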

After testing, I took the URL from the YQL console, put it in my Python code (which parses the JSON), and generated a shell script that does the download (using curl/wget):

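import simplejson
import urllib

# Write a shell script with one download command per photo.
f = open("download.sh", "w")
f.write("#!/bin/sh\n")

i = 0
while True:
    # Same query as above, requested one slice at a time.
    url_base = ("http://query.yahooapis.com/v1/public/yql?q=select%20*%20from%20flickr.photos.search("
                + str(i) + "%2C" + str(i + 100)
                + ")%20where%20tags%3D%22brhackday%22%20and%20license%3D%221%2C2%2C3%2C4%2C5%2C6%22"
                + "&format=json")

    rest = simplejson.load(urllib.urlopen(url_base))['query']

    # Stop when a request comes back empty.
    if int(rest['count']) == 0:
        break

    photos = rest['results']['photo']

    for photo in photos:
        id = photo['id']
        owner = photo['owner']
        # Fetch the "original size" page, keep the <img> line,
        # strip the markup around the URL and hand it to wget.
        # The first sed pattern (<p>) is assumed; it was garbled in the recovered post.
        d = ("curl -s http://www.flickr.com/photos/%s/%s/sizes/o "
             "| egrep \"<img src=\" "
             "| sed 's/<p>//g ; s/<br \\/>//g ; s/<\\/p>//g ; s/<img src=\"//g ; s/\" \\/>//g' "
             "| xargs wget ;") % (owner, id)
        f.write(d + '\n')

    # Move to the next slice only after every photo in this one is written.
    i += 100

f.close()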
If you cut and paste the code, it will probably not work because of indentation issues, so I published it as a gist here.
The usage is simple:
$ python flickr-d.py
$ sh download.sh
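Each line of download.sh fetches a photo's "original size" page with curl, keeps the lines containing `<img src=`, strips the surrounding markup with sed, and hands the remaining URL to wget. A rough Python 3 equivalent of the extraction step might look like the sketch below. The HTML snippet is made up for illustration (the real page markup is an assumption), and `image_urls` is a hypothetical helper:

```python
import re

# Hypothetical snippet of a "sizes" page; the real markup may differ.
sample_html = '''<p>Original size</p>
<p><img src="http://example.com/photo_o.jpg" /><br /></p>
'''

def image_urls(html):
    """Return the src value of every <img> tag found in the page text."""
    return re.findall(r'<img src="([^"]+)"', html)

print(image_urls(sample_html))
# ['http://example.com/photo_o.jpg']
```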
This was an extremely simple workaround, but I hope you enjoy it!