A simple Python script to find bloggers on Twitter

Hi! I know you are probably asking: what the heck is he doing? To be honest, this is not a tutorial but a quick fix to a problem I faced last night. I was invited by an author to take part in so-called blog hopping. One of the requirements was to find at least five other authors with blogs to join the adventure. So I asked myself: how could I make this as much fun as possible and learn something at the same time?

So today, I packed my gear and headed to a nearby coffee shop to create something. A few lines of code later (plus an hour's wait after I exceeded Twitter's 150-request rate limit), I had put together a simple script that helped me reduce the pain of searching for potential bloggers (who are authors).


  1. I assumed that Twitter users who mentioned the word ‘authors’ were likely to be authors themselves or to review books on their blogs.
  2. Secondly, I assumed that most Twitter users who talk about authors had the ‘website’ field of their profiles filled in, so I could easily get their blog addresses. If they didn’t have anything listed, I just moved on.

Now let us take a closer look at the code: part 1

#Let us get our first part out of the way here.
#I am using python, so let us import the needed libraries
import json
import urllib2

url = 'http://search.twitter.com/search.json?q=authors' #search link
user_ids = []                            #list to store user_ids

#define the first of the two functions to be used
def get_user_ids():
    data = urllib2.urlopen(url)   #make the call to twitter
    js   = json.load(data)        #parse using the json.load() method
    i = 0
    while i < len(js['results']): #you need to know what is returned
        #grab the from_user_id_str of whoever mentioned 'authors' -
        #I find this more convenient than using a screen name.
        #I add that value to the user_ids list for later use.
        user_ids.append(js['results'][i]['from_user_id_str'])
        i += 1
    return user_ids               #finally, return the list

print 'Move to the next level now!'
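To make the parsing step above concrete, here is a hedged sketch of the shape the (now-retired) search endpoint used to return. The values below are invented; only the keys matter, and pulling out the ids is just a walk over 'results':

```python
# An abbreviated, made-up example of the old search API's JSON shape.
sample = {
    'results': [
        {'from_user': 'somewriter', 'from_user_id_str': '12345',
         'text': 'Calling all authors! New blog hop starting soon.'},
        {'from_user': 'bookblogger', 'from_user_id_str': '67890',
         'text': 'Interviewing indie authors this week.'},
    ]
}

# Equivalent to the while-loop in get_user_ids():
user_ids = [r['from_user_id_str'] for r in sample['results']]
# user_ids is now ['12345', '67890']
```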

In the second part of this post, I want us to look at the second function definition that will complete our script. Yeah, it is really short!

#Within the same file, we will continue our script!

def get_blogs():
    i = 0
    user_urls = []            #define a list to store the urls
    user_ids = get_user_ids() #get the user_ids and store for use
    while i < len(user_ids):
        url = ('http://api.twitter.com/1/statuses/user_timeline/' +
               user_ids[i] + '.json')
        data = urllib2.urlopen(url) #make the call
        js = json.load(data)
        if js[0]['user']['url'] is not None: #skip profiles with no url
            user_urls.append(js[0]['user']['url'])
        i += 1
    return user_urls

Can you believe we have finished the script? Well, we have reached the end. I know you are saying: show me! Fair enough; the answer is a snapshot below. Let us run this code by doing the following:

#Call the function and store the links in a list.
user_links = get_blogs()
for link in user_links:
    print link

#That is all we need to get everything from the list!!

And… here is what I got when I executed that code!

As you can see, you get easy-to-read URLs that you can open in your browser of choice! One thing you might notice is that not all of the links are WordPress or Blogspot links. You can go ahead and improve the script to grab only those links that contain either wordpress or blogspot. You can view the image clearly by clicking on it!
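That filtering improvement could be sketched like this (the function name `filter_blog_links` is mine, not part of the original script):

```python
def filter_blog_links(links):
    # Keep only URLs that look like WordPress or Blogspot blogs.
    keywords = ('wordpress', 'blogspot')
    return [link for link in links
            if any(word in link.lower() for word in keywords)]

links = ['http://myname.wordpress.com',
         'http://example.com/shop',
         'http://books.blogspot.com']
blog_links = filter_blog_links(links)  # drops the example.com link
```

You would call it on the list returned by get_blogs() before printing.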

So why is this better than searching Google for individual bloggers? Simple: you get a ton of links that you can scan through in your browser, sending requests to their owners and saving yourself some time!

As far as time is concerned, this script makes several API calls during execution, so you will notice a slight delay before it completes. You might also hit the rate limit (150 requests per hour) without realizing it, since each run makes many requests. That said, I still had fun doing this, and I sent numerous blog-hop requests within a short time!
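One cheap way to soften the rate-limit problem is to cache each user's timeline so repeated runs don't re-fetch it. A minimal in-memory sketch (all names here are mine, not part of the original script):

```python
# Remember each user's timeline the first time we fetch it, so
# asking for the same user again costs nothing against the limit.
_timeline_cache = {}

def cached_fetch(user_id, fetch):
    """fetch is whatever function actually calls the Twitter API."""
    if user_id not in _timeline_cache:
        _timeline_cache[user_id] = fetch(user_id)
    return _timeline_cache[user_id]
```

In the script above you would pass the urllib2-based fetching code as `fetch`; only the first call per user_id hits the network.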


To recap: I make an API call to Twitter's servers to search for the word ‘authors’ and save the user_ids of the people who mentioned it. We then use those user_ids to make another API call to each user's timeline and grab their url (the property that is part of the Twitter profile). If the url is missing, we ignore it; otherwise, we store it in a new list. Finally, we iterate through the list, print the links out cleanly, and visit them individually! That is it! I hope you had fun reading this post. Got questions? Ask them! Thank you.


2 thoughts on “A simple Python script to find bloggers on Twitter”

  1. This is a really neat tool. How difficult would it be to add the functionality to change your IP via a proxy list once you reach the 150-request threshold? Interesting method to scrape blog websites.

    1. I don’t really see a need to use a proxy. All you need to do is optimize the application so that caching is done. If you get more hits than you expected, you can write to Twitter to ask for whitelisting, where the limits are removed for your application.

      They advise against whitelisting though. Just make your application cache the data and you should be good to go. I don’t think a user will make that many requests unless there is something else he is trying to do besides finding bloggers.
