Tuesday, August 20, 2013

Reading tweets and storing them in MongoDB (Java)

A few days back, I was looking for an example of searching tweets in Java using API v1.1 and found none. Stack Overflow threads and the API documentation helped me come up with a small example that covers both Twitter and MongoDB, which stores the tweets, and I decided to share it with my circle and the readers of this blog.

Twitter API v1.1 introduced a big change: every request to the API must be authenticated. This means that a developer using the Twitter API must use OAuth to obtain an access token on behalf of a user account. If you're new to the Twitter Developer API, you can follow this URL to create a Twitter App in 8 Easy Steps.

I am going to use Twitter4J. It's an unofficial Java library for the Twitter API. I found it straightforward and easy to integrate into an application. Please use the same link to download the library and read more about it. Using this library, I run a search to get tweets and save selected fields of each tweet to a database as a JSON-like document. For the database I used MongoDB, as I found it to be one of the fastest-growing NoSQL databases. Its JSON-style queries and easy installation make it preferable. This post covers other advantages of MongoDB alongside the code, so that they make more sense to you. Let's dive in.

Pre-installation Tasks

Create your app in Twitter
Please follow Twitter App in 8 Easy Steps to create it.

Installation for MongoDB
If you are using Windows, I recommend following the MKYONG tutorial How To Install MongoDB On Windows; for other platforms, use the Installation Guides on the official MongoDB website.

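At this point I assume you have already set up MongoDB and downloaded the library from the official Twitter4J website. You will also need the MongoDB Java driver JAR, since the code imports classes from com.mongodb.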

I am using the NetBeans IDE for this task; you can use any IDE you like.
Create a Java project and name it "TwitterMongoDBApp", as shown in the picture.

Now add the libraries to the project. Right-click on the project to open the window where you'll find the Libraries tab on the left side. Click the Add JAR/Folder button to add the libraries mentioned above to the project.


Before writing any code, make sure you have completed the OAuth settings in Twitter, as we are going to use four values from these settings.

1) Consumer Key
2) Consumer Secret
3) Access Token
4) Access Token Secret

Please see the image for reference.
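Since all four values are needed before any request succeeds, it can help to check that none of them is missing before starting the application. The following is only a minimal sketch under my own assumptions: the class name OAuthSettingsCheck and the environment variable names are hypothetical and are not part of Twitter4J or of the original example, which hardcodes the values.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class OAuthSettingsCheck {
    // Hypothetical environment variable names; pick whatever fits your setup.
    static final String[] REQUIRED = {
        "TWITTER_CONSUMER_KEY",
        "TWITTER_CONSUMER_SECRET",
        "TWITTER_ACCESS_TOKEN",
        "TWITTER_ACCESS_TOKEN_SECRET"
    };

    /** Returns the names of required settings that are absent or blank. */
    static List<String> missingSettings(Map<String, String> env) {
        List<String> missing = new ArrayList<>();
        for (String name : REQUIRED) {
            String value = env.get(name);
            if (value == null || value.trim().isEmpty()) {
                missing.add(name);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        List<String> missing = missingSettings(System.getenv());
        if (missing.isEmpty()) {
            System.out.println("All four OAuth values are set.");
        } else {
            System.out.println("Missing OAuth values: " + missing);
        }
    }
}
```

If you go this way, the values read from the environment could then be passed to the ConfigurationBuilder setters instead of string literals, which also keeps secrets out of the source code.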


Twitter Configuration & MongoDB Connection

With all the prerequisites verified, I created a class TwitterMongoDBApp. I used a static block to configure the Twitter account with the ConfigurationBuilder class from twitter4j, which accepts the OAuth parameters we discussed earlier.

The constructor calls the initMongoDB() method, which connects to the running MongoDB server and initializes the class-level variable db with the database name "tweetDB". This is similar to "CREATE SCHEMA IF NOT EXISTS tweetDB" in SQL: MongoDB creates the database lazily, when the first document is stored.

 import com.mongodb.BasicDBObject;  
 import com.mongodb.DB;  
 import com.mongodb.DBCollection;  
 import com.mongodb.DBCursor;  
 import com.mongodb.Mongo;  
 import com.mongodb.MongoException;  
 import java.net.UnknownHostException;  
 import java.util.List;  
 import java.util.Scanner;  
 import twitter4j.Query;  
 import twitter4j.QueryResult;  
 import twitter4j.Status;  
 import twitter4j.Twitter;  
 import twitter4j.TwitterException;  
 import twitter4j.TwitterFactory;  
 import twitter4j.UserMentionEntity;  
 import twitter4j.conf.ConfigurationBuilder;  
 /**  
  *  
  * @author Muhammad.Saifuddin  
  */  
 public class TwitterMongoDBApp{  
   private static ConfigurationBuilder cb;  
   private DB db;  
   private DBCollection items;  
   public TwitterMongoDBApp() {  
     try {  
       // connect to MongoDB first, then load the collection  
       initMongoDB();  
       items = db.getCollection("tweetColl");  
     } catch (MongoException ex) {  
       System.out.println("MongoException :" + ex.getMessage());  
     }  
   }  
   /**  
    * Static block that prepares the twitter4j configuration with the  
    * provided OAuth settings. This configuration builder is used by the  
    * next search action to fetch tweets from twitter.com.  
    */  
   static {  
     cb = new ConfigurationBuilder();  
     cb.setDebugEnabled(true);  
     cb.setOAuthConsumerKey("**********");  
     cb.setOAuthConsumerSecret("**************");  
     cb.setOAuthAccessToken("*********************");  
     cb.setOAuthAccessTokenSecret("********************");  
   }  
   public static void main(String[] args) {  
     TwitterMongoDBApp taskObj = new TwitterMongoDBApp();  
     taskObj.loadMenu();  
   }  
 
   /**  
    * initMongoDB is called in the constructor, so MongoDB is initialized  
    * every time an object is created.  
    */  
   public void initMongoDB() throws MongoException {  
     try {  
       System.out.println("Connecting to Mongo DB..");  
       Mongo mongo;  
       mongo = new Mongo("127.0.0.1");  
       db = mongo.getDB("tweetDB");  
     } catch (UnknownHostException ex) {  
       System.out.println("MongoDB Connection Error :" + ex.getMessage());  
     }  
   }
}  
   

After initialization is done for both Twitter and MongoDB, the program displays a menu on the console and asks the user to enter a selection. That is what the loadMenu() method does.

 public void loadMenu() {  
     System.out.println("================\n\tTwitter Task\n===============");  
     System.out.println("1 - Load 50 Tweets & save into Mongo DB");  
     System.out.println("2 - Load Top 10 Retweets");  
     System.out.println("3 - Load Top 5 mentioned");  
     System.out.println("4 - Load Top 5 followed");  
     System.out.println("5 - Exit");  
     System.out.print("Please enter your selection:\t");  
     Scanner scanner = new Scanner(System.in);  
     int selection = scanner.nextInt();  
     if (selection == 1) {  
       getTweetByQuery(true);  
     } else if (selection == 2) {  
       getTopRetweet();  
     } else if (selection == 3) {  
       getTopMentioned();  
     } else if (selection == 4) {  
       getTopfollowed();  
     } else if (selection == 5) {  
       db.dropDatabase();  
       System.exit(0);  
     } else {  
       System.out.println("Wrong Selection Found..\n\n");  
       loadMenu();  
     }  
   }  
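One thing to keep in mind with the menu above: Scanner.nextInt() throws an InputMismatchException when the user types something that is not a number, and calling loadMenu() recursively deepens the call stack on every wrong selection. Here is a minimal sketch of one possible alternative, reading a whole line and validating it in a loop. The class and method names (MenuInput, parseSelection) are my own and hypothetical; they are not part of the example project.

```java
import java.util.Scanner;

class MenuInput {
    /**
     * Parses one line of user input into a menu selection.
     * Returns -1 when the text is not a whole number between min and max.
     */
    static int parseSelection(String line, int min, int max) {
        if (line == null) {
            return -1;
        }
        try {
            int value = Integer.parseInt(line.trim());
            return (value >= min && value <= max) ? value : -1;
        } catch (NumberFormatException ex) {
            return -1;
        }
    }

    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);
        int selection = -1;
        // Loop instead of recursing, so repeated bad input cannot grow the call stack.
        while (selection == -1) {
            System.out.print("Please enter your selection:\t");
            if (!scanner.hasNextLine()) {
                break; // input stream closed
            }
            selection = parseSelection(scanner.nextLine(), 1, 5);
            if (selection == -1) {
                System.out.println("Wrong Selection Found..");
            }
        }
        System.out.println("Selected: " + selection);
    }
}
```

The returned number could then be dispatched with the same if/else chain (or a switch) that loadMenu() already uses.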

Searching Tweets and Store in MongoDB

Choosing the first option calls getTweetByQuery(true), which creates a TwitterFactory with the given configuration and obtains a Twitter instance through the getInstance method. It then creates a Query object that defines the search criteria and passes it to the Twitter.search method. I set "java" as the query keyword and set the number of results to retrieve. The QueryResult class from the twitter4j library holds the result of the executed search.

Now you are done with getting tweets from Twitter. Moving on to the next step: storing these records in MongoDB. Before we look at the code, here is how common SQL terminology (left) corresponds to MongoDB terminology (right):

• Database: Database
• Table: Collection
• Row: Document

To read more about terminology and concepts in MongoDB, use this URL.

MongoDB is a document-oriented database in which each document in the same collection may have a totally different structure. Because related data can be embedded in a single document, many queries can be written without joins, which makes querying simpler.
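To make the "different structure" point concrete, here is a minimal sketch that uses plain java.util.Map objects as stand-ins for MongoDB documents, so it runs without a database. The class name DocumentShapes and the extra "lang" field are hypothetical and only there for illustration; the other field names are the ones this post stores for each tweet.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class DocumentShapes {
    /** Builds a document with some of the field names the post stores per tweet. */
    static Map<String, Object> tweetDocument(String userName, String text, int retweets) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("user_name", userName);
        doc.put("tweet_text", text);
        doc.put("retweet_count", retweets);
        return doc;
    }

    /** Collects every field name that appears in at least one document. */
    static Set<String> allFieldNames(List<Map<String, Object>> collection) {
        Set<String> names = new TreeSet<>();
        for (Map<String, Object> doc : collection) {
            names.addAll(doc.keySet());
        }
        return names;
    }

    public static void main(String[] args) {
        // The list plays the role of a collection; each map plays the role of a document.
        List<Map<String, Object>> collection = new ArrayList<>();
        collection.add(tweetDocument("alice", "Learning java", 3));

        // A second document carries an extra field; no schema change is needed.
        Map<String, Object> withLang = tweetDocument("bob", "java and MongoDB", 0);
        withLang.put("lang", "en");
        collection.add(withLang);

        for (Map<String, Object> doc : collection) {
            System.out.println(doc);
        }
        System.out.println("Fields seen: " + allFieldNames(collection));
    }
}
```

In a relational table the extra column would have to be added for every row; here only the document that needs the field carries it.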

  /**  
    * Fetches tweets from twitter.com, using the Query class to define the  
    * search term and the record count. QueryResult holds the result from  
    * Twitter and provides it as a list, which is iterated one record at a  
    * time; items.insert is then called to store each BasicDBObject in the  
    * MongoDB items collection.  
    *  
    * @param loadRecords whether to print the stored records afterwards  
    * @see BasicDBObject  
    * @see DBCursor  
    * @see TwitterFactory  
    * @see Twitter  
    */  
   public void getTweetByQuery(boolean loadRecords) {  
     if (cb != null) {  
       TwitterFactory tf = new TwitterFactory(cb.build());  
       Twitter twitter = tf.getInstance();  
       try {  
         Query query = new Query("java");  
         query.setCount(50);  
         QueryResult result;  
         result = twitter.search(query);  
         System.out.println("Getting Tweets...");  
         List<Status> tweets = result.getTweets();  
         for (Status tweet : tweets) {  
           BasicDBObject basicObj = new BasicDBObject();  
           basicObj.put("user_name", tweet.getUser().getScreenName());  
           basicObj.put("retweet_count", tweet.getRetweetCount());  
           basicObj.put("tweet_followers_count",  
               tweet.getUser().getFollowersCount());  
           UserMentionEntity[] mentioned = tweet.getUserMentionEntities();  
           basicObj.put("tweet_mentioned_count", mentioned.length);  
           basicObj.put("tweet_ID", tweet.getId());  
           basicObj.put("tweet_text", tweet.getText());  
           try {  
             items.insert(basicObj);  
           } catch (Exception e) {  
             System.out.println("MongoDB Connection Error : "  
                 + e.getMessage());  
             loadMenu();  
           }  
         }  
         // Printing fetched records from DB.  
         if (loadRecords) {  
           getTweetsRecords();  
         }  
       } catch (TwitterException te) {  
         System.out.println("te.getErrorCode() " + te.getErrorCode());  
         System.out.println("te.getExceptionCode() " + te.getExceptionCode());  
         System.out.println("te.getStatusCode() " + te.getStatusCode());  
         if (te.getStatusCode() == 401) {  
           System.out.println("Twitter Error : \nAuthentication "  
               + "credentials (https://dev.twitter.com/pages/auth) "  
               + "were missing or incorrect.\nEnsure that you have "  
               + "set valid consumer key/secret, access "  
               + "token/secret, and the system clock is in sync.");  
         } else {  
           System.out.println("Twitter Error : " + te.getMessage());  
         }  
         loadMenu();  
       }  
     } else {  
       System.out.println("Twitter configuration is missing!"  
           + " Please check the OAuth settings..");  
     }  
   }  

After setting all the related fields in the BasicDBObject, the prepared document is finally stored in MongoDB by calling items.insert(basicObj). Isn't it simple? :)

Fetching Records from MongoDB

As you have seen in the code above, we call the getTweetsRecords method. Here is its implementation: it uses the items collection to find documents, passing a BasicDBObject that names the fields to fetch (similar to the column list of a SELECT statement), and iterates over the results with a DBCursor.

 /**  
 * Prints records fetched from MongoDB. This method uses the preloaded  
 * items collection to fetch the records and prints them on the  
 * console.  
 */  
   public void getTweetsRecords() {  
     BasicDBObject fields = new BasicDBObject("_id", true).append("user_name", true).append("tweet_text", true);  
     DBCursor cursor = items.find(new BasicDBObject(), fields);  
     while (cursor.hasNext()) {  
       System.out.println(cursor.next());  
     }  
     loadMenu();  
   }  
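The second argument of find() acts as a projection: only the named fields come back. The following is a minimal sketch of that idea on a plain java.util.Map, so it runs without MongoDB. ProjectionSketch and project are hypothetical names of my own, and the sketch is simplified on purpose: in MongoDB the _id field is returned by default even when it is not listed, whereas here only listed fields are kept.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class ProjectionSketch {
    /** Keeps only the listed fields of a document, like the column list of a SELECT. */
    static Map<String, Object> project(Map<String, Object> doc, List<String> fields) {
        Map<String, Object> projected = new LinkedHashMap<>();
        for (String field : fields) {
            if (doc.containsKey(field)) {
                projected.put(field, doc.get(field));
            }
        }
        return projected;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("user_name", "alice");
        doc.put("retweet_count", 4);
        doc.put("tweet_followers_count", 120);
        doc.put("tweet_text", "Learning java");

        // Same fields that getTweetsRecords asks for, minus the automatic _id.
        System.out.println(project(doc, Arrays.asList("user_name", "tweet_text")));
    }
}
```

Fetching only the fields you print keeps the console output readable and, against a real server, reduces the amount of data sent back for each document.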

To get the top 10 retweets from the collection, a descending sort is applied in the MongoDB query.

 /**  
   * Prints the top retweeted records from the preloaded items  
   * collection. A BasicDBObject defines a descending sort, with a  
   * fixed limit of 10.  
   * @see BasicDBObject  
   * @see DBCursor  
   */  
   public void getTopRetweet() {  
     if (items.count() <= 0) {  
       getTweetByQuery(false);  
     }  
     // sort specification: -1 means descending order  
     BasicDBObject sortSpec = new BasicDBObject();  
     sortSpec.put("retweet_count", -1);  
     DBCursor cursor = items.find().sort(sortSpec).limit(10);  
     System.out.println("items length " + items.count());  
     while (cursor.hasNext()) {  
       System.out.println(cursor.next());  
     }  
     loadMenu();  
   }  
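If the sort-then-limit idea is new to you, here is a minimal in-memory sketch of what a descending sort with a limit returns, again with plain maps standing in for documents so it runs without MongoDB. TopNSketch, numericField and topBy are hypothetical names of my own; the real work in the example is done by the server through sort() and limit().

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class TopNSketch {
    /** Reads a numeric field; documents without it end up last in descending order. */
    static int numericField(Map<String, Object> doc, String field) {
        Object value = doc.get(field);
        return (value instanceof Number) ? ((Number) value).intValue() : Integer.MIN_VALUE;
    }

    /** Returns at most `limit` documents, ordered by `field` from highest to lowest. */
    static List<Map<String, Object>> topBy(List<Map<String, Object>> docs, String field, int limit) {
        List<Map<String, Object>> sorted = new ArrayList<>(docs); // leave the input untouched
        sorted.sort(Comparator.comparingInt(
                (Map<String, Object> doc) -> numericField(doc, field)).reversed());
        int end = Math.max(0, Math.min(limit, sorted.size()));
        return new ArrayList<>(sorted.subList(0, end));
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = new ArrayList<>();
        int[] counts = {2, 9, 5};
        for (int count : counts) {
            Map<String, Object> doc = new LinkedHashMap<>();
            doc.put("retweet_count", count);
            docs.add(doc);
        }
        for (Map<String, Object> doc : topBy(docs, "retweet_count", 2)) {
            System.out.println(doc);
        }
    }
}
```

The getTopMentioned and getTopfollowed methods called from the menu are not shown in this post. My assumption is that they follow the same pattern with the tweet_mentioned_count and tweet_followers_count fields; please check the source code linked below for the actual implementation.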

Source Code:

Please use this URL (https://github.com/m-saifuddin/TwitterExample) to download the example discussed above.