Tuesday, August 20, 2013

Reading Tweets and Storing Them in MongoDB (Java)

A few days ago I was looking for a Java example of searching tweets with Twitter API v1.1 and found nothing complete. Stack Overflow threads and the API documentation helped me put together a small example that covers both Twitter and MongoDB (to store the tweets), and I decided to share it with my circle and the readers of this blog.

The big change in Twitter API v1.1 is that every request to the API must be authenticated. In practice, a developer using the Twitter API must use OAuth to obtain an access token on behalf of a user account. If you're new to the Twitter Developer API, you can follow this URL to create a Twitter app: Twitter App in 8 Easy Steps.

I am going to use Twitter4J, an unofficial Java library for the Twitter API. I found it straightforward and easy to integrate with Twitter services. Please use the same link to download the library and read more about it. With this library I run a search to retrieve tweets (which the API returns as JSON) and save them to a database. For the database I used MongoDB, which I found to be one of the fastest-growing NoSQL databases. Its JSON-style queries and easy installation make it preferable. This post covers other advantages of MongoDB along with the code, so that they make more sense to you. Let's dive in.

Pre-installation Tasks

Create your app in Twitter
Please follow this Twitter App in 8 Easy Steps to create.

Installation for MongoDB
If you are using Windows, I recommend following the MKYONG tutorial How To Install MongoDB On Windows; for other platforms, use the installation guides on the official MongoDB website.

At this point I assume you have already set up MongoDB and downloaded the library from the official Twitter4J website.

I am using the NetBeans IDE for this task; you can use any IDE you like.
Create a Java project and name it "TwitterMongoDBApp", as shown in the picture.

Now add the libraries to the project. Right-click on the project; this opens a window where you'll find the Libraries tab on the left side. Click the Add JAR/Folder button to add the mentioned libraries to the project.


Before writing any code, make sure you have completed the OAuth settings in Twitter, as we are going to use four values from these settings.

1) Consumer Key
2) Consumer Secret
3) Access Token
4) Access Token Secret

Please follow the image for this reference.
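
As a side note, the four values do not have to be typed directly into the source file. The following is only a minimal sketch, assuming the values are kept in a java.util.Properties object; the class name OAuthSettings and the key names are hypothetical choices for this illustration. It checks that all four values are present before they are used.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

/** Sketch: collect the four OAuth values and fail fast if one is missing. */
final class OAuthSettings {
    // Key names are assumptions made for this example only.
    static final String[] KEYS = {
        "oauth.consumerKey",
        "oauth.consumerSecret",
        "oauth.accessToken",
        "oauth.accessTokenSecret"
    };

    /** Returns the four trimmed values keyed by name; throws if any is absent or blank. */
    static Map<String, String> fromProperties(Properties props) {
        Map<String, String> values = new LinkedHashMap<>();
        for (String key : KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                throw new IllegalArgumentException("Missing OAuth setting: " + key);
            }
            values.put(key, value.trim());
        }
        return values;
    }

    public static void main(String[] args) {
        // Placeholder values; real ones come from the Twitter app settings page.
        Properties props = new Properties();
        for (String key : KEYS) {
            props.setProperty(key, "placeholder-for-" + key);
        }
        Map<String, String> settings = fromProperties(props);
        System.out.println("Loaded " + settings.size() + " OAuth settings");
    }
}
```

The validated values could then be handed to the ConfigurationBuilder setters used later in this post (setOAuthConsumerKey and so on) instead of string literals.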


Twitter Configuration & MongoDB Connection

After verifying all the prerequisites, I created a class TwitterMongoDBApp. I used a static block to configure the Twitter account with the ConfigurationBuilder class in twitter4j, which accepts the OAuth parameters we discussed earlier.

The constructor calls the initMongoDB() method to connect to the running MongoDB server and initialize the class-level variable db with the database name "tweetDB". This is similar to "CREATE SCHEMA IF NOT EXISTS tweetDB" in SQL; MongoDB creates the database lazily, when data is first written to it.

 import com.mongodb.BasicDBObject;  
 import com.mongodb.DB;  
 import com.mongodb.DBCollection;  
 import com.mongodb.DBCursor;  
 import com.mongodb.Mongo;  
 import com.mongodb.MongoException;  
 import java.net.UnknownHostException;  
 import java.util.List;  
 import java.util.Scanner;  
 import twitter4j.Query;  
 import twitter4j.QueryResult;  
 import twitter4j.Status;  
 import twitter4j.Twitter;  
 import twitter4j.TwitterException;  
 import twitter4j.TwitterFactory;  
 import twitter4j.UserMentionEntity;  
 import twitter4j.conf.ConfigurationBuilder;  
 /**  
  *  
  * @author Muhammad.Saifuddin  
  */  
 public class TwitterMongoDBApp{  
   private static ConfigurationBuilder cb;  
   private DB db;  
   private DBCollection items;  
   public TwitterMongoDBApp() {  
     try {  
       // on constructor load initialize MongoDB and load collection  
       initMongoDB();  
       items = db.getCollection("tweetColl");  
     } catch (MongoException ex) {  
       System.out.println("MongoException :" + ex.getMessage());  
     }  
   }  
   /**  
    * static block used to construct a connection with Twitter   
    * with twitter4j configuration with provided settings.   
    * This configuration builder will be used for next search   
    * action to fetch the tweets from twitter.com.  
    */  
   static {  
     cb = new ConfigurationBuilder();  
     cb.setDebugEnabled(true);  
     cb.setOAuthConsumerKey("**********");  
     cb.setOAuthConsumerSecret("**************");  
     cb.setOAuthAccessToken("*********************");  
     cb.setOAuthAccessTokenSecret("********************");  
   }  
   public static void main(String[] args) {  
     TwitterMongoDBApp taskObj = new TwitterMongoDBApp();  
     taskObj.loadMenu();  
   }  
 
   /**  
    * initMongoDB is called in the constructor, so MongoDB is initialized  
    * every time an object is created.  
    */  
   public void initMongoDB() throws MongoException {  
     try {  
       System.out.println("Connecting to Mongo DB..");  
       Mongo mongo;  
       mongo = new Mongo("127.0.0.1");  
       db = mongo.getDB("tweetDB");  
     } catch (UnknownHostException ex) {  
       System.out.println("MongoDB Connection Error :" + ex.getMessage());  
     }  
   }
}  
   

After both Twitter and MongoDB are initialized, the program displays a menu on the console and asks the user for a selection; that is what the loadMenu() method does.

 public void loadMenu() {  
     System.out.println("================\n\tTwitter Task\n===============");  
     System.out.println("1 - Load 100 Tweets & save into Mongo DB");  
     System.out.println("2 - Load Top 5 Retweet");  
     System.out.println("3 - Load Top 5 mentioned");  
     System.out.println("4 - Load Top 5 followed");  
     System.out.println("5 - Exit");  
     System.out.print("Please enter your selection:\t");  
     Scanner scanner = new Scanner(System.in);  
     int selection = scanner.nextInt();  
     if (selection == 1) {  
       getTweetByQuery(true);  
     } else if (selection == 2) {  
       getTopRetweet();  
     } else if (selection == 3) {  
       getTopMentioned();  
     } else if (selection == 4) {  
       getTopfollowed();  
     } else if (selection == 5) {  
       db.dropDatabase();  
       System.exit(0);  
     } else {  
       System.out.println("Wrong Selection Found..\n\n");  
       loadMenu();  
     }  
   }  
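
One thing to keep in mind with the menu above: scanner.nextInt() throws an InputMismatchException when the user types something that is not a number, and calling loadMenu() recursively adds a stack frame on every retry. The following is a hedged sketch of an alternative, not the code from the repository: it reads a whole line, validates it, and retries in a loop. The class name MenuInput and the method parseSelection are hypothetical.

```java
import java.util.Scanner;

/** Sketch: validate a menu selection without recursion or nextInt(). */
final class MenuInput {
    static final int MIN_OPTION = 1;
    static final int MAX_OPTION = 5;

    /** Returns the option number, or -1 if the text is not a valid option. */
    static int parseSelection(String line) {
        if (line == null) {
            return -1;
        }
        try {
            int value = Integer.parseInt(line.trim());
            return (value >= MIN_OPTION && value <= MAX_OPTION) ? value : -1;
        } catch (NumberFormatException ex) {
            return -1; // not a number at all
        }
    }

    public static void main(String[] args) {
        Scanner scanner = new Scanner(System.in);
        int selection = -1;
        while (selection == -1) {
            System.out.print("Please enter your selection:\t");
            if (!scanner.hasNextLine()) {
                return; // input was closed, nothing more to read
            }
            selection = parseSelection(scanner.nextLine());
            if (selection == -1) {
                System.out.println("Wrong Selection Found..");
            }
        }
        System.out.println("Selected option " + selection);
    }
}
```

The returned number can be dispatched with the same if/else chain that loadMenu() uses.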

Searching Tweets and Store in MongoDB

The first selection calls getTweetByQuery(true), which creates a TwitterFactory with the given configuration and obtains a Twitter instance from it with the getInstance method. It then creates a Query object that defines the search criteria and passes it to the Twitter.search method. I set "java" as the query keyword and set the number of results to retrieve (50 in the code below). The QueryResult class from the twitter4j library holds the result of the executed search.

Now you are done with getting tweets from Twitter. The next step is to store these records in MongoDB. Before we look at the code, here is how common SQL terms correspond to MongoDB terms:

• Database: Database
• Table: Collection
• Row: Document

To read more about terminology and concepts in MongoDB, use this URL.

MongoDB is a document-oriented database in which documents in the same collection may have completely different structures, which makes querying the database easier and more efficient. You can also work without joins.
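
To make the "different structure" point concrete, here is a small illustration that uses plain Java maps as stand-ins for documents. It does not use the MongoDB driver; the class name FlexibleDocuments and the sample values are made up for this sketch. Both maps live in the same list (the "collection") even though they carry different fields.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: two "documents" with different fields in one "collection". */
final class FlexibleDocuments {
    static List<Map<String, Object>> sampleCollection() {
        List<Map<String, Object>> collection = new ArrayList<>();

        // First document: only a user name and the tweet text.
        Map<String, Object> first = new LinkedHashMap<>();
        first.put("user_name", "alice");
        first.put("tweet_text", "Hello Java");
        collection.add(first);

        // Second document: extra counters, no tweet text.
        Map<String, Object> second = new LinkedHashMap<>();
        second.put("user_name", "bob");
        second.put("retweet_count", 7);
        second.put("tweet_mentioned_count", 2);
        collection.add(second);

        return collection;
    }

    public static void main(String[] args) {
        for (Map<String, Object> document : sampleCollection()) {
            System.out.println(document.keySet());
        }
    }
}
```

A relational table would force both rows into the same columns; here each document simply lists the fields it has.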

  /**  
    * getTweetByQuery fetches records from twitter.com, using the Query  
    * class to define the search term and the record count. QueryResult  
    * holds the result from Twitter and provides the list that is iterated  
    * one record at a time; items.insert is then called to store each  
    * BasicDBObject in the MongoDB items collection.  
    *  
    * @param loadRecords if true, print the stored records after inserting  
    * @see BasicDBObject, DBCursor, TwitterFactory, Twitter  
    */  
   public void getTweetByQuery(boolean loadRecords) {  
     if (cb != null) {  
       TwitterFactory tf = new TwitterFactory(cb.build());  
       Twitter twitter = tf.getInstance();  
       try {  
         Query query = new Query("java");  
         query.setCount(50);  
         QueryResult result;  
         result = twitter.search(query);  
         System.out.println("Getting Tweets...");  
         List<Status> tweets = result.getTweets();  
         for (Status tweet : tweets) {  
           BasicDBObject basicObj = new BasicDBObject();  
           basicObj.put("user_name", tweet.getUser().getScreenName());  
           basicObj.put("retweet_count", tweet.getRetweetCount());  
           basicObj.put("tweet_followers_count",  
               tweet.getUser().getFollowersCount());  
           UserMentionEntity[] mentioned = tweet.getUserMentionEntities();  
           basicObj.put("tweet_mentioned_count", mentioned.length);  
           basicObj.put("tweet_ID", tweet.getId());  
           basicObj.put("tweet_text", tweet.getText());  
           try {  
             items.insert(basicObj);  
           } catch (Exception e) {  
             System.out.println("MongoDB Connection Error : "  
                 + e.getMessage());  
             loadMenu();  
           }  
         }  
         // Printing fetched records from DB.  
         if (loadRecords) {  
           getTweetsRecords();  
         }  
       } catch (TwitterException te) {  
         System.out.println("te.getErrorCode() " + te.getErrorCode());  
         System.out.println("te.getExceptionCode() " + te.getExceptionCode());  
         System.out.println("te.getStatusCode() " + te.getStatusCode());  
         if (te.getStatusCode() == 401) {  
           System.out.println("Twitter Error : \nAuthentication "  
               + "credentials (https://dev.twitter.com/pages/auth) "  
               + "were missing or incorrect.\nEnsure that you have "  
               + "set valid consumer key/secret, access "  
               + "token/secret, and the system clock is in sync.");  
         } else {  
           System.out.println("Twitter Error : " + te.getMessage());  
         }  
         loadMenu();  
       }  
     } else {  
       System.out.println("Twitter configuration is not initialized!"  
           + " Please check the OAuth settings in the static block..");  
     }  
   }  

After setting all the related fields on the BasicDBObject, the prepared document is stored in MongoDB by calling items.insert(basicObj). Isn't it simple? :)
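
Because a repeated search can return tweets that were already stored, it may be worth dropping duplicates before inserting. The sketch below is one possible approach under stated assumptions, not part of the original example: it uses plain Java maps in place of BasicDBObject and keeps the first document seen for each tweet_ID. The class name TweetDeduplicator is hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: keep only the first document for each tweet_ID, preserving order. */
final class TweetDeduplicator {
    static List<Map<String, Object>> dedupeByTweetId(List<Map<String, Object>> docs) {
        // LinkedHashMap remembers insertion order, so the output order is stable.
        Map<Object, Map<String, Object>> byId = new LinkedHashMap<>();
        for (Map<String, Object> doc : docs) {
            // Documents without a tweet_ID all share the null key in this sketch.
            byId.putIfAbsent(doc.get("tweet_ID"), doc);
        }
        return new ArrayList<>(byId.values());
    }

    private static Map<String, Object> doc(long id, String text) {
        Map<String, Object> d = new LinkedHashMap<>();
        d.put("tweet_ID", id);
        d.put("tweet_text", text);
        return d;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = new ArrayList<>();
        docs.add(doc(1L, "first"));
        docs.add(doc(2L, "second"));
        docs.add(doc(1L, "first again"));
        System.out.println(dedupeByTweetId(docs).size() + " unique tweets");
    }
}
```

This only removes duplicates within one batch. To guard against tweets stored by earlier runs, a unique index on tweet_ID or an upsert on the MongoDB side would be the usual alternatives.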

Fetching Records from MongoDB

As you have seen in the code above, we call the getTweetsRecords method. Here is its implementation: it uses the items collection to find documents, passing a BasicDBObject that names the fields to fetch (much like the column list of a SELECT statement), and iterates over the results with a DBCursor.

 /**  
 * void method print fetched records from mongodb This method use the  
 * preloaded items (Collection) for fetching records and print them on  
 * console.  
 */  
   public void getTweetsRecords() {  
     BasicDBObject fields = new BasicDBObject("_id", true).append("user_name", true).append("tweet_text", true);  
     DBCursor cursor = items.find(new BasicDBObject(), fields);  
     while (cursor.hasNext()) {  
       System.out.println(cursor.next());  
     }  
     loadMenu();  
   }  
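
The second argument passed to items.find above acts as a projection: only the listed fields come back for each document. As a rough illustration of that idea, here is a sketch that applies the same kind of field selection to a plain Java map. It does not call the MongoDB driver, and the class name FieldProjection is hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Sketch: keep only the requested fields of a document-like map. */
final class FieldProjection {
    static Map<String, Object> project(Map<String, Object> doc, Set<String> fields) {
        Map<String, Object> result = new LinkedHashMap<>();
        for (Map.Entry<String, Object> entry : doc.entrySet()) {
            if (fields.contains(entry.getKey())) {
                result.put(entry.getKey(), entry.getValue());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("user_name", "alice");
        doc.put("retweet_count", 3);
        doc.put("tweet_followers_count", 120);
        doc.put("tweet_text", "Hello Java");

        Set<String> fields = new HashSet<>(Arrays.asList("user_name", "tweet_text"));
        System.out.println(project(doc, fields));
    }
}
```

In SQL terms this corresponds to SELECT user_name, tweet_text rather than SELECT *.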

To get the top 10 retweets from the collection, I applied a descending sort in the MongoDB query.

 /**  
   * void method print fetched top retweet records from   
   * preloaded items collection with the help of BasicDBObject   
   * class defined sort with desc with fixed limit 10.  
   * @see BasicDBObject, DBCursor  
   */  
   public void getTopRetweet() {  
     if (items.count() <= 0) {  
       getTweetByQuery(false);  
     }  
     BasicDBObject query = new BasicDBObject();  
     query.put("retweet_count", -1);  
     DBCursor cursor = items.find().sort(query).limit(10);  
     System.out.println("items length " + items.count());  
     while (cursor.hasNext()) {  
       System.out.println(cursor.next());  
     }  
     loadMenu();  
   }  
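
In the sort document above, the value -1 for retweet_count means descending order, and limit(10) cuts the result to the first ten documents. The sketch below mimics that behaviour on an in-memory list of maps, purely to show what the query does; it is not driver code, and the class name TopRetweets is hypothetical.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Sketch: sort by retweet_count descending and keep the first `limit` items. */
final class TopRetweets {
    static List<Map<String, Object>> top(List<Map<String, Object>> docs, int limit) {
        List<Map<String, Object>> sorted = new ArrayList<>(docs); // leave the input untouched
        sorted.sort(Comparator.comparingInt(
                (Map<String, Object> d) -> ((Number) d.get("retweet_count")).intValue())
                .reversed());
        int end = Math.max(0, Math.min(limit, sorted.size()));
        return new ArrayList<>(sorted.subList(0, end));
    }

    static Map<String, Object> doc(String user, int retweets) {
        Map<String, Object> d = new LinkedHashMap<>();
        d.put("user_name", user);
        d.put("retweet_count", retweets);
        return d;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = new ArrayList<>();
        docs.add(doc("alice", 3));
        docs.add(doc("bob", 10));
        docs.add(doc("carol", 7));
        for (Map<String, Object> d : top(docs, 2)) {
            System.out.println(d);
        }
    }
}
```

The same pattern, with a different field name in the sort document, would presumably serve the "top mentioned" and "top followed" menu options.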

Source Code:

Please use this URL (https://github.com/m-saifuddin/TwitterExample) to download the example discussed here.

17 comments:

Unknown said...

Thank you for such a good post.
But I am unable to get the result. The error I am getting is listed below:

Connecting to Mongo DB..
========================================
Twitter Task
========================================
1 - Load 100 Tweets & save into Mongo DB
2 - Load Top 5 Retweet
3 - Load Top 5 mentioned
4 - Load Top 5 followed
5 - Exit
Please enter your selection: 1
te.getErrorCode() -1
te.getExceptionCode() d35baff5-12c94143 43208640-465ee2e3
te.getStatusCode() -1
Twitter Error : connect timed out

Saifuddin said...

Hi Vaibhav kholi,

Thanks for your feedback.

I am not sure why this error occurs, as the exception and its message carry very little information.

A Google search for the exception code pointed me to a number of possible causes of this error:

http://stackoverflow.com/a/16052415/388053

http://stackoverflow.com/a/17141631/388053

hope it helps.

Unknown said...

I have checked that my firewall is disabled and that the consumer key and OAuth access token keys are set properly.
My MongoDB is working fine and tweetDB has been created, but the program is not connecting to the Twitter API.

> show dbs
local 0.078125GB
test 0.203125GB
tweetDB (empty)

Is there anything else to be edited apart from the 4 keys?

Unknown said...

Thanks for this great article! It helped me a lot.
But is there a way to avoid duplicates in the database as the twitter search will often deliver the same tweets when it is called within some minutes?
So is there an easy line of code to delete duplicates for example based on the tweet_id?

Thanks in advance

Anonymous said...

hi Muhammad Saifuddin
my name jaydeep
i have tested your above code but i have error like :
1 - Load 100 Tweets & save into Mongo DB
2 - Load Top 5 Retweet
3 - Load Top 5 mentioned
4 - Load Top 5 followed
5 - Exit
Please enter your selection: 1
Getting Tweets...
Nov 22, 2013 5:00:11 PM com.mongodb.DBTCPConnector initDirectConnection
WARNING: Exception executing isMaster command on /127.0.0.1:27017
java.io.IOException: couldn't connect to [/127.0.0.1:27017] bc:java.net.ConnectException: Connection refused: connect
at com.mongodb.DBPort._open(DBPort.java:214)
at com.mongodb.DBPort.go(DBPort.java:107)
at com.mongodb.DBPort.go(DBPort.java:88)
at com.mongodb.DBPort.findOne(DBPort.java:143)
at com.mongodb.DBPort.runCommand(DBPort.java:148)
at com.mongodb.DBTCPConnector.initDirectConnection(DBTCPConnector.java:548)
at com.mongodb.Mongo.getMaxBsonObjectSize(Mongo.java:620)
at com.mongodb.DBApiLayer$MyCollection.insert(DBApiLayer.java:254)
at com.mongodb.DBApiLayer$MyCollection.insert(DBApiLayer.java:226)
at com.mongodb.DBCollection.insert(DBCollection.java:75)
at com.mongodb.DBCollection.insert(DBCollection.java:59)
at com.mongodb.DBCollection.insert(DBCollection.java:104)
at twitterapplication.TwitterTask.getTweetByQuery(TwitterTask.java:150)
at twitterapplication.TwitterTask.loadMenu(TwitterTask.java:82)
at twitterapplication.TwitterTask.main(TwitterTask.java:64)
please help me out

Saifuddin said...

Hi Jaydeep,

The error reported in the exception, java.io.IOException: couldn't connect to [/127.0.0.1:27017], mostly appears when your code cannot reach the resource at the configured address.

It seems that you haven't started your MongoDB server.
First start your MongoDB server and then run this code.

You can use this URL to check how to start and stop MongoDB.

hope it helps.

Unknown said...

thanks to fast reply
now i have this error :
1 - Load 100 Tweets & save into Mongo DB
2 - Load Top 5 Retweet
3 - Load Top 5 mentioned
4 - Load Top 5 followed
5 - Exit
Please enter your selection: 1
te.getErrorCode() 215
te.getExceptionCode() d35baff5-1446301f
te.getStatusCode() 400
Twitter Error : 400:The request was invalid. An accompanying error message will explain why. This is the status code will be returned during version 1.0 rate limiting(https://dev.twitter.com/pages/rate-limiting). In API v1.1, a request without authentication is considered invalid and you will get this response.
message - Bad Authentication data
code - 215
please reply me as soon as possible

Saifuddin said...

Hi Jaydeep,

This error is already covered in this blog entry. I recommend reading the post from the start; I am confident you'll resolve the issue after that.

Unknown said...

hi Muhammad Saifuddin
i followed your comment now i successfully run it but now i have 0ne question its not my twitter application tweets and not show database in mongodb cmd prompt when i run > show dbs then it shows
local
test
tweetDB
then i run
>db.tweetDB.findOne()
its shows null.

Connecting to Mongo DB..
========================================
Twitter Task
========================================
1 - Load 100 Tweets & save into Mongo DB
2 - Load Top 5 Retweet
3 - Load Top 5 mentioned
4 - Load Top 5 followed
5 - Exit
Please enter your selection: 1
Getting Tweets...
{ "_id" : { "$oid" : "52904052ffcb8c2505b4184d"} , "user_name" : "Client_sister" , "tweet_text" : "お兄ちゃん、C言語とJavaしかできないくせにiPhone使ってるんだね。"}
{ "_id" : { "$oid" : "52904052ffcb8c2505b4184e"} , "user_name" : "mylinux2" , "tweet_text" : "How-to Install Eclipse 4.3 Kepler for Java Developers on Linux Mint 15 Cinnamon 32bit Easy Visual-Guide http://t.co/qydDygr24k Free Linux"}
{ "_id" : { "$oid" : "52904052ffcb8c2505b4184f"} , "user_name" : "mylinux2" , "tweet_text" : "How-to Install Eclipse 4.3 Kepler for Java Developers on Linux Mint 15 Cinnamon 64bit Easy Visual-Guide http://t.co/jnARRefAzm Free Linux"}
{ "_id" : { "$oid" : "52904052ffcb8c2505b41850"} , "user_name" : "EdmeadsIsemay" , "tweet_text" : "RT @MarielCreasy: #Скачать архив java книг http://t.co/B78SpMfQ1z"}
{ "_id" : { "$oid" : "52904052ffcb8c2505b41851"} , "user_name" : "Java_Junkey" , "tweet_text" : "@kinkymindys #geek me too! Ever try the verity podcast?"}
{ "_id" : { "$oid" : "52904052ffcb8c2505b41852"} , "user_name" : "EmploiCentre" , "tweet_text" : "Ingénieurs D'études Java/j2Ee H/f #orleans #job http://t.co/lvZAZNttll"}
{ "_id" : { "$oid" : "52904052ffcb8c2505b41853"} , "user_name" : "ReeNoo_java" , "tweet_text" : "Di goyang ,,, corrrrr ,,, buka bnyk jos wkwkkwkwkkwkw (with coRy) [vid] — https://t.co/cVO5uvxcog"}


reply me as soon as possible

malika said...

salam

I have a final-year project on "sentiment analysis of tweets in order to find details about a given crime".
I have been working with Java:
I retrieve my tweets and store them in MongoDB, and now I need to read them back to run the analysis.

Thank you

Saifuddin said...

Dear Nilesh,


Exception in monitor thread while connecting to server localhost:27017
com.mongodb.MongoSocketOpenException: Exception opening socket


It seems that you haven't started your MongoDB server.
First start your MongoDB server and then run this code.

You can use this URL to check how to start and stop MongoDB.

hope it helps.

Saifuddin said...

This URL https://docs.mongodb.com/manual/tutorial/manage-mongodb-processes/

Unknown said...

run:
Connecting to Mongo DB..
Sep 26, 2016 5:31:59 PM com.mongodb.diagnostics.logging.JULLogger log
INFO: Cluster created with settings {hosts=[localhost:27017], mode=SINGLE, requiredClusterType=UNKNOWN, serverSelectionTimeout='30000 ms', maxWaitQueueSize=50}
========================================
Twitter Task
========================================
2 - Load Top 5 Retweet
3 - Load Top 5 mentioned
4 - Load Top 5 followed
5 - Exit
Sep 26, 2016 5:31:59 PM com.mongodb.diagnostics.logging.JULLogger log
INFO: Opened connection [connectionId{localValue:1, serverValue:3}] to localhost:27017
Sep 26, 2016 5:31:59 PM com.mongodb.diagnostics.logging.JULLogger log
INFO: Monitor thread successfully connected to server with description ServerDescription{address=localhost:27017, type=STANDALONE, state=CONNECTED, ok=true, version=ServerVersion{versionList=[3, 2, 9]}, minWireVersion=0, maxWireVersion=4, maxDocumentSize=16777216, roundTripTimeNanos=1433066}
Please enter your selection: 2
Sep 26, 2016 5:32:04 PM com.mongodb.diagnostics.logging.JULLogger log
INFO: Opened connection [connectionId{localValue:2, serverValue:4}] to localhost:27017
Exception in thread "main" java.lang.UnsupportedOperationException: Not supported yet.
at twitterapplication.TwitterTask.getTweetByQuery(TwitterTask.java:267)
at twitterapplication.TwitterTask.getTopRetweet(TwitterTask.java:180)
at twitterapplication.TwitterTask.loadMenu(TwitterTask.java:75)
at twitterapplication.TwitterTask.main(TwitterTask.java:51)
C:\Users\user\AppData\Local\NetBeans\Cache\8.1\executor-snippets\run.xml:53: Java returned: 1
BUILD FAILED (total time: 6 seconds)
