StormTweetsSentimentD3UKViz

Created: 2014-05-19 07:19
Updated: 2017-12-19 22:02
License: apache-2.0

You might also be interested in my other project, StormTweetsSentimentD3Viz, which shows the Twitter sentiment of the states of the US on a D3.js Choropleth Map.

Introduction

This repository contains an example application for the Storm distributed framework. It performs sentiment analysis of tweets originating from the U.K. in real-time. The topology retrieves tweets originating from the UK, computes the sentiment score of each county / region of the UK [based on tweets], and visualizes the scores in a Choropleth Map using D3.js. In local mode it runs continuously for 10 minutes. You can also kill the topology explicitly by pressing Ctrl+C to exit the application.

Apache Storm is an open source distributed real-time computation system, developed at BackType by Nathan Marz and team. Twitter open sourced it [post BackType acquisition] in August, 2011, and it became a top-level Apache project on 29th September, 2014.
This application was developed and tested with Storm v0.8.2 on Windows 7 in local mode, and was later updated and tested with Storm v0.9.3 on 22nd January, 2015. It may or may not work with versions earlier or later than Storm v0.9.3.

This application has been tested in:

  • Local mode on an Ubuntu virtual machine and on a Microsoft Windows 7 machine.
  • Cluster mode on a private cluster of 4 machines running Ubuntu, and on an Amazon EC2 environment of 5 machines running CentOS.
    • The recent update to Apache Storm v0.9.3 has not been tested in Cluster mode.

Features

  • Application retrieves tweets using Twitter Streaming API (using Twitter4J).
  • It analyses the sentiment of all tweets originating from the UK [based on a latlong bounding box].
  • There are three different objects within a tweet that can be used to determine its origin, listed here in order of priority [high to low]. This project relies solely on the first one:
    • The coordinates object. -- The only object used in this project.
    • The place object. -- Not considered in this project.
    • The user object. -- Not considered in this project.
  • For reverse geocoding, this application uses Google Maps API.
    • For more information and sign up, please check the Google Maps API page.
    • Please note that you need a Google account to sign up for a Google Maps API key.
      • For previous reverse geocoding experiments, I chose Bing Maps over Google Maps, since Google Maps is restrictive for this usage: it has a limit of only 2500 requests per day, while Bing allows 50k requests per day.
      • But Bing's geocoding for the UK is really not up to the mark. Google Maps API is far better and translates almost every latlong request to a place. So I have gone with Google Maps API for this project because of its greater accuracy.
      • For perspective, Bing Maps could convert only 3 out of 400 latlong requests to a location, while Google Maps API converted almost 95% of the latlong requests to a location.
  • This application uses AFINN, a list of words with pre-computed sentiment scores.
    • These words are used to determine the sentiment of each tweet retrieved using the Streaming API.
  • From the sentiment values, we can find the happiest and the unhappiest county / region of the UK.
  • For visualization, I am using D3 to display the sentiment value of each county / region in real-time as a color appropriate to the sentiment value. The color of a county / region moves from red to green as the sentiment value increases.
  • The codebase has been commented wherever required.
  • This project is compatible with both Eclipse IDE and IntelliJ IDEA. Import the project into your favorite IDE [which has the Maven plugin installed] and you can quickly follow the code.
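The latlong bounding-box filter mentioned in the features above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the class name `UkBoundingBox` and the corner coordinates are assumptions chosen for the example. The one grounded detail is that Twitter's streaming `locations` filter takes {longitude, latitude} pairs with the south-west corner first.

```java
/** Rough sketch of a longitude/latitude bounding-box test (hypothetical helper). */
final class UkBoundingBox {
    // Assumed, approximate corners of a box around the UK; not taken from the project source.
    static final double WEST = -8.65;   // south-west corner, longitude
    static final double SOUTH = 49.86;  // south-west corner, latitude
    static final double EAST = 1.77;    // north-east corner, longitude
    static final double NORTH = 60.86;  // north-east corner, latitude

    /** Returns the box as {longitude, latitude} pairs, south-west corner first. */
    static double[][] asStreamingLocations() {
        return new double[][] { { WEST, SOUTH }, { EAST, NORTH } };
    }

    /** True when the point lies inside the box (edges included). */
    static boolean contains(double longitude, double latitude) {
        return longitude >= WEST && longitude <= EAST
            && latitude >= SOUTH && latitude <= NORTH;
    }

    public static void main(String[] args) {
        System.out.println("London inside box: " + contains(-0.1276, 51.5072));
        System.out.println("Paris inside box:  " + contains(2.3522, 48.8566));
    }
}
```

A rectangle like this is coarse: it also covers Ireland, for example, so a point inside the box is not necessarily in the UK. That is one reason the county / region still has to be resolved by reverse geocoding.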

Note: Huge thanks to my colleague Ganesh Sastry for his help in generating the TopoJSON of the UK and getting the UK map right.

Demo of UK Twitter Sentiment Visualization

GIF of D3 Choropleth Visualization

GIF animation of D3 Visualization

Screenshot of D3 Choropleth Visualization

Screenshot of D3 Visualization

Configuration

  • Please check config.properties and add your own values to complete the Twitter API integration. You can find your values on the Twitter Developer Page.
    • If you have not created a Twitter App before, please create a new one. It gives you all the values required by config.properties; copy them over carefully.
  • Also add your Google Maps API Key to config.properties, as it is used to get the reverse geocode location from Latitude and Longitude.
  • Finally, please check [but do not modify] the AFINN-111.txt file to see the pre-computed sentiment scores of ~2500 words / phrases.
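To show how the pre-computed AFINN scores can turn a tweet into a sentiment value, here is a minimal sketch. It assumes the usual AFINN layout of one `term<TAB>score` entry per line. The class name `AfinnScorer`, the tokenizing rule, and the three sample entries are illustrative assumptions, not the project's implementation. It scores single words only, so multi-word phrases in the list are never matched.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

/** Hypothetical word-level scorer over AFINN-style "term<TAB>score" lines. */
final class AfinnScorer {
    private final Map<String, Integer> scores = new HashMap<>();

    /** Builds the lookup table; lines that do not have exactly two tab-separated fields are skipped. */
    AfinnScorer(List<String> afinnLines) {
        for (String line : afinnLines) {
            String[] parts = line.split("\t");
            if (parts.length == 2) {
                scores.put(parts[0].trim().toLowerCase(Locale.ROOT),
                           Integer.parseInt(parts[1].trim()));
            }
        }
    }

    /** Sums the scores of all known words in the text; unknown words count as 0. */
    int score(String tweetText) {
        int total = 0;
        // Assumed tokenizer: lower-case, then split on anything that is not a letter or apostrophe.
        for (String token : tweetText.toLowerCase(Locale.ROOT).split("[^a-z']+")) {
            total += scores.getOrDefault(token, 0);
        }
        return total;
    }

    public static void main(String[] args) {
        // Sample entries in the AFINN layout, used here only for illustration.
        AfinnScorer scorer = new AfinnScorer(Arrays.asList("good\t3", "bad\t-3", "awesome\t4"));
        System.out.println(scorer.score("What an AWESOME day, not bad at all!"));
    }
}
```

In a real run the lines would come from AFINN-111.txt instead of the inline sample. Note that a plain sum ignores negation, so "not bad" still counts as negative.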

Dependencies

  • Storm v0.9.3
  • Jackson v1.9.13
  • Spring v4.0.3
  • Camel v2.13.0
  • ActiveMQ Camel v5.9.0
  • Twitter4J v4.0.2
  • Google Guava v18.0
  • Logback v1.1.2

Also, please check pom.xml for more information on the various other dependencies of the project.

Requirements

This project uses Maven to build and run the topology.
You need the following on your machine:

  • Oracle JDK >= 1.8.x
  • Apache Maven >= 3.2.5
  • Python v2.7.x installed on the machine for triggering the visualization.
  • Clone this repo and import as an existing Maven project to either Eclipse IDE or IntelliJ IDEA.
  • This application uses Google Guava for making life simple while using Collections and other generic stuff.
  • This application also uses Jackson for unmarshalling the JSON response got from Google Maps API.
  • ZooKeeper and the other Storm cluster components must be installed and configured if you run this project in distributed mode, i.e. on a Storm Cluster.
    • Follow the steps mentioned on Storm Wiki for more details on setting up a Storm Cluster.

The rest of the required frameworks and libraries are downloaded by Maven the first time the build is invoked.

Usage

To build and run this topology, you must use Java 1.8.

Local Mode:

  • All the required frameworks and libraries are downloaded by Maven as required.
  • Local mode can also be run on Windows without installing any additional software or framework. Note: Please be sure to clear your temp folder, as every run adds a lot of temporary files to it.
  • In local mode, this application can be run from command line by invoking:

mvn clean compile exec:java -Dexec.classpathScope=compile -Dexec.mainClass=org.p7h.storm.sentimentanalysis.topology.SentimentAnalysisTopology

or

mvn clean compile package && java -jar target/storm-sentiment-uk-viz-0.1-jar-with-dependencies.jar

  • Start Python SimpleHTTPServer in the web folder of this code repo.

python -m SimpleHTTPServer

  • For D3 Choropleth Map visualization, launch a browser [preferably Google Chrome] and point to index.html hosted on the above Python server.
    • Click on "Start Viz" button to trigger the D3 Choropleth Map visualization.
    • You can stop the visualization anytime by clicking on "Stop Viz" button.
    • The map updates as and when a tweet is analyzed by Storm, and displays in real-time the sentiment value of each county / region of the UK.

Distributed [or Cluster / Production] Mode:

Distributed mode requires a complete and proper Storm Cluster setup. Please check the wiki on the Apache Storm website for setting up a Storm Cluster.
In distributed mode, after starting Nimbus and the Supervisors on the individual machines, run this application on the master [or Nimbus] machine by invoking the following on the command line:

storm jar target/storm-sentiment-uk-viz-0.1.jar org.p7h.storm.sentimentanalysis.topology.SentimentAnalysisTopology SentimentAnalysis

Note: As mentioned earlier, the repo's recent update to Storm v0.9.3 was not tested in Cluster mode. But it should work as before, if the cluster setup is all fine.

Problems

If you find any issues, please report them either by raising an issue here on GitHub or by alerting me on my Twitter handle @P7h. Or even better, please send a pull request. Appreciate your help. Thanks!

License

Copyright © 2013-2015 Prashanth Babu.
Licensed under the Apache License, Version 2.0.
