Twitterbot in R and Neo4j (Part 1/3)


One of the best perks of working in a data science and analytics company is the opportunity to learn new things. The industry is changing so rapidly that the freedom to learn new things is not just a bonus, but a necessity. In fact, as the era of algorithm-driven business dawns, the most value will be reaped from those business processes where machine learning and the use of real data science are commonplace.

Introducing the Knowledge Building blog series. This series will showcase tips and tricks, introduce new tools and software you can use, and share anything else we have come across whilst working on internal and external projects.


This three-part blog series will demonstrate how you can use R to get data from Twitter and other social media websites through their public APIs, how to store that data in Neo4j (which is particularly well suited to modelling relationships and networks), and how to analyze that data in R and build algorithms to predict what future data might look like.

To add something extra to the typical how-to blog post, I will share my experience of using these packages, including potential problems you might face and how to overcome them. The Twitter API can be temperamental to connect to at times (it is not always straightforward), so I hope to impart some new information along with what I have learned so far.

1. Complete an online application.

This part is quite straightforward, even if you have very little experience connecting to a data API or writing applications. You can even use a normal Twitter account to get started. Visit the Twitter application management page and log in with your normal Twitter details if you have them. You should then see a button that says Create New App – click it and fill out the details similar to below.

[Screenshot: the Create New App form]

Tip: set the callback URL to http://127.0.0.1:1410. This is the local address that httr listens on during authentication, and setting it improves the chances of the twitteR functions working properly. We will cover those shortly.

Once you have registered your details, you will see a tab containing your access tokens and secrets. You will need to make a note of these for later, since you need to authenticate from your R environment.
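If you would rather not paste the keys directly into your scripts, one option is to keep them in environment variables. This is just a sketch; the variable names below are placeholders of my own, not anything Twitter or twitteR requires:

```r
# Hypothetical variable names: set these in ~/.Renviron or your shell,
# e.g. TWITTER_API_KEY=xxxx, so the secrets never appear in your code.
api_key             <- Sys.getenv("TWITTER_API_KEY")
api_secret          <- Sys.getenv("TWITTER_API_SECRET")
access_token        <- Sys.getenv("TWITTER_ACCESS_TOKEN")
access_token_secret <- Sys.getenv("TWITTER_ACCESS_TOKEN_SECRET")
```

This keeps credentials out of version control and lets you share scripts without scrubbing them first.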

2. Create your authenticated R environment.

Run this code in R to get started (the twitteR and httr packages are both on CRAN):

library(twitteR)
library(httr)

api_key <- "replace-me-with-your-real-api-key"
api_secret <- "replace-me-with-your-real-api-secret"
access_token <- "replace-me-with-your-real-access-token"
access_token_secret <- "replace-me-with-your-real-token-secret"

# register the app and fetch an OAuth token via httr
myapp <- oauth_app("twitter", key = api_key, secret = api_secret)
twitter_token <- oauth1.0_token(oauth_endpoints("twitter"), myapp)

# authenticate the twitteR package itself
setup_twitter_oauth(api_key, api_secret, access_token, access_token_secret)

twitteR is the main package we will use to make requests to the Twitter servers and to authenticate from your R console. httr is a separate package that we will also use for authentication; it is also capable of making data requests to other websites. Next, R will prompt you on whether you’d like to create a local file to allow access between R sessions.

Use a local file to cache OAuth access credentials between R sessions?
1: Yes
2: No

I usually select the ‘Yes’ option, which will take you to a browser tab with the following message:

Authentication complete. Please close this page and return to R.

If everything went smoothly, then you should be authenticated inside your R session. R will now do all the work of sending GET requests and authenticating you with each one.

Warning: sometimes the httr package cannot create a connection to the Twitter server and authenticate you. One common error message that I received is this:

Error in httpuv::startServer("", 1410, list(call = listen)) : 
  Failed to create server

This is because httr relies on another package, httpuv, whose startServer() function does not always work. It just means that you cannot use httr to make requests, but you should still be able to use twitteR. I got around this in a number of ways, but now I stick to the more stable functions, since fixing it takes quite a bit of effort. Some potential fixes:

  • Reinstalling the httr and httpuv packages. The most recent versions are often required for startServer() to work properly.
  • If you are working in Linux, run sudo lsof -i :1410 to see if there is a process that is using port 1410, as httr assumes that this port is free.
  • Waiting a day or two – I often came back to my scripts a day or two later and found that the problem had resolved itself. Not particularly useful! But it’s an elusive problem, which is why I avoid the httr functions for Twitter; the Twitter API server is a bit unstable at times.
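If you suspect the port is the culprit, you can also check from inside R whether anything is already bound to 1410. This helper is a sketch of my own using base R's serverSocket() (available from R 4.0), not part of httr:

```r
# Try to bind a listening socket on the given port; if that fails,
# something else (perhaps a stray R session) is already using it.
port_is_free <- function(port = 1410) {
  sock <- try(serverSocket(port), silent = TRUE)
  if (inherits(sock, "try-error")) return(FALSE)
  close(sock)
  TRUE
}

port_is_free(1410)
```

If this returns FALSE, kill the offending process (or restart R) before retrying the httr authentication.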

3. Sending a tweet, getting some data

Test out your new connection by sending a tweet (or a direct message) from your console:

text <- paste("I sent this from my R console at:", Sys.time())
updateStatus(text) # send a tweet
dmSend(text, "OFrost") # send a direct message


No spamming!

Next, let’s get some data. I want to collect some information on my profile, so I can use the getUser() function to send a GET request for a user’s profile.

ConsolidataLtd <- getUser('ConsolidataLtd')

twitteR will create a reference class object in your R session. It converts the resulting JSON into semi-structured data of different types. The data you will get out includes the number of tweets you’ve posted, how many followers you have, your Twitter URL, details of your last tweet, and a lot more.

Even better, the object exposes a set of ‘methods’ that you can use to get more information. One of those is the getFollowerIDs() method, which generates a list of all of your Twitter followers’ IDs. You can then pass this list into other twitteR functions, such as lookupUsers(), which fetches more information about each user from their unique ID. The result is a list of your followers with a profile for each:


And here’s some basic code to get out the most relevant bits of information:

# call the function to look up users from their IDs and pass in the list of your followers' IDs.

followers <- lookupUsers(ConsolidataLtd$getFollowerIDs())

# do some clean-up, using the unlist() and lapply() functions to get data out of the lists.

screenName <- unlist(lapply(followers, function(x){x$screenName}))
description <- unlist(lapply(followers, function(x){x$description}))
statuses <- unlist(lapply(followers, function(x){x$statusesCount}))
followersCount <- unlist(lapply(followers, function(x){x$followersCount}))
favoritesCount <- unlist(lapply(followers, function(x){x$favoritesCount}))
friendsCount <- unlist(lapply(followers, function(x){x$friendsCount}))
created <- unlist(lapply(followers, function(x){x$created}))
location <- unlist(lapply(followers, function(x){x$location}))
listedCount <- unlist(lapply(followers, function(x){x$listedCount}))

# create a data frame, reformatting the date/time information as required.

followers <- data.frame(
  screenName, description, statuses, followersCount, favoritesCount,
  friendsCount, location,
  Date = as.Date(as.POSIXct(created, origin = "1970-01-01")),
  Time = strftime(as.POSIXct(created, origin = "1970-01-01"), format = "%H:%M:%S")
)
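The created field comes back as seconds since the Unix epoch, which is why the data frame step converts it with as.POSIXct(). Here is a self-contained illustration of that conversion with a fixed, arbitrary timestamp:

```r
ts <- 1454063833  # seconds since 1970-01-01 00:00:00 UTC

as.Date(as.POSIXct(ts, origin = "1970-01-01", tz = "UTC"))
# → "2016-01-29"

strftime(as.POSIXct(ts, origin = "1970-01-01", tz = "UTC"),
         format = "%H:%M:%S", tz = "UTC")
# → "10:37:13"
```

Note that without an explicit tz, R uses your local timezone, so the Date and Time columns will reflect where your machine is, not UTC.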

And to finish, a quick plot to visualize the data. This graph uses ggplot2’s qplot() to plot the number of statuses posted against the number of favorites a typical user has, using Consolidata’s followers as a sample.


library(ggplot2)

# favorites vs. tweets posted, coloured by follower count ("popularity")
qplot(favoritesCount, statuses, data = followers, colour = followersCount)
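Since the real followers data frame needs a live API connection, here is a self-contained version of the same plot using mock data. The column names match the data frame built above; the values themselves are made up:

```r
library(ggplot2)

# Mock stand-in for the followers data frame built from the API
set.seed(42)
mock <- data.frame(
  favoritesCount = rpois(50, lambda = 200),
  statuses       = rpois(50, lambda = 500),
  followersCount = rpois(50, lambda = 300)
)

p <- qplot(favoritesCount, statuses, data = mock, colour = followersCount,
           xlab = "Favorites", ylab = "Tweets posted")
print(p)
```

Swap mock for your own followers data frame to reproduce the chart with real numbers.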

In the next part, I will show you how to import this data into a graph database (Neo4j, using an R package called RNeo4j), examine how Twitter users are related, and run some basic to intermediate queries using Cypher.
