Predictive Hacks

How to Get Twitter Data using R

In a previous post, we showed how to get Twitter data using Python. In this tutorial, we will show you how to get Twitter data using R, specifically with the rtweet library. As explained in the previous post, you will need to create a Twitter developer account and obtain your consumer and access keys.

Install the rtweet Library and API Authorization

We can install the rtweet library either from CRAN or from GitHub as follows:

## install rtweet from CRAN
install.packages("rtweet")


## OR
## install remotes package if it's not already
if (!requireNamespace("remotes", quietly = TRUE)) {
  install.packages("remotes")
}

## install dev version of rtweet from github
remotes::install_github("ropensci/rtweet")

Then we are ready to load the library:

## load rtweet package
library(rtweet)

We can authorize the API by entering our credentials:

## load rtweet
library(rtweet)

## load the tidyverse
library(tidyverse)

## store api keys (these are fake example values; replace with your own keys)
api_key <- "afYS4vbIlPAj096E60c4W1fiK"
api_secret_key <- "bI91kqnqFoNCrZFbsjAWHD4gJ91LQAhdCJXCj3yscfuULtNkuu"

## authenticate via web browser
token <- create_token(
  app = "rstatsjournalismresearch",
  consumer_key = api_key,
  consumer_secret = api_secret_key)

get_token()
 

Search Tweets

Now we are ready to get our first data from Twitter, starting with 1,000 tweets that contain the hashtag "#DataScience", excluding retweets. Note that search_tweets() returns data only from the last 6-9 days.

rt <- search_tweets("#DataScience", n = 1000, include_rts = FALSE)
View(rt)
 

Usually, we want to get the screen name, the text, the number of likes and the number of re-tweets.

View(rt %>% select(screen_name, text, favorite_count, retweet_count))
 

The rt object consists of 90 columns, so it contains a lot of information. Note that there is a rate limit of 18,000 tweets per 15 minutes. If we want to get more, we should do the following:

## search for 250,000 tweets containing the word datascience
rt <- search_tweets(
  "datascience", n = 250000, retryonratelimit = TRUE
)
 

We can also plot the tweets:

ts_plot(rt, "hour") 
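Since ts_plot() returns a ggplot object, we can also style the plot further with the usual ggplot2 layers. A sketch, assuming rt still holds the "#DataScience" tweets from above (the interval and titles are illustrative):

## aggregate the tweets in 3-hour intervals and style the plot
library(ggplot2)

ts_plot(rt, "3 hours") +
  theme_minimal() +
  labs(
    x = NULL, y = NULL,
    title = "Frequency of #DataScience tweets",
    subtitle = "Tweet counts aggregated in 3-hour intervals"
  )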
 

Stream Tweets

stream_tweets() returns a random sample of approximately 1% of the live stream of all tweets. We can filter the stream via a search-like query, specific user IDs, or the geo-coordinates of the tweets.

## stream tweets containing the token DataScience for 30 seconds
rt <- stream_tweets("DataScience", timeout = 30)
View(rt)
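Besides a search-like query, we can also stream by user IDs or by geo-coordinates. A sketch, assuming the built-in bounding boxes that lookup_coords() provides for "usa" and "world" (other locations require a Google Maps API key, and the user IDs below are placeholders):

## stream tweets sent from within the USA for 30 seconds
rt_usa <- stream_tweets(lookup_coords("usa"), timeout = 30)

## or stream tweets posted by specific user ids,
## passed as a comma-separated string
rt_users <- stream_tweets("2973406683,612473", timeout = 30)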
 

Note that the rt object consists of 90 columns.

Filter Tweets

When we search for tweets, we can also apply some filters, for example to specify the language or to exclude retweets. Let's search for tweets that contain the text "StandWithUkraine", excluding retweets, quotes and replies.

ukraine <- search_tweets("StandWithUkraine -filter:retweets -filter:quote -filter:replies", n = 100)
View(ukraine)
 

Finally, let's search for tweets in English that contain the word "Putin" and have at least 100 likes and 100 retweets.

putin <- search_tweets("Putin min_retweets:100 AND min_faves:100", lang = "en")
View(putin)
 

Get Timelines

The get_timeline() function returns up to 3,200 statuses posted to the timelines of one or more specified Twitter users. Let's get the timelines of the R-bloggers and Hadley Wickham accounts, taking the 500 most recent tweets of each one.

rt <- get_timeline(user = c("Rbloggers", "hadleywickham"), n = 500)
 

Whose tweets are more popular? Who gets more likes and retweets? Let’s answer this question.

rt %>%
  group_by(screen_name) %>%
  summarise(AvgLikes = mean(favorite_count, na.rm = TRUE),
            AvgRetweets = mean(retweet_count, na.rm = TRUE))
 

As we can see, the guru Hadley Wickham gets far more likes and retweets than R-bloggers.

# A tibble: 2 x 3
  screen_name   AvgLikes AvgRetweets
1 hadleywickham     31.6     110.  
2 Rbloggers         17.8     6.17

Get the Likes made by a User

We can get the n most recently liked tweets by a user. Let’s see what Hadley likes!

hadley <- get_favorites("hadleywickham", n = 1000)
 

Which are his favorite accounts?

hadley %>%
  group_by(screen_name) %>%
  count() %>%
  arrange(desc(n)) %>%
  head(5)
 

And we get:

# A tibble: 5 x 2
# Groups:   screen_name [5]
  screen_name       n
1 djnavarro        17
2 ijeamaka_a       13
3 mjskay           12
4 sharlagelfand    12
5 vboykis          12

Which are his favorite hashtags?

hadley %>%
  unnest(hashtags) %>%
  group_by(tolower(hashtags)) %>%
  count() %>%
  arrange(desc(n)) %>%
  na.omit() %>%
  head(5)
 

And we get:

# A tibble: 5 x 2
# Groups:   tolower(hashtags) [5]
  `tolower(hashtags)`     n
1 rstats                 78
2 rtistry                22
3 genuary2022             9
4 genuary                 7
5 rayrender               6

Search Users

We can search for users with a specific hashtag. For example:

## search for up to 1000 users using the keyword rstats
rstats <- search_users(q = "rstats", n = 1000)

## data frame where each observation (row) is a different user
rstats

## tweets data also retrieved. can access it via tweets_data()
tweets_data(rstats)
 

Get Friends and Followers

We can get the user ids of the friends and followers of a specific account. Let’s get the friends and followers of the Predictive Hacks account.

get_followers("predictivehacks")
get_friends("predictivehacks")
 

We can extract more info on the user ids using the function:

lookup_users(users, parse = TRUE, token = NULL)
 

where users is a vector of user IDs or screen names of the target users.
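For example, we could combine the two functions to pull the full profiles of an account's followers. A sketch, relying on the user_id column that get_followers() returns:

## get the follower ids and then look up their profiles
followers <- get_followers("predictivehacks")
followers_info <- lookup_users(followers$user_id)

## keep the screen name, description and followers count
followers_info %>%
  select(screen_name, description, followers_count)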

Get Trends

We can get the trends in a particular city or country. For example:

## Retrieve available trends
trends <- trends_available()
trends

## Store WOEID for Worldwide trends
worldwide <- trends$woeid[grep("world", trends$name, ignore.case = TRUE)[1]]

## Retrieve worldwide trends data
ww_trends <- get_trends(worldwide)

## Preview trends data
ww_trends

## Retrieve trends data using latitude, longitude near New York City
nyc_trends <- get_trends_closest(lat = 40.7, lng = -74.0)

## should be same result if lat/long supplied as first argument
nyc_trends <- get_trends_closest(c(40.7, -74.0))

## Preview trends data
nyc_trends

## Provide a city or location name using a regular expression string to
## have the function internals do the WOEID lookup/matching for you
(luk <- get_trends("london"))
 

Let’s see what is trending in Russia and in Ukraine, respectively.

get_trends("Russia")
get_trends("Ukraine")
 

Final Thoughts

Every Data Science analysis starts with the data. In this tutorial, we showed how to get Twitter data in R. In the next posts, we will show you how to analyze this data by applying sentiment analysis, topic modelling, network analysis and so on. Stay tuned!
