A quick #WorldEmojiDay exploration
Let's celebrate #WorldEmojiDay with a quick exploration of my own Twitter account.
The 📦
We'll need:
From GitHub
{emo}
remotes::install_github("hadley/emo")
From CRAN
{dplyr}
{tidyr}
{rtweet}
{tidytext}
Note: this page was created at:
Sys.time()
## [1] "2018-07-17 17:22:29 CEST"
The ๐
Let's get my last 3200 tweets:
library(emo)
library(rtweet)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
res <- get_timeline(
"_ColinFay",
n = 3200
)
names(res)
## [1] "user_id" "status_id"
## [3] "created_at" "screen_name"
## [5] "text" "source"
## [7] "display_text_width" "reply_to_status_id"
## [9] "reply_to_user_id" "reply_to_screen_name"
## [11] "is_quote" "is_retweet"
## [13] "favorite_count" "retweet_count"
## [15] "hashtags" "symbols"
## [17] "urls_url" "urls_t.co"
## [19] "urls_expanded_url" "media_url"
## [21] "media_t.co" "media_expanded_url"
## [23] "media_type" "ext_media_url"
## [25] "ext_media_t.co" "ext_media_expanded_url"
## [27] "ext_media_type" "mentions_user_id"
## [29] "mentions_screen_name" "lang"
## [31] "quoted_status_id" "quoted_text"
## [33] "quoted_created_at" "quoted_source"
## [35] "quoted_favorite_count" "quoted_retweet_count"
## [37] "quoted_user_id" "quoted_screen_name"
## [39] "quoted_name" "quoted_followers_count"
## [41] "quoted_friends_count" "quoted_statuses_count"
## [43] "quoted_location" "quoted_description"
## [45] "quoted_verified" "retweet_status_id"
## [47] "retweet_text" "retweet_created_at"
## [49] "retweet_source" "retweet_favorite_count"
## [51] "retweet_retweet_count" "retweet_user_id"
## [53] "retweet_screen_name" "retweet_name"
## [55] "retweet_followers_count" "retweet_friends_count"
## [57] "retweet_statuses_count" "retweet_location"
## [59] "retweet_description" "retweet_verified"
## [61] "place_url" "place_name"
## [63] "place_full_name" "place_type"
## [65] "country" "country_code"
## [67] "geo_coords" "coords_coords"
## [69] "bbox_coords" "status_url"
## [71] "name" "location"
## [73] "description" "url"
## [75] "protected" "followers_count"
## [77] "friends_count" "listed_count"
## [79] "statuses_count" "favourites_count"
## [81] "account_created_at" "verified"
## [83] "profile_url" "profile_expanded_url"
## [85] "account_lang" "profile_banner_url"
## [87] "profile_background_url" "profile_image_url"
Here is what the text column looks like:
res %>%
pull(text) %>%
.[1:5]
## [1] "@GoldbergData It adds a little label at the top left with the text you provide. \nCan be useful if you want to add some legends in a markdown / shiny app, for example"
## [2] "#RStats \nCool new feature in ggplot2 v3 โ tagging plots : https://t.co/jFUqX2Tj5T"
## [3] "#RStats โ A perfect introduction to \U0001f5fa with the {sf} \U0001f4e6 & Co by @statnmap : \nhttps://t.co/IrmcSBDMDy https://t.co/m3TyUjrxYF"
## [4] "@vsbuffalo Amen to that"
## [5] "#RStats โ \U0001f680 Setting up RStudio Server, Shiny Server and PostgreSQL :\nhttps://t.co/J1Y7edNAj0"
As you can see, the emojis are not printed in the console but shown as
odd-looking sequences such as \U0001f4e6. These are Unicode escapes: each
one spells out the code point of an emoji, which is the number your
machine uses to identify the character. I won't go deeper into this; here
are two resources you can read if you want to know more about encoding:
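To make the escape notation above a little more concrete, here is a minimal base R sketch (not part of the original analysis) that moves between an emoji, its code point, and its UTF-8 bytes. It assumes a UTF-8 capable R session; the variable names are only illustrative.

```r
# The package emoji, written as a Unicode escape
pkg <- "\U0001f4e6"

# Code point as an integer, then formatted in the usual U+ notation
code_point <- utf8ToInt(pkg)
label <- sprintf("U+%X", code_point)

# Back from the code point to the character
round_trip <- intToUtf8(code_point)

# The bytes that encode this character in UTF-8
utf8_bytes <- charToRaw(enc2utf8(pkg))

print(label)
print(utf8_bytes)
```

The escape \U0001f4e6 and the label U+1F4E6 are two spellings of the same number; only the bytes depend on the encoding.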
The ๐
Let's use the {emo} package to extract the emojis from the text.
Inspired by {stringr}, this package has a ji_extract_all function
that is designed to extract all the emojis from a character vector.
We'll use it on our text column, then select the created_at and emo columns.
We then pass the result to tidyr::unnest, which gives one row per emoji
and drops the empty emo rows (i.e., the tweets without an emoji).
library(tidyr)
emos <- res %>%
mutate(
emo = ji_extract_all(text)
) %>%
select(created_at,emo) %>%
unnest(emo)
emos
## # A tibble: 887 x 2
## created_at emo
## <dttm> <chr>
## 1 2018-07-17 10:00:47 📦
## 2 2018-07-17 08:35:05 🚀
## 3 2018-07-16 18:47:25 😮
## 4 2018-07-16 14:51:30 😁
## 5 2018-07-16 14:51:16 😱
## 6 2018-07-16 13:28:08 🍒
## 7 2018-07-16 13:27:00 ๐
## 8 2018-07-16 13:27:00 ๐ฒ
## 9 2018-07-16 13:27:00 ๐
## 10 2018-07-16 13:25:01 ๐
## # ... with 877 more rows
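The way unnest handles tweets without emojis can be checked on a toy tibble. This is only a sketch with made-up rows (the `toy` and `flat` names are hypothetical), assuming a list column shaped like the one ji_extract_all returns: one character vector per tweet, possibly empty.

```r
library(tidyr)

# Three fake tweets: two emojis, no emoji, one emoji
toy <- tibble::tibble(
  created_at = c("tweet 1", "tweet 2", "tweet 3"),
  emo = list(c("\U0001f4e6", "\U0001f680"), character(0), "\U0001f914")
)

# One row per emoji; the tweet with an empty vector disappears
flat <- unnest(toy, emo)
flat
```

Here "tweet 1" appears twice and "tweet 2" not at all, which is why the emos table has fewer rows than there are tweets with text but can repeat a timestamp.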
emos %>%
count(emo, sort = TRUE)
## # A tibble: 187 x 2
## emo n
## <chr> <int>
## 1 🤔 84
## 2 📦 56
## 3 😬 51
## 4 🎉 50
## 5 😱 50
## 6 😇 42
## 7 ๐ 36
## 8 ๐ 35
## 9 ๐ 33
## 10 ๐ 28
## # ... with 177 more rows
So apparently, I use a lot of 🤔. But I also talk about 📦, which sounds more appropriate :)
As you can see, {tibble} renders these elements as emojis when printing.
With a data.frame, you get the plain Unicode escapes instead:
emos %>%
as.data.frame() %>%
head()
## created_at emo
## 1 2018-07-17 10:00:47 \U0001f4e6
## 2 2018-07-17 08:35:05 \U0001f680
## 3 2018-07-16 18:47:25 \U0001f62e
## 4 2018-07-16 14:51:30 \U0001f601
## 5 2018-07-16 14:51:16 \U0001f631
## 6 2018-07-16 13:28:08 \U0001f352
The 🏷
Let's flag all the emojis with their names:
emos %>%
left_join(
data.frame(
emo = ji_name,
name = names(ji_name)
)
) %>%
count(emo, name, sort = TRUE)
## Joining, by = "emo"
## Warning: Column `emo` joining character vector and factor, coercing into
## character vector
## # A tibble: 295 x 3
## emo name n
## <chr> <fct> <int>
## 1 🤔 thinking 84
## 2 🤔 thinking_face 84
## 3 📦 package 56
## 4 😬 grimacing 51
## 5 😬 grimacing_face 51
## 6 🎉 party_popper 50
## 7 🎉 tada 50
## 8 😱 face_screaming_in_fear 50
## 9 😱 scream 50
## 10 😇 innocent 42
## # ... with 285 more rows
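The join above returns more rows than there are distinct emojis because several names can point to the same emoji. If one name per emoji is enough, a possible sketch is to deduplicate the lookup table before joining. The `toy_ji_name` vector below is a hypothetical stand-in for emo::ji_name, assumed to be a named character vector; the counts are the ones shown earlier.

```r
library(dplyr)

# Hypothetical stand-in for emo::ji_name (names are aliases, values are emojis)
toy_ji_name <- c(
  thinking = "\U0001f914",
  thinking_face = "\U0001f914",
  package = "\U0001f4e6"
)

# Keep only the first alias found for each emoji;
# stringsAsFactors = FALSE avoids the factor coercion warning on older R
lookup <- data.frame(
  emo = unname(toy_ji_name),
  name = names(toy_ji_name),
  stringsAsFactors = FALSE
) %>%
  distinct(emo, .keep_all = TRUE)

counts <- data.frame(
  emo = c("\U0001f914", "\U0001f4e6"),
  n = c(84L, 56L),
  stringsAsFactors = FALSE
)

named <- left_join(counts, lookup, by = "emo")
named
```

Which alias survives depends on the order of the lookup vector, so this is a convenience for display rather than a canonical naming.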
The ๐
And finally, let's see which words are most associated with the emojis we just saw:
library(tidytext)
emos_with_id <- res %>%
mutate(
emo = ji_extract_all(text)
) %>%
select(status_id, text, emo) %>%
tidyr::unnest(emo)
emos_with_id %>%
unnest_tokens(word,text) %>%
anti_join(stop_words) %>%
anti_join(proustr::stop_words) %>%
anti_join(
data.frame(
word = c("https", "t.co", "https", "gt")
)
) %>%
count(emo, word, sort = TRUE)
## Joining, by = "word"
## Joining, by = "word"
## Joining, by = "word"
## Warning: Column `word` joining character vector and factor, coercing into
## character vector
## # A tibble: 5,660 x 3
## emo word n
## <chr> <chr> <int>
## 1 📦 rstats 37
## 2 ๐ rstats 27
## 3 💻 macbook 26
## 4 📦 package 20
## 5 ๐ trans 18
## 6 โ pm 15
## 7 💻 pro 15
## 8 💻 marche 10
## 9 ๐ค ma_salmon 10
## 10 ๐ฑ ma_salmon 10
## # ... with 5,650 more rows
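The tokenize, filter, and count steps used above can be tried on a couple of made-up tweets. This is a sketch under stated assumptions: the `toy` rows and the `custom_stop` list are invented for illustration, and a small hand-written stop list replaces the stop_words tables used in the real pipeline.

```r
library(dplyr)
library(tidytext)

# Two fake tweets, already paired with the emoji they contain
toy <- tibble(
  emo = c("\U0001f4e6", "\U0001f914"),
  text = c("I love this rstats package", "Is this rstats or not")
)

# Hypothetical stop list standing in for stop_words
custom_stop <- tibble(word = c("i", "this", "is", "or", "not"))

word_counts <- toy %>%
  unnest_tokens(word, text) %>%           # one lowercased word per row
  anti_join(custom_stop, by = "word") %>% # drop the stop words
  count(emo, word, sort = TRUE)

word_counts
```

Passing by = "word" explicitly also silences the "Joining, by" messages seen in the output above.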
And which emojis are used most often with "rstats"?
emos_with_id %>%
unnest_tokens(word, text) %>%
anti_join(stop_words) %>%
anti_join(proustr::stop_words) %>%
anti_join(
data.frame(
word = c("https", "t.co", "https", "gt")
)
) %>%
count(emo, word, sort = TRUE) %>%
filter(
word == "rstats"
)
## Joining, by = "word"
## Joining, by = "word"
## Joining, by = "word"
## Warning: Column `word` joining character vector and factor, coercing into
## character vector
## # A tibble: 81 x 3
## emo word n
## <chr> <chr> <int>
## 1 📦 rstats 37
## 2 ๐ rstats 27
## 3 ๐ฌ rstats 5
## 4 ๐ rstats 4
## 5 ๐ rstats 4
## 6 ๐ค rstats 4
## 7 โ๏ธ rstats 3
## 8 ๐ rstats 3
## 9 ๐ rstats 3
## 10 โก rstats 2
## # ... with 71 more rows
Other cool functions
I recently discovered the ji_glue() function, which lets you easily
insert emojis into a character string:
ji_glue("I love to code :package:")
## I love to code 📦
ji_glue("Sometimes they make me :scream:")
## Sometimes they make me 😱
ji_glue("Sometimes they make me :cry:")
## Sometimes they make me 😢
ji_glue("Sometimes they make me :fear:")
## Sometimes they make me 😨
ji_glue("But in the end I'm always :tada:")
## But in the end I'm always 🎉
The ji() function can also be used inside your markdown, so you can
write:
"I hate backtick r emo::ji("bug") backtick", and it will come out as: "I hate 🐛".
(Of course, replace backtick with actual backticks :) ).
That's all folks ๐ฌ
That's all for today! Now have a nice emoji day ๐