A quick #WorldEmojiDay exploration

Let’s celebrate #WorldEmojiDay with a quick exploration of my own Twitter account.

The 📦

We’ll need:

From GitHub

  • {emo}
remotes::install_github("hadley/emo")

From CRAN

  • {dplyr}
  • {tidyr}
  • {rtweet}
  • {tidytext}

Note: This page has been created at:

Sys.time()
## [1] "2018-07-17 17:22:29 CEST"

The 🔍

Let’s get my last 3200 tweets:

library(emo)
library(rtweet)
library(dplyr)
## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
res <- get_timeline(
  "_ColinFay",
  n = 3200
)
names(res)
##  [1] "user_id"                 "status_id"              
##  [3] "created_at"              "screen_name"            
##  [5] "text"                    "source"                 
##  [7] "display_text_width"      "reply_to_status_id"     
##  [9] "reply_to_user_id"        "reply_to_screen_name"   
## [11] "is_quote"                "is_retweet"             
## [13] "favorite_count"          "retweet_count"          
## [15] "hashtags"                "symbols"                
## [17] "urls_url"                "urls_t.co"              
## [19] "urls_expanded_url"       "media_url"              
## [21] "media_t.co"              "media_expanded_url"     
## [23] "media_type"              "ext_media_url"          
## [25] "ext_media_t.co"          "ext_media_expanded_url" 
## [27] "ext_media_type"          "mentions_user_id"       
## [29] "mentions_screen_name"    "lang"                   
## [31] "quoted_status_id"        "quoted_text"            
## [33] "quoted_created_at"       "quoted_source"          
## [35] "quoted_favorite_count"   "quoted_retweet_count"   
## [37] "quoted_user_id"          "quoted_screen_name"     
## [39] "quoted_name"             "quoted_followers_count" 
## [41] "quoted_friends_count"    "quoted_statuses_count"  
## [43] "quoted_location"         "quoted_description"     
## [45] "quoted_verified"         "retweet_status_id"      
## [47] "retweet_text"            "retweet_created_at"     
## [49] "retweet_source"          "retweet_favorite_count" 
## [51] "retweet_retweet_count"   "retweet_user_id"        
## [53] "retweet_screen_name"     "retweet_name"           
## [55] "retweet_followers_count" "retweet_friends_count"  
## [57] "retweet_statuses_count"  "retweet_location"       
## [59] "retweet_description"     "retweet_verified"       
## [61] "place_url"               "place_name"             
## [63] "place_full_name"         "place_type"             
## [65] "country"                 "country_code"           
## [67] "geo_coords"              "coords_coords"          
## [69] "bbox_coords"             "status_url"             
## [71] "name"                    "location"               
## [73] "description"             "url"                    
## [75] "protected"               "followers_count"        
## [77] "friends_count"           "listed_count"           
## [79] "statuses_count"          "favourites_count"       
## [81] "account_created_at"      "verified"               
## [83] "profile_url"             "profile_expanded_url"   
## [85] "account_lang"            "profile_banner_url"     
## [87] "profile_background_url"  "profile_image_url"

Here is what the text column looks like:

res %>% 
  pull(text) %>%
  .[1:5]
## [1] "@GoldbergData It adds a little label at the top left with the text you provide. \nCan be useful if you want to add some legends in a markdown / shiny app, for example"
## [2] "#RStats \nCool new feature in ggplot2 v3 — tagging plots : https://t.co/jFUqX2Tj5T"
## [3] "#RStats — A perfect introduction to \U0001f5fa with the {sf} \U0001f4e6 &amp; Co by @statnmap : \nhttps://t.co/IrmcSBDMDy https://t.co/m3TyUjrxYF"
## [4] "@vsbuffalo Amen to that"
## [5] "#RStats — \U0001f680 Setting up RStudio Server, Shiny Server and PostgreSQL :\nhttps://t.co/J1Y7edNAj0"

As you can see, the emojis are not printed in the console, but converted to weird-looking sequences like \U0001f4e6 and such. These are Unicode escape sequences: each one spells out the code point of an emoji in hexadecimal, using only plain ASCII characters. I won’t go deeper into this, but here are two resources you can read if you want to know more about encoding:
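
To make the escape notation a little more concrete, here is a minimal base R sketch (nothing in it comes from {emo} or {rtweet}). It takes the \U0001f4e6 escape seen in the output above and looks at the code point behind it. The variable names are only illustrative.

```r
# The escape sequence and the emoji are the same one-character string
pkg <- "\U0001f4e6"

# Code point as an integer, then formatted the way Unicode charts write it
code_point <- utf8ToInt(pkg)
label <- sprintf("U+%X", code_point)

# Going the other way: from the code point back to the character
round_trip <- intToUtf8(code_point)

# In UTF-8, this single character is stored on four bytes
n_bytes <- nchar(pkg, type = "bytes")

writeLines(c(label, round_trip))
```

Whether the last line shows the emoji itself or an escape depends on your console and locale, which is the same effect as in the output above.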

The 📊

Let’s use the {emo} package to extract the emojis from the text. Inspired by {stringr}, this package has a ji_extract_all() function designed to extract all the emojis from a character vector. We’ll use it on our text column, then keep the created_at and emo columns. We then pass the result to tidyr::unnest(), which returns one row per emoji and drops the rows where emo is empty (i.e., the tweets without an emoji).

library(tidyr)
emos <- res %>%
  mutate(
    emo = ji_extract_all(text)
  ) %>%
  select(created_at,emo) %>%
  unnest(emo)

emos
## # A tibble: 887 x 2
##    created_at          emo  
##    <dttm>              <chr>
##  1 2018-07-17 10:00:47 📦   
##  2 2018-07-17 08:35:05 🚀   
##  3 2018-07-16 18:47:25 😮   
##  4 2018-07-16 14:51:30 😁   
##  5 2018-07-16 14:51:16 😱   
##  6 2018-07-16 13:28:08 🍒   
##  7 2018-07-16 13:27:00 😈   
##  8 2018-07-16 13:27:00 🌲   
##  9 2018-07-16 13:27:00 💀   
## 10 2018-07-16 13:25:01 🐛   
## # ... with 877 more rows
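
The row-dropping behaviour of unnest() is easier to see on a tiny example. The sketch below builds a hypothetical list-column by hand, shaped like what ji_extract_all() returns (one character vector per tweet, empty when there is no emoji), so it runs without {emo} or a Twitter token. The toy and toy_emos objects are made up for illustration.

```r
library(tidyr)
library(tibble)

# Hypothetical list-column: one character vector per tweet,
# empty when the tweet contains no emoji
toy <- tibble(
  status_id = c("a", "b", "c"),
  emo = list(c("\U0001f4e6", "\U0001f389"), character(0), "\U0001f914")
)

# One row per emoji; the tweet with an empty vector disappears
toy_emos <- unnest(toy, emo)

# Tweet "a" contributes two rows, tweet "b" none, tweet "c" one
toy_emos$status_id
```

If you would rather keep the tweets without emojis as NA rows, recent versions of {tidyr} offer a keep_empty argument in unnest().
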
emos %>%
  count(emo, sort = TRUE)
## # A tibble: 187 x 2
##    emo       n
##    <chr> <int>
##  1 🤔       84
##  2 📦       56
##  3 😬       51
##  4 🎉       50
##  5 😱       50
##  6 😇       42
##  7 ๐Ÿ˜       36
##  8 🙃       35
##  9 😂       33
## 10 😜       28
## # ... with 177 more rows

So apparently, I use a lot of 🤔. But I also talk a lot about 📦, which sounds more appropriate :)

As you can see, {tibble} renders these elements as emojis when printing. With a plain data.frame, you get the escaped Unicode code points instead:

emos %>%
  as.data.frame() %>%
  head()
##            created_at        emo
## 1 2018-07-17 10:00:47 \U0001f4e6
## 2 2018-07-17 08:35:05 \U0001f680
## 3 2018-07-16 18:47:25 \U0001f62e
## 4 2018-07-16 14:51:30 \U0001f601
## 5 2018-07-16 14:51:16 \U0001f631
## 6 2018-07-16 13:28:08 \U0001f352

The 🏷

Let’s flag all the emojis with their names:

emos %>%
  left_join(
    data.frame(
      emo = ji_name, 
      name = names(ji_name)
    )
  ) %>% 
  count(emo, name, sort = TRUE)
## Joining, by = "emo"

## Warning: Column `emo` joining character vector and factor, coercing into
## character vector

## # A tibble: 295 x 3
##    emo   name                       n
##    <chr> <fct>                  <int>
##  1 🤔    thinking                  84
##  2 🤔    thinking_face             84
##  3 📦    package                   56
##  4 😬    grimacing                 51
##  5 😬    grimacing_face            51
##  6 🎉    party_popper              50
##  7 🎉    tada                      50
##  8 😱    face_screaming_in_fear    50
##  9 😱    scream                    50
## 10 😇    innocent                  42
## # ... with 285 more rows
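
Some emojis appear twice in this table because several names can point to the same emoji (thinking and thinking_face, for instance), so the join returns one row per name. If you prefer a single name per emoji, one possible approach is to deduplicate the lookup table before joining. The sketch below uses a hypothetical toy_ji_name vector standing in for ji_name, so that it runs without {emo}; the counts are copied from the output above.

```r
library(dplyr)

# Hypothetical stand-in for emo::ji_name: names are the emoji names,
# values are the emojis themselves
toy_ji_name <- c(
  thinking      = "\U0001f914",
  thinking_face = "\U0001f914",
  package       = "\U0001f4e6"
)

# Keep only the first name listed for each emoji
lookup <- tibble(emo = unname(toy_ji_name), name = names(toy_ji_name)) %>%
  distinct(emo, .keep_all = TRUE)

counts <- tibble(emo = c("\U0001f914", "\U0001f4e6"), n = c(84L, 56L))

# Explicit `by`, and character columns on both sides, so no message or warning
named_counts <- left_join(counts, lookup, by = "emo")
named_counts
```

Which name "wins" here simply depends on the order of the lookup vector, so treat the choice as arbitrary.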

The 🔠

And finally, let’s see which words are most often associated with the emojis we just saw:

library(tidytext)
emos_with_id <- res %>%
  mutate(
    emo = ji_extract_all(text)
  ) %>% 
  select(status_id, text, emo) %>%
  tidyr::unnest(emo)

emos_with_id %>%
  unnest_tokens(word,text) %>%
  anti_join(stop_words) %>%
  anti_join(proustr::stop_words) %>%
  anti_join(
    data.frame(
      word = c("https", "t.co", "https", "gt")
    )
  ) %>% 
  count(emo, word, sort = TRUE)
## Joining, by = "word"
## Joining, by = "word"
## Joining, by = "word"

## Warning: Column `word` joining character vector and factor, coercing into
## character vector

## # A tibble: 5,660 x 3
##    emo   word          n
##    <chr> <chr>     <int>
##  1 📦    rstats       37
##  2 🎉    rstats       27
##  3 💻    macbook      26
##  4 📦    package      20
##  5 ๐Ÿ‘    trans        18
##  6 ☕    pm           15
##  7 💻    pro          15
##  8 💻    marche       10
##  9 🤔    ma_salmon    10
## 10 😱    ma_salmon    10
## # ... with 5,650 more rows
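
Here is the same tokenise, filter and count pattern on a tiny hand-made table, as a sketch you can run without a Twitter token. The toy rows and the custom_stop table are made up for illustration, and passing by = "word" with a character column is just one way to avoid the join message and the factor warning shown above.

```r
library(dplyr)
library(tidytext)

# Hypothetical (emoji, tweet text) pairs standing in for emos_with_id
toy <- tibble(
  emo  = c("\U0001f4e6", "\U0001f4e6", "\U0001f389"),
  text = c(
    "A new package for #RStats",
    "The package is on CRAN",
    "Released for #RStats"
  )
)

# Extra tokens to drop, stored as character rather than factor
custom_stop <- tibble(word = c("https", "t.co", "gt"))

top_words <- toy %>%
  unnest_tokens(word, text) %>%             # one lowercased word per row
  anti_join(stop_words, by = "word") %>%    # drop common English words
  anti_join(custom_stop, by = "word") %>%   # drop URL fragments and the like
  count(emo, word, sort = TRUE)

top_words
```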

And which emojis are used the most with “rstats”?

emos_with_id %>%
  unnest_tokens(word, text) %>%
  anti_join(stop_words) %>%
  anti_join(proustr::stop_words) %>%
  anti_join(
    data.frame(
      word = c("https", "t.co", "https", "gt")
    )
  ) %>%
  count(emo, word, sort = TRUE) %>%
  filter(
    word == "rstats"
  )
## Joining, by = "word"
## Joining, by = "word"
## Joining, by = "word"

## Warning: Column `word` joining character vector and factor, coercing into
## character vector

## # A tibble: 81 x 3
##    emo   word       n
##    <chr> <chr>  <int>
##  1 📦    rstats    37
##  2 🎉    rstats    27
##  3 😬    rstats     5
##  4 🌟    rstats     4
##  5 👌    rstats     4
##  6 🤔    rstats     4
##  7 โœ๏ธ    rstats     3
##  8 💎    rstats     3
##  9 🙌    rstats     3
## 10 ⚡    rstats     2
## # ... with 71 more rows

Other cool functions

I recently discovered the ji_glue() function, which lets you easily insert an emoji into a character vector:

ji_glue("I love to code :package:")
## I love to code 📦
ji_glue("Sometimes they make me :scream:")
## Sometimes they make me 😱
ji_glue("Sometimes they make me :cry:")
## Sometimes they make me 😢
ji_glue("Sometimes they make me :fear:")
## Sometimes they make me 😨
ji_glue("But in the end I'm always :tada:")
## But in the end I'm always 🎉
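
If you are curious about the general idea behind this kind of :name: substitution, here is a deliberately naive base R sketch. It is not how {emo} implements ji_glue(): toy_lookup and toy_glue() are hypothetical names invented for this example, and the lookup only knows two emojis.

```r
# Hypothetical lookup table: emoji name -> emoji
toy_lookup <- c(package = "\U0001f4e6", tada = "\U0001f389")

# Replace every :name: placeholder found in the lookup, leave the rest untouched
toy_glue <- function(x, lookup = toy_lookup) {
  for (nm in names(lookup)) {
    x <- gsub(paste0(":", nm, ":"), lookup[[nm]], x, fixed = TRUE)
  }
  x
}

glued <- toy_glue("I love to code :package:")
untouched <- toy_glue("No placeholder here")
writeLines(glued)
```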

The ji() function can also be used inside your markdown, so you can write:

“I hate backtick r emo::ji("bug") backtick”, and it will come out as: “I hate 🐛”.

(Of course, replace the word backtick with actual backticks :) ).

That’s all folks 🎬

That’s all for today! Now have a nice emoji day 🎉
