About
On 29 July 2018, Emma Best published on her website a copy of 11,000+ WikiLeaks Twitter DMs: https://emma.best/2018/07/29/11000-messages-from-private-wikileaks-chat-released/
This page presents an extraction and wrangling of that corpus, to make it easy to search, extract and share.
How to use this page
- Every “link.csv” is a downloadable CSV file.
- You can search and sort every table. Search results can be downloaded as CSV or copied to the clipboard.
- You can zoom in on a time series by selecting a date range on the plot, or by using the range selector next to it. Double-click to reset the view.
- Under each dynamic plot, you can find a static plot by clicking on “Static plot”.
This page may not work as expected on Internet Explorer / Edge. Please switch to another browser if you have trouble reading this page.
Data format
- Every CSV file is encoded in UTF-8.
- The same datasets are available in JSON format on the GitHub repo.
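
As a minimal sketch of how a downloaded file could be read outside this page, the Python snippet below parses UTF-8 CSV text into a list of rows. It is not part of the R workflow used here. The column names text, date and user are taken from the "DMs by year" description, while the sample rows and the function name read_dms are hypothetical.

```python
import csv
import io


def read_dms(source):
    """Read DM rows from an open text stream into a list of dicts.

    Each dict is keyed by the CSV header (assumed here: text, date, user).
    """
    return list(csv.DictReader(source))


# Hypothetical sample standing in for a downloaded CSV file.
sample = io.StringIO(
    "text,date,user\n"
    '"Hello, world",2015-05-01 10:00:00,user_a\n'
    "Café,2015-05-01 11:30:00,user_b\n"
)

rows = read_dms(sample)
print(len(rows), rows[0]["user"])
```

For a real file, the stream would be opened with an explicit encoding, for example open("dms.csv", encoding="utf-8", newline=""), where dms.csv is a placeholder name.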
Browse through the content
- Home has the full dataset, to search and download.
- Timeline has a series of time-related content: notably DMs by year, and the daily count of DMs.
- Users holds the dataset for each user.
- mentions_urls holds the extracted mentions and URLs.
- methodo contains the methodology used for the data wrangling
Count of daily DMs
A dataset with 2 columns:
- date: the day
- n: number of DMs sent on that day
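
A table of this shape could be rebuilt from the full dataset along the following lines. This is a sketch only, not the R code used for this page. It assumes that each row has a date field beginning with a YYYY-MM-DD day, which the page does not confirm. The sample rows and the name daily_counts are hypothetical.

```python
from collections import Counter


def daily_counts(rows):
    """Return (date, n) pairs sorted by date.

    Assumes row["date"] starts with a YYYY-MM-DD day (an assumption
    about the date format, not something stated on this page).
    """
    counts = Counter(row["date"][:10] for row in rows)
    return sorted(counts.items())


# Hypothetical rows in the text / date / user layout.
rows = [
    {"text": "a", "date": "2015-05-01 10:00:00", "user": "user_a"},
    {"text": "b", "date": "2015-05-01 11:30:00", "user": "user_b"},
    {"text": "c", "date": "2015-05-02 09:15:00", "user": "user_a"},
]

counts = daily_counts(rows)
for date, n in counts:
    print(date, n)
```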
Static plot
DMs by year
3 datasets (1 per year), each with 3 columns:
- text: extracted text
- date: date of the DM
- user: user who sent the DM
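
The per-year split can be reproduced from the full dataset with a simple grouping step. The Python sketch below illustrates the idea and is not the R code behind this page. It assumes the date field starts with a four-digit year. The sample rows and the name split_by_year are hypothetical.

```python
from collections import defaultdict


def split_by_year(rows):
    """Group rows by the four-digit year at the start of row["date"].

    The leading-year date format is an assumption for illustration.
    """
    by_year = defaultdict(list)
    for row in rows:
        by_year[row["date"][:4]].append(row)
    return dict(by_year)


# Hypothetical rows in the text / date / user layout.
rows = [
    {"text": "a", "date": "2015-05-01 10:00:00", "user": "user_a"},
    {"text": "b", "date": "2016-01-12 08:00:00", "user": "user_b"},
    {"text": "c", "date": "2016-03-03 21:45:00", "user": "user_a"},
]

by_year = split_by_year(rows)
for year in sorted(by_year):
    print(year, len(by_year[year]))
```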
2015
Static plot
2016
Static plot
2017
Static plot
Methodology
All of the data processing was done in R.
The methodology is described in methodo.