Showcase

Focus on YouTube Data Tools

Background

Practical application of the YouTube Data Tools (YTDT), using the example of Mai Thi Nguyen-Kim and her YouTube channel maiLab.

Exercise 1

Use the Channel Search function of the YTDT to find the correct channel ID for the channel maiLab.
To do so, enter “maiLab” in the field Search query and download the results as .csv.
Open the file and extract the correct channel ID.
Hint: If in doubt, use the Channel Info function to check whether the selected ID matches the channel description.

# Load packages
library(here)
library(readr)
library(tidyverse)

# Import data
channel_list <- read_csv(
  here("content/05-api_access-youtube/data.local/channelsearch_channels50_2022_11_17-09_54_22.csv"))

# Preview data 
channel_list %>% glimpse()

Rows: 50
Columns: 10
$ position        <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16,…
$ id              <chr> "UCyHDQ5C6z1NDmJ4g6SerW8g", "UC146qqkUMTrn4nfSSOTNwiA"…
$ title           <chr> "maiLab", "musstewissen Chemie", "mailab", "MAILab_메…
$ description     <chr> "Holt euch einen Tee, Freunde der Sonne, macht es euch…
$ publishedAt     <dttm> 2016-09-08 14:13:08, 2016-09-23 09:24:14, 2020-03-19 …
$ defaultLanguage <chr> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ country         <chr> "DE", "DE", "DE", "KR", NA, NA, NA, NA, NA, "DE", NA, …
$ viewCount       <dbl> 127882489, 18039348, 21013, 5101, 11, 2, 2, 5004, 230,…
$ subscriberCount <dbl> 1480000, 190000, 146, 26, 0, 1, 1, 9, 9, 66300, 0, 2, …
$ videoCount      <dbl> 186, 45, 9, 104, 3, 1, 1, 24, 12, 51, 1, 6, 31, 1, 2, …

# Get channel description with R
channel_list %>%
  filter(title == "maiLab") %>%
  select(id, title, description)

# A tibble: 1 × 3
  id                       title  description                                   
  <chr>                    <chr>  <chr>                                         
1 UCyHDQ5C6z1NDmJ4g6SerW8g maiLab Holt euch einen Tee, Freunde der Sonne, macht…
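
The search results also contain similarly named channels (for example, "mailab" in lower case), so an exact title match is only one way to pick the right channel. The following is a minimal sketch of an alternative, assuming that the channel of interest is the name match with the most subscribers. The small table rebuilds the first three rows of the preview above; the third ID is a hypothetical placeholder, because the real value is truncated in the output.

```r
library(dplyr)
library(stringr)

# Toy excerpt of the search results (values taken from the preview above;
# "PLACEHOLDER_ID" is hypothetical, the real ID is truncated in the output)
channel_excerpt <- tibble(
  id = c("UCyHDQ5C6z1NDmJ4g6SerW8g", "UC146qqkUMTrn4nfSSOTNwiA", "PLACEHOLDER_ID"),
  title = c("maiLab", "musstewissen Chemie", "mailab"),
  subscriberCount = c(1480000, 190000, 146)
)

# Case-insensitive title match, then keep the channel with the most subscribers
candidate <- channel_excerpt %>%
  filter(str_detect(title, regex("^mailab$", ignore_case = TRUE))) %>%
  slice_max(subscriberCount, n = 1)

candidate
```

The subscriber count is only a heuristic, so it is still worth checking the result against the channel description as suggested in the hint.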

Exercise 2

Use the Video List function of the YTDT to get a list of all published videos of the channel maiLab.
To do so, use the extracted channel ID and download the results as .csv.
Then import and preview the data.

# Import data: video list
video_list <- read_csv(
  here("content/05-api_access-youtube/data.local/videolist_channel186_2022_11_17-10_20_11.csv"))

# Preview data 
video_list %>% glimpse()

Rows: 186
Columns: 23
$ position           <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, …
$ channelId          <chr> "UCyHDQ5C6z1NDmJ4g6SerW8g", "UCyHDQ5C6z1NDmJ4g6SerW…
$ channelTitle       <chr> "maiLab", "maiLab", "maiLab", "maiLab", "maiLab", "…
$ videoId            <chr> "IK5BZdnqMDU", "Mt50U4_ueR0", "-NMs56pQ9EE", "-9OvN…
$ publishedAt        <dttm> 2022-09-18 16:00:18, 2022-06-09 04:30:04, 2022-05-…
$ publishedAtSQL     <dttm> 2022-09-18 16:00:18, 2022-06-09 04:30:04, 2022-05-…
$ videoTitle         <chr> "Das Ende der Homöopathie | MAITHINK X", "Affenpock…
$ videoDescription   <chr> "Der vielleicht größte Abwasserskandal aller Zeiten…
$ tags               <chr> "Mai Thi Nguyen-Kim,Mai Thi,mai,nguyen,mailab,lab,m…
$ videoCategoryId    <dbl> 28, 28, 28, 28, 22, 28, 22, 22, 22, 27, 22, 22, 28,…
$ videoCategoryLabel <chr> "Science & Technology", "Science & Technology", "Sc…
$ duration           <chr> "PT31M57S", "PT13M39S", "PT15M21S", "PT12M6S", "PT1…
$ durationSec        <dbl> 1917, 819, 921, 726, 832, 1664, 306, 1065, 1325, 12…
$ dimension          <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, …
$ definition         <chr> "hd", "hd", "hd", "hd", "hd", "hd", "hd", "hd", "hd…
$ caption            <lgl> FALSE, TRUE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, T…
$ thumbnail_maxres   <chr> "https://i.ytimg.com/vi/IK5BZdnqMDU/maxresdefault.j…
$ licensedContent    <dbl> 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, …
$ viewCount          <dbl> 1729562, 993841, 929302, 2125579, 3021542, 895038, …
$ likeCount          <dbl> 83334, 56794, 64304, 123191, 236388, 77980, 73251, …
$ dislikeCount       <lgl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ favoriteCount      <dbl> 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, …
$ commentCount       <dbl> 18716, 9141, 8204, 38823, 111200, 7905, 3831, 5723,…
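
The preview contains the video length twice: as an ISO 8601 duration string (duration) and in seconds (durationSec). As a rough illustration of how the two columns relate, the sketch below converts the first four duration strings from the preview into seconds. The helper iso_duration_to_seconds() is hypothetical (it is not part of the YTDT or of any package) and assumes that only hours, minutes, and seconds occur.

```r
# Hypothetical helper: convert ISO 8601 durations such as "PT31M57S" to seconds.
# Assumption: only hour (H), minute (M), and second (S) parts are present.
iso_duration_to_seconds <- function(x) {
  # Extract the number directly in front of a unit letter; a missing unit counts as 0
  part <- function(one, unit) {
    hit <- regmatches(one, regexpr(paste0("[0-9]+(?=", unit, ")"), one, perl = TRUE))
    if (length(hit) == 0) 0 else as.numeric(hit)
  }
  vapply(
    x,
    function(one) part(one, "H") * 3600 + part(one, "M") * 60 + part(one, "S"),
    numeric(1),
    USE.NAMES = FALSE
  )
}

# First four duration values from the preview above
seconds <- iso_duration_to_seconds(c("PT31M57S", "PT13M39S", "PT15M21S", "PT12M6S"))
seconds
```

The result (1917, 819, 921, 726) matches the first four durationSec values in the preview, so durationSec can be used directly for the analysis.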

Exercise 3

Perform various exploratory data analyses.

Location and dispersion parameters

# Load additional packages
library(sjmisc) 

# Get distribution parameters for selected variables
video_list %>% 
  select(durationSec, viewCount, likeCount, favoriteCount, commentCount) %>% 
  descr()


## Basic descriptive statistics

           var    type         label   n NA.prc      mean        sd       se
   durationSec numeric   durationSec 186      0    612.92    402.74    29.53
     viewCount numeric     viewCount 186      0 687563.73 763588.15 55989.00
     likeCount numeric     likeCount 186      0  31172.50  36815.40  2699.44
 favoriteCount numeric favoriteCount 186      0      0.00      0.00     0.00
  commentCount numeric  commentCount 186      0   5598.28  11238.77   824.07
       md   trimmed                   range       iqr skew
    530.5    579.42          1825 (92-1917)    633.25 0.64
 467371.5 562670.19 6671382 (21298-6692680) 782761.75 3.36
  20970.5  24734.81    267376 (1063-268439)  37023.75 3.10
      0.0      0.00                 0 (0-0)      0.00  NaN
   2068.0   3106.24      111166 (34-111200)   5718.00 5.54

More detailed distribution for each variable

# Load additional packages
library(sjPlot) 

# Show distributions for selected variables
video_list %>% 
  plot_frq(durationSec, viewCount, likeCount, commentCount, type = "density")

[Figures 1 to 4: density plots of durationSec, viewCount, likeCount, and commentCount]

In-depth analysis

Based on the findings of the previous section, let us take a closer look. Most of the variables have a right-skewed distribution: the bulk of the values sits on the left, and a few isolated outliers appear at the right edge.

The next goal is therefore to find out which videos these outliers are.
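
One possible way to approach this is sketched below; it is not necessarily the method used in the rest of the showcase. The toy table holds only the first six videos, with values taken from the preview of video_list above. The object names video_excerpt, top_commented, and comment_outliers are illustrative, and the choice of Tukey's rule (values above Q3 + 1.5 * IQR) is an assumption.

```r
library(dplyr)

# Toy excerpt: the first six videos from the preview of video_list
video_excerpt <- tibble(
  position = 1:6,
  viewCount = c(1729562, 993841, 929302, 2125579, 3021542, 895038),
  commentCount = c(18716, 9141, 8204, 38823, 111200, 7905)
)

# Option 1: list the video with the highest number of comments
top_commented <- video_excerpt %>%
  slice_max(commentCount, n = 1)

# Option 2: flag values above Tukey's upper fence (Q3 + 1.5 * IQR)
comment_outliers <- video_excerpt %>%
  filter(
    commentCount > quantile(commentCount, 0.75, names = FALSE) + 1.5 * IQR(commentCount)
  )

top_commented
comment_outliers
```

In this excerpt, both options point to the video at position 5. Applied to the full video_list, the same calls would work with videoTitle added to the selected columns, and with viewCount or likeCount in place of commentCount.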