Session 8
🔨 Text as data II: Advanced Methods
Participate
🖥️ Session 08
Suggested readings
- Atteveldt, W. van, Trilling, D., & Arcíla, C. (2021). Computational analysis of communication: A practical introduction to the analysis of texts, networks, and images with code examples in python and r. John Wiley & Sons. 
- Grimmer, J., Roberts, M. E., & Stewart, B. M. (2022). Text as data: A new framework for machine learning and the social sciences. Princeton University Press. 
- Jurafsky, D., & Martin, J. H. (2024). Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition with language models (3rd ed.). https://web.stanford.edu/~jurafsky/slp3/ 
- Silge, J., & Robinson, D. (2017). Text mining with r: A tidy approach (First edition). O’Reilly. 
… with focus on n-gram methods
- Nicholls, T. (2019). Detecting Textual Reuse in News Stories, At Scale. International Journal of Communication, 13(0), 25. https://ijoc.org/index.php/ijoc/article/view/9904 
- Arendt, F., & Karadas, N. (2017). Content analysis of mediated associations: An automated text-analytic approach. Communication Methods and Measures, 11(2), 105–120. https://doi.org/10.1080/19312458.2016.1276894 
… with focus on topic modeling
- Chen, Y., Peng, Z., Kim, S.-H., & Choi, C. W. (2023). What We Can Do and Cannot Do with Topic Modeling: A Systematic Review. Communication Methods and Measures, 17(2), 111–130. https://doi.org/10.1080/19312458.2023.2167965 
- Hase, V. (2023). Automated Content Analysis (F. Oehmer-Pedrazzi, S. H. Kessler, E. Humprecht, K. Sommer, & L. Castro, Eds.; pp. 23–36). Springer Fachmedien Wiesbaden. https://link.springer.com/10.1007/978-3-658-36179-2_3 
- Maier, D., Waldherr, A., Miltner, P., Wiedemann, G., Niekler, A., Keinert, A., Pfetsch, B., Heyer, G., Reber, U., Häussler, T., Schmid-Petri, H., & Adam, S. (2018). Applying LDA Topic Modeling in Communication Research: Toward a Valid and Reliable Methodology. Communication Methods and Measures, 12(2-3), 93–118. https://doi.org/10.1080/19312458.2018.1430754 
Useful packages
- quanteda🌐 |
 Benoit, K., Watanabe, K., Wang, H., Nulty, P., Obeng, A., Müller, S., & Matsuo, A. (2018). Quanteda: An r package for the quantitative analysis of textual data. Journal of Open Source Software, 3(30), 774. https://doi.org/10.21105/joss.00774
- textreuse🌐 |
 Li, Y., & Mullen, L. (2024). Textreuse: Detect text reuse and document similarity. https://docs.ropensci.org/textreuse (website) https://github.com/ropensci/textreuse
- tidytext🌐 |
 Silge, J., & Robinson, D. (2016). Tidytext: Text mining and analysis using tidy data principles in r. The Journal of Open Source Software, 1(3), 37. https://doi.org/10.21105/joss.00037
Back to course schedule ⏎
