Session 8

🔨 Text as data II: Advanced Methods


Atteveldt, W. van, Trilling, D., & Arcíla, C. (2021). Computational analysis of communication: A practical introduction to the analysis of texts, networks, and images with code examples in Python and R. John Wiley & Sons.
Grimmer, J., Roberts, M. E., & Stewart, B. M. (2022). Text as data: A new framework for machine learning and the social sciences. Princeton University Press.
Jurafsky, D., & Martin, J. H. (2024). Speech and language processing: An introduction to natural language processing, computational linguistics, and speech recognition with language models (3rd ed.). https://web.stanford.edu/~jurafsky/slp3/
Silge, J., & Robinson, D. (2017). Text mining with R: A tidy approach (First edition). O’Reilly.

Nicholls, T. (2019). Detecting Textual Reuse in News Stories, At Scale. International Journal of Communication, 13(0), 25. https://ijoc.org/index.php/ijoc/article/view/9904
Arendt, F., & Karadas, N. (2017). Content analysis of mediated associations: An automated text-analytic approach. Communication Methods and Measures, 11(2), 105–120. https://doi.org/10.1080/19312458.2016.1276894

Chen, Y., Peng, Z., Kim, S.-H., & Choi, C. W. (2023). What We Can Do and Cannot Do with Topic Modeling: A Systematic Review. Communication Methods and Measures, 17(2), 111–130. https://doi.org/10.1080/19312458.2023.2167965
Hase, V. (2023). Automated Content Analysis (F. Oehmer-Pedrazzi, S. H. Kessler, E. Humprecht, K. Sommer, & L. Castro, Eds.; pp. 23–36). Springer Fachmedien Wiesbaden. https://link.springer.com/10.1007/978-3-658-36179-2_3
Maier, D., Waldherr, A., Miltner, P., Wiedemann, G., Niekler, A., Keinert, A., Pfetsch, B., Heyer, G., Reber, U., Häussler, T., Schmid-Petri, H., & Adam, S. (2018). Applying LDA Topic Modeling in Communication Research: Toward a Valid and Reliable Methodology. Communication Methods and Measures, 12(2-3), 93–118. https://doi.org/10.1080/19312458.2018.1430754

quanteda
Benoit, K., Watanabe, K., Wang, H., Nulty, P., Obeng, A., Müller, S., & Matsuo, A. (2018). Quanteda: An R package for the quantitative analysis of textual data. Journal of Open Source Software, 3(30), 774. https://doi.org/10.21105/joss.00774
textreuse
Li, Y., & Mullen, L. (2024). Textreuse: Detect text reuse and document similarity. Website: https://docs.ropensci.org/textreuse. Source: https://github.com/ropensci/textreuse
tidytext
Silge, J., & Robinson, D. (2016). Tidytext: Text mining and analysis using tidy data principles in R. The Journal of Open Source Software, 1(3), 37. https://doi.org/10.21105/joss.00037