Syllabus
Course information
| First session | Session time | Location (Room) |
|---|---|---|
| 23.10.2024 | Wednesdays, 13:15 - 14:45 | Findelgasse, 2.024 |
- The course can be held in English or German (including presentations, pitches, discussions, etc.), depending on the preferences students express in the survey before the course begins.
- Regardless of the course language, written assignments may be submitted in German or English, depending on what the students or groups prefer.
Course description
In this seminar, students are introduced to working with digital behavioral data (DBD). DBD refer to digital traces of human behavior that are knowingly or unknowingly left in online environments (e.g., social media, messengers, entertainment media, or digital collaboration tools). These rich data are increasingly available to social scientific research in the public interest, but can also be used to derive strategic insights for business decisions.
Students learn how to work with DBD throughout the entire research process, from data collection, preprocessing, and analysis to reporting and data sharing (e.g., via open science tools). Students first get a comprehensive overview of the ways in which DBD can be collected (e.g., API scraping, usage logging, mock-up virtual environments, or data donations), as well as of the requirements for data protection, research ethics, and data quality. Afterwards, students practice and apply their newly acquired knowledge in small projects on use cases from media and communication research. In doing so, they learn important computational methods for processing and analyzing large digital behavioral data sets (e.g., texts, images, usage behavior logs). By completing this module, participants will gain an up-to-date overview of, and practical insight into, how the potential of observational data (digital traces) can be used to better understand the behavior of media users in digital environments.
Learning Objectives
Students will
- gain an overview of and understand the central opportunities of DBD and the accompanying challenges for data collection and preprocessing
- evaluate the strengths and weaknesses of different ways of collecting DBD
- get to know and understand central requirements for data protection, research ethics, and data quality
- get to know and gain an overview of key computational social science methods for analyzing DBD
- practice and apply their knowledge of DBD, statistics, and data analysis in small projects of their own
Recommended prerequisites
- Interest in social scientific perspectives on media, communication, and digital technologies.
- Basic knowledge of working with statistical software such as Stata, R, Python, or SPSS is required.
- Students are recommended, but not required, to also attend the lecture Data Science: Foundations, Tools, Applications in Socio-Economics and Marketing.
Organization of the course
Registration for the course takes place via StudOn, where you will also receive initial information and instructions. Please make sure that you complete the short survey before the seminar begins.
All slides, assignment instructions, an up-to-date schedule, and other course materials can be found on the course website. I will regularly send out course announcements by e-mail, so please make sure to regularly check the e-mail account associated with StudOn.
(Preliminary) Schedule
- ⚠️ Please note that this is a provisional timetable which may change, especially after the kick-off meeting and the allocation of topics (see Course schedule for the latest version).
- All sessions marked with a 🔨 are hands-on sessions in which we actively work with R.
- All sessions marked with a 📚 are presentation sessions where groups of students will give a detailed presentation (see Assignments for more information).
- All sessions marked with a 📊 are presentation sessions where groups of students will present the results of their project work (see Assignments for more information).
| Session | Date | Topic |
|---|---|---|
| 📂 Block 1 | Introduction | |
| 1 | 23.10.2024 | Kick-Off |
| 2 | 30.10.2024 | DBD: Introduction & Overview |
| 3 | 06.11.2024 | 🔨 Introduction to working with R |
| 📂 Block 2 | Theoretical background: & TV election debates | |
| 4 | 13.11.2024 | 📚 usage in focus |
| 5 | 20.11.2024 | 📚 Effects of & TV debates |
| 6 | 27.11.2024 | 📚 Political TV debates & social media |
| 📂 Block 3 | Natural language processing | |
| 7 | 04.12.2024 | 🔨 Text as data I: Introduction |
| 8 | 11.12.2024 | 🔨 Text as data II: Advanced Methods |
| 9 | 18.12.2024 | 🔨 Advanced Method I: Topic Modeling |
| - | - | 🎄 Christmas Break (No Session) |
| 10 | 08.01.2025 | 🔨 Advanced Method II: Machine Learning |
| 📂 Block 4 | Project Work | |
| 11 | 15.01.2025 | 🔨 Project work |
| 12 | 22.01.2025 | 🔨 Project work |
| 13 | 29.01.2025 | 📊 Project Presentation (I) |
| 14 | 05.02.2025 | 📊 Project Presentation (II) & 🏁 Evaluation |
For the latest, more detailed version of the course schedule as well as the linked content of the individual sessions (e.g. slides or literature for the respective presentation), please see Schedule.
The course consists of several blocks:
Block I: Introduction
The first three sessions form the (theoretical) basis for the course.
- The kick-off session is mainly for getting to know each other and organizing the course.
- The second session gives you an extended introduction to DBD, including its challenges and important frameworks.
- The third session is about practical work with R and RStudio.
Block II: Theoretical foundation
The second block consists of presentations by student groups on research relevant to the course.
Block III: Natural Language Processing (NLP)
The third block is about working with text data. It is divided into four sessions.
- The first session is an introduction to working with text data.
- The second session covers advanced methods for working with text data and introduces the two methods of the following two sessions.
- The third session is about topic modeling, a method for extracting topics from text data.
- The fourth session is about machine learning, here used to classify text data.
Block IV: Project Work
The last block is about working on your project. The goal is to combine the theoretical and practical knowledge you have gained in the previous sessions and apply it to a research project of your choice. This means you have to formulate a research question, design an analysis, carry it out, interpret the results, and present them (short presentation and written report).
Sessions
The goal of the sessions is to be as interactive as possible. In general, the sessions consist of two parts. The first part (approx. 30 to 45 minutes) usually consists of presentations (including discussion), which are more or less detailed depending on the stage of the project. The second part (approx. 45 to 60 minutes) consists of a group activity (with a concluding discussion), which either deepens the presentation content or is dedicated to independent work on individual or group projects.
My role as instructor is to introduce you to new tools and techniques, but it is up to you to take them and make use of them. Much of what you do in this course will involve writing code, and coding is a skill that is best learned by doing. You are expected to bring a laptop to each class so that you can take part in the in-session exercises. Please make sure your laptop is fully charged before you come to class, as there are not enough outlets in the classroom for everyone.
Where to ask questions
- If you have a question during the lecture, feel free to ask it! There are likely other students with the same question, so by asking you will create a learning opportunity for everyone.
- Any general questions about session content, assignments, or the project should be posted in the StudOn forum so that everyone can benefit from the answers. Another student may already have asked a similar question, so please check the existing posts before adding a new one. If you know the answer to a question, I encourage you to respond!
- E-mails should be reserved for personal matters.
Assessment
In order to obtain credits and a grade, participants are required to
- attend regularly (at least 80% of the sessions) and participate actively. A maximum of two sessions can be missed without excuse. Further absences can only be excused in case of illness (i.e., with a medical certificate).
- complete various assignments as part of a portfolio. The type and scope of the assignments depend on the number of participants and the project(s). Detailed information can be found in the section Assignments.
Academic integrity
Do not cheat!
For general information on formatting, style, citation, appendices, wording of the affidavit, etc., see our Guide to Academic Writing.
Policy on sharing and reusing code
I am well aware that a huge volume of code is available on the web to solve any number of problems. Unless I explicitly tell you not to use something, the course’s policy is that you may make use of any online resources (e.g. StackOverflow) but you must explicitly cite where you obtained any code you directly use (or use as inspiration). Any recycled code that is discovered and is not explicitly cited will be treated as plagiarism.
Policy on use of generative artificial intelligence (AI)
You should treat generative AI, such as ChatGPT, the same as other online resources. There are two guiding principles that govern how you can use AI in this course¹: (1) Cognitive dimension: Working with AI should not reduce your ability to think clearly. We will practice using AI to facilitate, rather than hinder, learning. (2) Ethical dimension: Students using AI should be transparent about their use and make sure it aligns with academic integrity.
✅ AI tools for code: You may make use of the technology for coding examples on assignments; if you do so, you must explicitly cite where you obtained the code. Any recycled code that is discovered and is not explicitly cited will be treated as plagiarism. You may use these guidelines for citing AI-generated content.
❌ AI tools for narrative: Unless instructed otherwise, you may not use generative AI to write narrative on assignments. In general, you may use generative AI as a resource as you complete assignments but not to answer the exercises for you. You are ultimately responsible for the work you turn in; it should reflect your understanding of the course content.
Recommended textbooks
Atteveldt, W. van, Trilling, D., & Arcila, C. (2021). Computational analysis of communication: A practical introduction to the analysis of texts, networks, and images with code examples in Python and R. John Wiley & Sons.
Engel, U., Quan-Haase, A., Liu, S. X., & Lyberg, L. (2021). Handbook of Computational Social Science, Volume 1: Theory, Case Studies and Ethics (1st ed.). Routledge. https://doi.org/10.4324/9781003024583
Engel, U., Quan-Haase, A., Liu, S. X., & Lyberg, L. (2021). Handbook of Computational Social Science, Volume 2. Routledge. https://doi.org/10.4324/9781003025245
Haim, M. (2023). Computational Communication Science: Eine Einführung. Springer Fachmedien Wiesbaden. https://link.springer.com/10.1007/978-3-658-40171-9
Salganik, M. J. (2018). Bit by bit: Social research in the digital age. Princeton University Press.
Footnotes
1. These guiding principles are based on Course Policies related to ChatGPT and other AI Tools developed by Joel Gladd, Ph.D.