Welcome!
MILS 2026: Intro to R
This is the course website for Module 1: Introduction to R, a 15-hour introductory course delivered at the summer school Methods in Language Sciences at Ghent University.
Although there is no shortage of excellent introductory courses on R these days, this course caters specifically to students of linguistics who have little to no background in R (or programming, for that matter) and who wish to learn R in the context of statistical data analysis.
This beginner-friendly course aims to provide participants with a solid foundation in R, empowering them to explore, analyse, and visualise data efficiently. Emphasis is placed on hands-on practical exercises and real-world examples, enabling students to immediately apply their knowledge.
The following key topics are covered:
- Understanding the difference between R, RStudio, and R Notebooks
- Exploring basic arithmetic and logical operations
- Introduction to essential data structures, including vectors, matrices, and data frames
- Importing data files
- Manipulating, indexing and pivoting data structures
- Generating descriptive statistics and visualisations in base R
- Introduction to the tidyverse approach (incl. ggplot2 visualisations)
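To give a flavour of the first few topics, here is a minimal base R sketch. The words and the object names (`words`, `n_chars`, `lexicon`) are invented for illustration and are not taken from the course materials.

```r
# A character vector of example words (hypothetical data)
words <- c("cat", "house", "phoneme", "tree")

# Count the characters in each word; nchar() works on the whole vector at once
n_chars <- nchar(words)

# Arithmetic and logical operations are applied element by element
n_chars * 2
n_chars > 4

# Combine the two vectors into a data frame
lexicon <- data.frame(word = words, n_chars = n_chars)

# Index the data frame: keep only rows for words longer than four characters
long_words <- lexicon[lexicon$n_chars > 4, ]

# A simple descriptive statistic
mean_length <- mean(lexicon$n_chars)
```

Running these lines one at a time in the R console and inspecting each result is a good way to see what every step does.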
Getting started
Before coming to class, make sure to:
- Install R
- Install RStudio
Go to the Quarto website and follow the instructions: https://quarto.org/docs/get-started/
Course materials
All course materials for this module are provided on this website and include slides, R scripts and notebooks, exercises, datasets, and further reading materials.
Instructors
Gil Verbeke is a PhD researcher at the Linguistics Department of Ghent University. His research interests include second language acquisition, speech perception, phonetics and phonology. The overarching aim of his PhD dissertation is to examine how pronunciation variation in spoken English affects speech perception and word recognition for late second language learners of English. Several of his experimental studies have been published in international journals, including Bilingualism: Language and Cognition and Lingua.
Romeo De Timmerman is a PhD researcher at the Linguistics Department of Ghent University. He is also completing a Master’s degree in computational statistics at the same university. His research focuses on language variation through both sociolinguistic and computational approaches. His current projects investigate (i) the prevalence of African American English features in blues music and (ii) potential linguistic biases against African American English in Large Language Models.
Approach
Nothing beats hands-on instruction from an experienced instructor. You are expected to write code and to make mistakes. Forgetting a comma or a bracket is a very rewarding experience, especially for beginners. Don’t remove friction by using genAI or LLMs from the very beginning. Yes, LLMs are incredibly good and will assist you at a later stage, but resist the temptation to use these tools as a beginner. In fact, the first three hours or so of this course are spent in R and R only, i.e., sans RStudio, which is less scary than one might think. This course also starts with base R, rather than jumping straight into the deep end with the tidyverse. This approach is very much inspired by Norm Matloff’s course fasteR.