This workshop covers how to perform common text analysis and natural language processing tasks using R, relying heavily on the well-rounded quanteda package (https://github.com/kbenoit/quanteda). When used properly, R is a fast and powerful tool for managing even very large text analysis tasks. We will go over formatting and inputting source texts, structuring metadata, and preparing text for analysis. We'll show how to get summary statistics from text, search for and analyse keywords and phrases, analyse text for lexical diversity and readability, apply dictionaries, and more. Structured objects from quanteda can be readily passed into other text analytic packages for additional analyses such as topic modelling, regression models, and other forms of machine learning, though this workshop will not cover these advanced techniques.
Some prior experience with R will be valuable, but expertise is not required, and even those with no previous knowledge of R are welcome.
Instructor: Michele Claibourn