This workshop will provide an introduction to R!
R is a popular programming language that many researchers use for organizing data, visualizing data, and carrying out statistical analyses.
By the end of this workshop series, my hope is that you will feel comfortable enough to work independently in R!
[What people think coding is versus what it actually is]
Before the workshop, we’ll need to download R and RStudio. Throughout the workshop, we’ll be working in RStudio, which will allow us to write code in R. So let’s make sure we have both R and RStudio installed before we begin!
Download R from a CRAN mirror, which hosts the R programming language that we will be using in RStudio. https://cran.r-project.org/
Download RStudio, which is the main software that we will be using to work with R. https://posit.co/download/rstudio-desktop/
Download the CABLAB-R-Workshop-Series folder from the CABLAB R Workshop Series GitHub page (https://github.com/steventmartinez/CABLAB-R-Workshop-Series) by pressing the green Code button and downloading the ZIP folder. This is the folder containing all the files we will be working with for the purposes of this workshop.
Open up a new R Markdown document by clicking File > New File > R Markdown. First-time R users will be asked to download packages once they open up an R Markdown file. Click “Yes” to download those packages!
To get things started, open RStudio. Then, let’s try opening a new R Markdown document by clicking File > New File > R Markdown…
This should produce a dialogue box where you can enter the name of the script and your name before selecting OK.
Next, let’s clear out all of the default text that appears in a new R Markdown document, which I have highlighted below:
In a typical coding script, every line must contain code that the language can interpret. If you want to include notes, you have to put a hash mark (#) before the note so that the program knows to “ignore this line”. So, in order to leave ourselves any notes, we have to use hash marks, which can get a bit annoying. An R Markdown script does the same things as a typical coding script, but it’s more user friendly.
With R Markdown, any code that you would like R to interpret belongs in the coding chunk as illustrated below!
If we want to leave notes, we don’t have to “comment them out”. We can just write long-winded narration that can help others understand why we coded what we coded and what that code does.
That’s because a typical script will interpret any text as a command, unless the text is otherwise marked by a hash mark (#). An R Markdown script only interprets things as code when we tell it to, and we tell it what is code by creating a chunk. Chunks are marked by three backticks (```) followed by a {r} and, on another line, three more backticks.
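As a minimal sketch of what might go inside a chunk, here is how R treats code versus text marked with a hash mark. The object name `total` is only an illustration and does not come from the workshop files.

```r
# This whole line starts with a hash mark, so R ignores it.
total <- 2 + 2  # Anything after a hash mark on the same line is ignored too.

# Print the value stored in `total`.
print(total)
```

In an R Markdown document, these lines would sit between the opening and closing backticks of a chunk, and any explanation could live outside the chunk as ordinary text.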
A typical script can’t make sense of plain narration like this, though; we need an R Markdown script for that. You might be thinking that manually separating code from non-code seems like extra work, and it is a little bit, but it can also be a lot more convenient because, by default, the output of any given chunk will appear directly beneath that chunk in RStudio. By output, we just mean the result or status of whatever calculation or item you are asking R to compute and show you.
R Markdown grants us greater control over what we see and when we see it. To demonstrate, let’s start by creating a new chunk in our markdown document and entering what we see in the image above. You can then follow along with the next bit:
2 + 2
## [1] 4
With a typical script, if we want to know the output of a line we ran a while ago, we either have to rerun it or scroll through the console to find it. With R Markdown, we can minimize entire chunks and their output by using the minimization button on the left side of the window.
If we want to hide output, we can use the expand/collapse button on the right side of the output window.
We can choose exactly what we want to run using the “Run” command in the upper right corner of the chunk.
Also of note, the down-facing arrow (second icon in the upper right corner of the code block) will tell R “Run all of the blocks of code that I have before this block”. It can be helpful if you make a mistake and don’t want to manually rerun all of the previous blocks one by one to get back to where you were. It also makes your code very easy for other people to run. They can quite literally do it with the click of a button!
If we click the cog icon in the same tray, we can access the output options and manipulate where output appears and what it looks like, but that’s beyond the scope of this review.
Packages in R are analogous to libraries in other languages. They are more or less convenient collections of functions that someone else already programmed to save us some work. Somebody else already figured out a quick way to carry out a task, so now we don’t have to! We just use their tools to do it.
Most packages are hosted centrally in R’s repository (CRAN), so even though thousands of people are working on these things independently, you don’t need to leave R to find them. Before they can be used, they must be installed, and you can do that pretty simply:
install.packages("PACKAGENAME")
If you need to update a package, you can just re-run the above code. If you’re using RStudio, you can also see a list of your packages and their associated descriptions in the ‘Packages’ Tab of your Viewer Window.
Just because we’ve installed a package doesn’t mean we can use it yet. We need to tell R “We want access to the functions this package has during this session” by calling it with the library() command.
library(PACKAGENAME)
Notice that we drop the quotation marks now. We just specify the (case-sensitive) package name and it lets R know we are planning on using it this session.
You might be wondering why we need to take this extra step. Sometimes different packages use the same function names, so having more than one of those active at the same time can create ambiguity about which one gets called (when this does happen, R will usually tell you). Packages also take up memory once they are loaded, so having ALL of your packages loaded at once might leave your computer running slowly. It’s the same for most languages.
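If two loaded packages do share a function name, one common way to be explicit is the `package::function()` form, which names the package directly. The sketch below uses the `stats` package, which ships with R, purely for illustration; the `scores` vector is made up.

```r
# A small vector of made-up numbers, just for demonstration.
scores <- c(1, 3, 5)

# Call median() from the 'stats' package explicitly.
middle <- stats::median(scores)
print(middle)

# search() lists the packages currently attached in this session.
print(search())
```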
If we ever want to explore the functions contained within a package in conjunction with examples, we can either go to the R documentation website or type ‘??PackageName’ into the Console, which will then populate the Help Tab of the Viewer Window with information on the package.
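As a rough sketch of exploring a package from code, the lines below list what a package makes available and open its help pages. The `stats` package is used only because it ships with R; the help calls are wrapped in `interactive()` so they only run when you are working in a live session.

```r
# List the functions that the attached 'stats' package makes available.
stats_functions <- ls("package:stats")
head(stats_functions)

# Open help pages; in RStudio these appear in the Help tab.
if (interactive()) {
  help("median", package = "stats")  # help for one function
  help(package = "stats")            # overview of the whole package
}
```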
Let’s try installing and loading a few packages for practice. Let’s install and load the following packages in R: naniar, report, tidyverse, dplyr, Matrix, lme4, lmerTest, and ggplot2.
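One possible way to do this in a single chunk is sketched below. It assumes an internet connection for the install step, and the object names (`workshop_packages`, `to_install`) are just illustrative. The install and load steps are wrapped in `interactive()` so that nothing is downloaded unless you are running the chunk yourself.

```r
# The packages named above, stored as a character vector.
workshop_packages <- c("naniar", "report", "tidyverse", "dplyr",
                       "Matrix", "lme4", "lmerTest", "ggplot2")

# Work out which of them are not installed yet.
already_installed <- rownames(installed.packages())
to_install <- setdiff(workshop_packages, already_installed)

if (interactive()) {
  # Install only the missing packages.
  if (length(to_install) > 0) {
    install.packages(to_install)
  }
  # Load each package; character.only = TRUE is needed because
  # `pkg` holds the package name as a string.
  for (pkg in workshop_packages) {
    library(pkg, character.only = TRUE)
  }
}
```

Writing `library(naniar)`, `library(report)`, and so on, one line per package, does the same job and is perfectly fine too.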
Swirl is a really cool package in R that teaches you R programming and data science interactively, at your own pace, and right in the R console! For our first assignment, I think swirl explains some fundamental concepts in a better way than I can, so let’s tackle the “R Programming: The basics of programming in R” course and complete Module 1: Basic Building Blocks in swirl.
Some of it will make sense, and some of it won’t (and that’s okay!), but I think swirl does a pretty good job of orienting people to how basic operations in R work, and I think this is especially helpful before we start working with any actual data.
Let’s give this a try and we can talk through any problems people ran into during our next workshop. I’ve attached some screenshots below demonstrating how to install and load swirl().
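In code, the install-and-load steps for swirl might look like the sketch below. It assumes an internet connection for the install, and it is wrapped in `interactive()` because swirl asks you questions in the console.

```r
# Check whether swirl is already installed (TRUE or FALSE).
swirl_installed <- requireNamespace("swirl", quietly = TRUE)

if (interactive()) {
  # Install swirl only if it is missing.
  if (!swirl_installed) {
    install.packages("swirl")
  }
  # Load the package and start it; swirl then guides you through its menus.
  library(swirl)
  swirl()
}
```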
Hopefully swirl() has helped you feel a bit more comfortable in navigating R. Today we will focus on working with directories in R.
A working directory is a fancy term that refers to the default location where R will look for files you want to load and where it will put any files you save. Like any other language or program, R needs to be told where the data that we’d like to work with is located on our computer. It doesn’t just know automatically.
Below we’ll use the getwd() command to check where your current working directory is.
Using the list.files() command will show you what files exist in your current working directory.
getwd() #get your current working directory
## [1] "/Users/tuh20985/Desktop/CABLAB-R-Workshop-Series-main"
list.files() #Use list.files() to check the contents of your working directory
## [1] "CABLAB_R_online.Rmd" "datasets" "exercise_solutions"
## [4] "images" "index.html" "misc"
## [7] "R memes" "README.md"
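list.files() can also look inside a folder other than the working directory, and it can filter by file name. The sketch below builds a small throwaway folder inside R’s temporary directory so that it runs on any computer; the folder and file names are made up for illustration.

```r
# Create a throwaway folder with two empty files in it.
demo_dir <- file.path(tempdir(), "wd_demo")
dir.create(demo_dir, showWarnings = FALSE)
file.create(file.path(demo_dir, c("notes.Rmd", "data.csv")))

# List everything in that folder without changing the working directory.
all_files <- list.files(demo_dir)
print(all_files)

# List only the files whose names end in ".csv".
csv_files <- list.files(demo_dir, pattern = "\\.csv$")
print(csv_files)
```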
In order to work with our data, we’ll have to tell R where the files are located. To keep this simple, we can create a new variable containing a filepath so we aren’t writing it out multiple times. Filepaths will differ based on whether you are using Windows or a Mac. If you’re using a Windows computer, it’s likely your filepath will start with your “C:/” drive. If you’re on a Mac, it’s likely your filepath will start with a forward slash “/”. If you’re not sure of your path, R makes it relatively easy to find it.
You can press tab when your cursor is just after the slash (inside the quotation marks) to see a list of directories contained within your computer.
# For Windows
Path <- "C:/"
# For Mac
Path <- "/"
Here’s an example of what you should see:
Pressing tab again will enter a directory and show me its contents. From there, I can keep hitting tab until I get to the directory, or folder, that contains the files I want to work with. I can then save this filepath, which is just what we call a string (i.e., text rather than a numeric value), as an object named Path. We do so by placing the object name on the left of an equal sign (=) or an arrow (<-) and the value the object is taking on to the right of it.
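As a small sketch of the two assignment forms, the chunk below stores the same string twice, once with the arrow and once with the equal sign. The name `Path_with_equals` is only for illustration.

```r
# Assign the same string with the arrow and with the equal sign.
Path <- "/"
Path_with_equals = "/"

# Both objects hold exactly the same value.
identical(Path, Path_with_equals)

# A filepath is stored as a character string.
class(Path)
```

Both forms work for assignment at this level, though the arrow is the form used throughout this tutorial.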
Below, let’s assign the filepath where our CABLAB R Workshop Series folder exists to an object called “Path”.
# For Windows
Path <- "C:/Users/tuh20985/Desktop/CABLAB-R-Workshop-Series-main/datasets/"
# For Mac
Path <- "/Users/tuh20985/Desktop/CABLAB-R-Workshop-Series-main/datasets/"
This format of assigning a value to an object is really important and we’ll keep coming back to it throughout this tutorial!
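Once a filepath is stored in an object, it can be reused to build longer paths instead of typing the folder out again. The sketch below uses a stand-in object called `demo_path` (pointing at R’s temporary directory so it runs on any computer) and a hypothetical file name, `example_file.csv`, which is not one of the workshop files.

```r
# Stand-in for a folder path; note the trailing slash, as in Path above.
demo_path <- paste0(tempdir(), "/")

# Glue the folder path and a (hypothetical) file name together.
full_path <- paste0(demo_path, "example_file.csv")
print(full_path)

# The same object can be reused to list the folder's contents.
list.files(demo_path)
```

With your own Path object, the same pattern would be `paste0(Path, "some_file_name")`.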
For the purposes of this project, we are going to work with the Fright Night dataset! The Fright Night project took place in 2021 at the Eastern State Penitentiary’s annual “Halloween Nights” haunted house event in Philadelphia. 116 participants completed a haunted house tour as part of a research study assessing the relationship between threat and memory.
Specifically, we explored 2 main research questions: 1) How does naturalistic threat affect memory accuracy? 2) Does naturalistic threat affect the way in which we communicate our memories?
Participants toured four haunted house segments (Delirium, Take 13, Machine Shop, and Crypt) that included low-threat and high-threat segments. Delirium and Take 13 were low-threat segments, whereas Machine Shop and Crypt were high-threat segments.
To assess memory accuracy, we focused on temporal memory accuracy specifically. Temporal memory refers to memory for the order in which events occur. To measure temporal memory within our study, we focused on accuracy on the recency discrimination task that participants completed for each haunted house segment. As part of the recency discrimination task, participants were shown pairs of trial-unique events within each haunted house segment and asked to select which event came first. In this way, we can determine the accuracy of people’s temporal memory for the order of the events they experienced.
To assess communication styles during memory recall, we focused on the free recall memory task where we asked participants to freely recall their memory for each haunted house segment. We fed the free recall memory transcripts into a natural language processing instrument called the Linguistic Inquiry and Word Count (LIWC) software. LIWC calculates the percentage of words in a given text that belong to linguistic categories that have been shown to index psychosocial constructs. In the example attached below, you can see the percentage of words that contribute to a linguistic category called “Authenticity”, which is thought to reflect perceived honesty and genuineness, and the percentage of words that belong to a linguistic category called “Analytical Thinking”, which is thought to reflect formal or logical thinking.
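To make the idea of a “percentage of words in a category” concrete, here is a toy calculation. It is not LIWC’s actual dictionary or scoring method: the transcript and the category word list are both made up for illustration.

```r
# A made-up transcript (not real study data).
transcript <- "I think the room was dark because the lights were broken"

# Split the text into lowercase words.
words <- strsplit(tolower(transcript), " ")[[1]]

# A hypothetical word list standing in for one linguistic category.
category_words <- c("think", "because")

# Flag which words belong to the category, then convert to a percentage.
in_category <- words %in% category_words
percent_in_category <- 100 * sum(in_category) / length(words)
round(percent_in_category, 1)
```

Here 2 of the 11 words fall in the toy category, so the result is about 18.2 percent.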
There were also 3 experimental conditions: Control, Share, and Test.
Control condition: Participants were instructed to tour the haunted house segment as they normally would.
Share condition: Participants were instructed to tour the haunted house segment in anticipation of an opportunity to post about their experience on social media afterwards.
Test condition: Participants were instructed to tour the haunted house segment in anticipation of being tested on their knowledge of the haunted house segment afterwards.
For the first two segments (Delirium and Take 13), all participants toured the segment in the Control condition. However, in the last two segments (Crypt and Machine Shop), some participants toured the segments in the Control condition, other participants toured Machine Shop in the Share condition and Crypt in the Test condition, while other participants toured Machine Shop in the Test condition and Crypt in the Share condition.
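The assignment of conditions to segments described above can be written out as a small table in R. This is only a sketch of the design as described in the text: the group labels are hypothetical, and the table contains no study data.

```r
# One row per group of participants, one column per haunted house segment.
design <- data.frame(
  group       = c("Group 1", "Group 2", "Group 3"),  # hypothetical labels
  Delirium    = c("Control", "Control", "Control"),
  Take13      = c("Control", "Control", "Control"),
  MachineShop = c("Control", "Share", "Test"),
  Crypt       = c("Control", "Test", "Share")
)

print(design)
```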
After completing the haunted house tour, participants were assessed at two time points: immediately afterwards and again 1 week later. During the immediate assessments, participants completed a recency discrimination task and freely recalled their memory for 1 low-threat and 1 high-threat haunted house segment. During the one-week delay assessments, participants completed a recency discrimination task and freely recalled their memory for all haunted house segments. Check out the study design below as well as the vignette illustrating when the three experimental conditions (i.e., Control, Share, and Test) took place throughout the haunted house tour.