The RStudio integrated development environment
Overview
Teaching: 35 min
Exercises: 10 min
Questions
What is an Integrated Development Environment (IDE) and how can it make data analysis easier?
Objectives
Demonstrate the main features of RStudio for reproducible data analysis.
Motivate you to learn more about the features of RStudio.
Introduction
There are three technologies essential to efficient, reproducible workflows:
- Integrated development environments (IDEs),
- literate programming, and
- version control.
This episode will describe the first of these, with RStudio as its example.
The goal of an IDE is to make coding and data analysis easier. IDEs are often tailored to specific languages (or sets of languages), but a core set of features is common to most of them. These features are:
- A project management system
- Language interpreter (and/or compiler)
- Debugger
- Version control
- Lots of other little things e.g. file manager, interactive language console, bash terminal, syntax highlighting, autocomplete…
Some useful IDEs
- For R there is RStudio and RStudio cloud.
- For Python there is PyCharm.
- For general purpose there is Visual Studio Code or Sublime Text.
- For data science applications the Jupyter Lab project is powerful.
This episode will familiarise you with all the main features of RStudio cloud. These map to the features of the desktop application, RStudio local, which you may be familiar with, with a few minor differences. We will be demonstrating the following features:
- organising analysis into Projects;
- the panes for special features, such as keeping track of variables;
- tools for writing and running R code;
- finding and squashing bugs with the debugger.
The big feature not covered here is version control; this will be covered in episode 4.
To help you get to grips with the main features, a useful RStudio cheat-sheet can be found here, and to learn about the many RStudio cloud features you should go to the online documentation here.
Analysis is organised in Projects
Projects keep track of settings tied to a particular data analysis project. When starting a new project you will create a new RStudio Project. If you want to continue working on something, just open up the relevant RStudio Project.
In order to open an existing project:
- RStudio cloud: your projects will be listed under Your Workspace.
- RStudio local: click on the [project_name].Rproj file in a file browser (your system file browser or the RStudio file browser); OR through the RStudio menus: File > Open Project... / Recent Projects.
Setting up RStudio projects
You should already have a project called roar but just in case you haven’t, complete either one of the exercises below, depending on whether you’re using RStudio local/cloud.
Create a new RStudio cloud project
- Navigate to rstudio.cloud
- Navigate to the Your Workspace tab.
- Click New Project
- Rename Untitled Project to roar.
Create a new RStudio local project
- Open RStudio
- Select File > New Project > New Directory > New Project
  - Directory name: “roar”
  - Create project as...: Select a convenient directory
- Select Create Project
You’ll also need to install the required packages so please run the following:
list.of.packages <- c("tidyverse", "data.table", "knitr", "markdown", "rmarkdown")
new.packages <- list.of.packages[!(list.of.packages %in% installed.packages()[,"Package"])]
if(length(new.packages)) install.packages(new.packages)
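Once the installation has finished, it can be worth confirming that every package is actually available. The following is a minimal sketch rather than part of the original lesson; the variable names required and available are illustrative.

```r
# Packages this course relies on (same list as above)
required <- c("tidyverse", "data.table", "knitr", "markdown", "rmarkdown")

# requireNamespace() returns TRUE if a package can be loaded and FALSE
# otherwise, without attaching it to the search path
available <- vapply(required, requireNamespace, logical(1), quietly = TRUE)

# Named logical vector: one TRUE/FALSE per package
print(available)

# Names of any packages that are still missing (character(0) if none)
names(available)[!available]
```

If anything is reported as missing, re-running the install commands above is the first thing to try.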
Course directory structure
You should also create the following directory structure which will help organise the material of this course:
roar
/episode_2_rstudio
/episode_3_rmd
/episode_4_vcs
Please put material from each episode into the relevant folder.
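The folders can be created through the Files pane, or from the console. The sketch below assumes that the working directory is the root of the roar project (check with getwd()); the variable name episode_dirs is illustrative.

```r
# Folder names taken from the course directory structure
episode_dirs <- c("episode_2_rstudio", "episode_3_rmd", "episode_4_vcs")

# Create each folder inside the current working directory;
# showWarnings = FALSE keeps this quiet if a folder already exists
for (d in episode_dirs) {
  dir.create(d, showWarnings = FALSE)
}

# Confirm that all three folders now exist
dir.exists(episode_dirs)
```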
RStudio layout can be customized
RStudio is organised into four panes each with a series of tabs. Their layout can be customized in the global options menu. To introduce this menu let’s make sure we’ve all configured our session the same way.
RStudio set up
- Open Tools > Global Options
- Select the Pane Layout tab.
- Configure your RStudio with the following layout:
  - Bottom left: Console and Terminal
  - Top left: Source
  - Top right: Environment, History, VCS
  - Bottom right: Files, Plots, Packages, Help, Viewer
- Select the Terminal tab.
- Under New terminals open with: select Bash (or Git Bash)
- Select OK
There are lots of options in the Global Options and Project Options menus which allow you to customize your environment. Please spend one minute browsing the headings to see what customizations are available.
Global vs Project properties.
Pane layout is a ‘Global property’, that is, it will persist across different RStudio sessions.
There are project-specific properties in the Tools > Project options... menu. Please familiarise yourself with both sets of options.
Project files make analysis easier
One of the main benefits of Projects is that after previously working on a project, you can open it up again and start where you left off. This covers the project settings, the variables you created, the packages you imported, and the history of commands you ran.
Let’s investigate this with an exercise:
Starting where you left off
- Select File > New File > New R Script
- Write out the following in the file (or just copy and paste!):
x <- rnorm(10)
sum <- 0
for(i in 1:length(x)){
  sum <- sum + x[[i]]
}
print(sum)
# save.image() and savehistory() are only needed for this lesson in RStudio cloud for
# demonstration purposes - don't put them in your scripts!
save.image()
savehistory()
- Save the file as start_again.R
- Source the file so that it runs.
- Observe the creation of the sum and x variables in the Environment pane (top right).
- RStudio cloud: Log out and reopen RStudio.
- RStudio local: Close and reopen RStudio (making sure to save your data on exit). Make sure you read the message!
- If RStudio doesn’t open up to the same project:
  - RStudio cloud: find it in the Workspace tab.
  - RStudio local: Under the Files pane (bottom right) find and click on the roar.Rproj file.
- Type x or sum in the console - these should be recognised by the R interpreter and have the same value as before.
Now answer these questions:
History and data
You should now have a number of new files in your project.
- What is the .RData file? Hint: clear objects from your Environment (the broom icon next to Import Dataset), then click on .RData in the Files pane.
Solution
The .RData file contains all your data, i.e. the variables, functions, dataframes etc. in your environment.
- What does the .Rhistory file do? Hint: Just click on it.
Solution
The .Rhistory file is a text file containing all your previous commands.
The contents of the .Rhistory file can be found in the History tab, and the contents of the .RData file can be found in the Environment tab. In RStudio cloud the history and data files are saved in the background, not as explicit .RData and .Rhistory files.
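To see what the data file does, note that save.image() is a shortcut for calling save() on every object in the global environment, and load() reads those objects back in. The sketch below imitates that round trip on a small scale. It uses a separate environment and a temporary file so that it does not touch your real workspace; the names session, restored and image_file are illustrative.

```r
# A stand-in for the session's environment (assumption: used here so the
# sketch does not modify your real workspace)
session <- new.env()
assign("x", rnorm(10), envir = session)

# Hypothetical file location; RStudio local would use .RData in the project
image_file <- tempfile(fileext = ".RData")

# Save every object in the environment, as save.image() does for the workspace
save(list = ls(session), file = image_file, envir = session)

# Load the file into a fresh environment, as reopening the project would
restored <- new.env()
load(image_file, envir = restored)

# TRUE if the restored object has the same values as the original
identical(get("x", envir = session), get("x", envir = restored))
```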
When things go wrong
Sometimes your project session will become corrupted, or start doing unexpected things. The first thing to try when debugging is to remove your .RData file from the project and clear your environment. Then try doing whatever you’re trying to do again. Very often the cause lies within a dodgy object in the saved environment.
The [project-name].Rproj file is a text file containing your project settings. You shouldn’t need to edit this. Clicking on this file will open a Project if it’s not already open, or open the Project Options tab if you’ve already opened that project.
Specialized panes have useful features
Each pane of RStudio has a special feature which helps you write code and manage your project. Let’s go through them now:
- Console: runs R commands.
- Files: file navigator.
- Terminal: the same as Console but for the BASH/Git BASH shell language (or whatever shell language is available to you, e.g., PowerShell on Windows). This is useful for managing files and for doing complicated things with version control (not covered in this course). If you have BASH/Git BASH try typing pwd to see what it does.
- Packages: shows a list of all the packages you have installed.
- Help: the help files for all installed packages.
- Git: we’ll go through this later (shout if you don’t have this!)
- Environment: as already mentioned, contains all your saved variables/functions/dataframes etc.
- History: a list of previously run commands.
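Several of these panes have rough counterparts that can be typed into the Console. The following sketch uses only base R functions; the pairing of each call with a pane is an informal comparison, not an exact equivalence, and the variable name installed is illustrative.

```r
# Current working directory; comparable to typing pwd in the Terminal
getwd()

# Files in the current directory, similar to what the Files pane lists
list.files()

# Names of installed packages, similar to what the Packages pane lists
installed <- rownames(installed.packages())
head(installed)

# Objects in the current environment, similar to the Environment pane
ls()
```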
Let’s have some more practice with Environment and History with two short exercises:
Environment
The environment pane contains all the objects in your session, e.g. functions, variables, dataframes etc. There is more than one environment, however; the default one shown is the Global Environment.
- Select the package:datasets environment. What does it show? What happens if you click on an object there?
Solution
It shows all the objects in the base R package datasets. If you click on an object it opens the dataset.
- Try loading an installed package, e.g. by running library(data.table) in the console. Now open that environment and click on the Value, melt. What happens?
Solution
The code is displayed in a new window pane.
So the Environment shows all the environments that are available to you. This includes all the loaded libraries.
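The same list of environments can be inspected from the console with search(), which returns the search path R uses to look up names. This is a minimal sketch using packages that ship with R, so it should run on any installation.

```r
# All environments currently on the search path, global environment first
search()

# First few objects in the datasets package environment
head(ls("package:datasets"))

# Attaching a package puts it on the search path; stats ships with R
# and is normally attached already, so this call is safe to run anywhere
library(stats)
"package:stats" %in% search()
```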
History
The history pane shows you all the previous commands that have been run. Run the following command in the console:
y <- rbeta(10, 1, 2)
Now use the History pane to transfer this (with one click) to the last line of start_again.R.
Solution
Highlight the relevant line and then click To Source. Make sure your cursor is on a new line.
This feature of History is particularly useful - you can use it to iteratively build up your R script by first trying things out in the console and then transferring them to your script when you’re happy with what they do. The History pane is searchable as well, so you can add commands from previous sessions.
RStudio has tools for easy code writing and running
Other great features are:
- tab complete,
- multiline editing,
- keyboard shortcuts.
Note: the keyboard shortcuts won’t work in RStudio Cloud as they clash with your browser shortcuts.
Tab complete
To explore tab complete, try the following:
- Type z <- r in the start_again.R file on a new line.
- Now hit the tab key (Tab) and you should see a list come up full of functions, datasets and packages all starting with the letter r.
- Now type u and you should see the menu change.
- Use the cursor or your arrow keys to highlight runif and hit Tab again.
- You should have z <- runif() automatically typed out.
- With the cursor in between the parentheses, hit Tab again.
- A list of the required function arguments is shown (next to purple slanted rectangles) along with other variables (yellow slanted rectangles).
- Make sure n = is highlighted and hit Enter, then type 10; your line should now look like z <- runif(n = 10).
- Fill in the remaining function arguments with min = 1 and max = 2 using tab complete (you’ll need to put the , separating the arguments in yourself).
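For reference, the completed line from this exercise should be equivalent to the first line of the sketch below. The two checks after it are additions for illustration: runif() draws from a uniform distribution, so the values differ on every run but always lie between min and max.

```r
# Ten random draws from a uniform distribution on the interval [1, 2]
z <- runif(n = 10, min = 1, max = 2)

# Number of values generated
length(z)

# Smallest and largest value drawn; both lie between 1 and 2
range(z)
```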
Multiline editing
This is very useful for repetitive code or text.
- Create 5 blank lines in start_again.R.
- Click your cursor at the top of those lines.
- Hold down Alt and drag your cursor down 5 lines.
- Type # I love R. (You’ll need the # so it is a comment). Press Esc to stop multiline editing.
- You should see 5 lots of ‘I love R’.
Keyboard shortcuts
See if you can do the following without using the mouse. Use the RStudio cheat sheet (here):
- Create a new R file.
- Save it as semicircle.R
- Use tab complete to help you write out the following:
x <- seq.int(from = -1, to = 1, length.out = 100)
y <- sqrt(1 - x^2)
plot(x, y)
- Save file.
- Run it using Source with echo.
- Switch to the console and save the image by typing save.image(file='episode_2_rstudio/semicircle.png')
Solution
- Ctrl+Shift+N
- Ctrl+S
- Use tab complete as per the previous challenge.
- Ctrl+S
- Ctrl+Shift+Enter
- Ctrl+2
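Note that save.image() writes the workspace (the objects in your environment) to the named file, whatever its extension; it does not write the plot. If you also want the semicircle as a picture, one option is the png() graphics device from base R, sketched below. The temporary output path is an assumption made so the example runs anywhere; in the course project you might use episode_2_rstudio/semicircle.png instead.

```r
# Points on the upper half of the unit circle
x <- seq.int(from = -1, to = 1, length.out = 100)
y <- sqrt(1 - x^2)

# Hypothetical output location in the session's temporary directory
plot_file <- file.path(tempdir(), "semicircle.png")

# Open a PNG graphics device, draw the plot into it, then close the device
png(filename = plot_file)
plot(x, y)
dev.off()

# TRUE once the image file has been written
file.exists(plot_file)
```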
The debugger helps you find errors
Debugging is essential when you write anything more than a few simple lines of code. You can find a summary of RStudio debugging tools here. This link also contains links to more advanced tutorials.
When you’ve got to this point, please let your instructor know (Zoom green tick or green Post-It note) as we’ll go through the debugger together.
You’ll need to have the following code in a new R script:
```{r}
# Create erroneous data
estimates <- runif(100)
estimates[[3]] <- -999
df <- data.frame(x = estimates)

# do analysis
sd <- sd(df$x)
mean <- mean(df$x)

z_score <- function(est, mean, sd){
  diff <- est - mean
  z <- diff / sd
  return(z)
}

z_scores <- vector(length = dim(df)[1])
for (i in 1:dim(df)[1]){
  # browser()
  est <- df$x[[i]]
  z <- z_score(est, mean, sd)
  z_scores[[i]] <- z
}
hist(z_scores)
```
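Stepping through the loop with browser() is one way to find the bad value. As a complementary check outside the debugger, the same z-scores can be computed in one vectorised expression and searched for extreme values. This is a sketch rather than the lesson's method: the seed, the threshold of 3 and the variable name suspect are all illustrative assumptions.

```r
# Seed set only so that the sketch gives the same numbers on every run
set.seed(1)

# Same erroneous data as in the script above
estimates <- runif(100)
estimates[[3]] <- -999

# Vectorised z-scores: subtract the mean and divide by the standard deviation
z_scores <- (estimates - mean(estimates)) / sd(estimates)

# Positions whose z-score is more than 3 standard deviations from the mean
# (the cut-off of 3 is an illustrative choice, not a rule)
suspect <- which(abs(z_scores) > 3)
suspect

# The offending values themselves
estimates[suspect]
```

With this data the check should flag only the third element, the one that was deliberately set to -999.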
Further reading
Mastering all the features of an IDE is time-consuming but well worth it. Check out RStudio tutorials by clicking here.
Key Points
IDEs such as RStudio make coding and data analysis easier and more reproducible.