LIS 4370 R Programming: R Markdown

One of my favorite things about using R and RStudio is how versatile the tools are. Not everything is about writing scripts to do lots of work. Sometimes you want to write documentation, reports, or anything else. Sometimes those documents are closely tied to the code, and you want an easy way to write them alongside it. That is what R Markdown is for. I have used R Markdown fairly extensively for other classes, but this is the first time I am really getting into the details of what it is.

A good example of an R Markdown document I have created is the Introduction to protein8k vignette, which I wrote as a new-user manual for the protein8k package. You can find the original source of the document on my GitHub by following the link at the bottom of this post.

The first thing in an R Markdown file, and the part that defines the document, is the header. In the vignette I wrote, the header looks like this:

---
title: "Introduction to protein8k"
author: "Simon Liles"
date: "`2021-04-01`"
output: rmarkdown::html_vignette
description: > 
  Start here if this is your first time using protein8k. You will learn how to
  get started with this package and what the results of visualizations mean. 
vignette: >
  %\VignetteIndexEntry{Introduction to protein8k}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

The header holds the standard metadata one would expect to see in a document: title, author, date, and description. There is also the output field, which tells the interpreter how to knit the R Markdown file together. If it were a simple report, I would use html_document instead of rmarkdown::html_vignette. The block under the vignette: > field is just more instructions, this time for R's package build tools, on how to index and build the vignette.
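Knitting is usually done with the Knit button in RStudio, but as a rough sketch it can also be triggered from the R console with rmarkdown::render(). The example assumes the rmarkdown package and pandoc are installed. The tiny report it writes to a temporary file is hypothetical and is not part of the vignette.

```r
# Sketch: knit a minimal R Markdown file to HTML from the R console.
# Assumes the rmarkdown package and pandoc (bundled with RStudio) are installed.
library(rmarkdown)

# Three backticks, built with strrep() so this example can itself
# sit inside a fenced code block without closing it early.
fence <- strrep("`", 3)

# Hypothetical report, written to a temporary file so the example is self-contained.
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"A simple report\"",
  "output: html_document",
  "---",
  "",
  "Some text, followed by a code chunk:",
  "",
  paste0(fence, "{r}"),
  "summary(cars)",
  fence
), rmd_file)

# render() reads the output field of the header to choose the format
# and returns the path of the file it produced.
html_file <- render(rmd_file, quiet = TRUE)
file.exists(html_file)
```

Swapping html_document for another format in the header changes what render() produces without touching the body of the document.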

One of the major differences between a regular Markdown file and an R Markdown file is the use of code blocks. In an R Markdown file, R code blocks are run automatically when the document is knitted, and their output appears directly below the code block. This allows for easy discussion, in a regular document, of how the code operates and what the results of an algorithm look like. It also saves you the trouble of saving separate images for plots and then inserting them into the document. They will just appear automatically, so long as your code block options are set for that. For example, in the vignette's discussion of the function summary(), there is a code block followed by output. In the source code it looks like this:

```{r, message=FALSE}
summary(p53_tetramerization)
```

And then in the final HTML document, the output is like so:

summary(p53_tetramerization)
## S4 Object of class Protein
## ID Code: 1AIE     Deposition Date: 1997-04-17 
## Classification: P53 TETRAMERIZATION                      
## Title: P53 TETRAMERIZATION DOMAIN CRYSTAL STRUCTURE 
## Atomic Record Contains 590 rows
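The message=FALSE option in the chunk above applies to that one block only. As a minimal sketch, defaults for every chunk can be set once with knitr::opts_chunk$set(), usually in a first setup chunk. Apart from message, the options and values below are illustrative assumptions and are not taken from the vignette.

```r
# Sketch: set default chunk options once, typically in a first "setup" chunk.
library(knitr)

opts_chunk$set(
  message = FALSE,  # hide messages, as in the summary() chunk above
  warning = FALSE,  # also hide warnings (an assumption, not from the vignette)
  fig.width = 6,    # illustrative default plot width in inches
  fig.height = 4    # illustrative default plot height in inches
)

# Individual chunks can still override a default in their own header,
# for example {r, message=TRUE}.
opts_chunk$get("message")
```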

Formatting for things such as headings is also easier, as it is done inline with the text. For example, the Markdown code:

## Inspecting Protein data. 

Becomes:

Inspecting Protein data.

Another fun thing with R Markdown files is that when they are knitted to HTML, you can apply a CSS stylesheet to them and customize how the final output appears. This also carries over into vignettes for R packages, so you can have your own custom styles in the R help page navigator.
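As a rough sketch of the styling idea, the css argument of rmarkdown::html_document() attaches a stylesheet when the document is knitted. The stylesheet, its single rule, and the small report below are hypothetical and are written to temporary files only so the example runs on its own. It assumes rmarkdown and pandoc are installed.

```r
# Sketch: knit an R Markdown file to HTML with a custom stylesheet.
library(rmarkdown)

# Hypothetical stylesheet that recolors second-level headings.
css_file <- tempfile(fileext = ".css")
writeLines("h2 { color: steelblue; }", css_file)

# Hypothetical report containing one second-level heading.
rmd_file <- tempfile(fileext = ".Rmd")
writeLines(c(
  "---",
  "title: \"Styled report\"",
  "---",
  "",
  "## Inspecting Protein data",
  "",
  "Some text under the heading."
), rmd_file)

# Pass the stylesheet through the output format's css argument.
html_file <- render(
  rmd_file,
  output_format = html_document(css = css_file),
  quiet = TRUE
)
file.exists(html_file)
```

The same css argument exists on rmarkdown::html_vignette(), so a similar approach should carry over to package vignettes.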

R Markdown files are a good way to organize and document large and complex analyses in R. For code that runs once in a logical order, it is helpful to have large blocks of text alongside the results as they come through and as subsequent analysis is performed. To me, R Markdown makes sense in that use case.

Links:

Vignette Source Code, GitHub: github.com/SimonLiles/protein8k/blob/master/vignettes/intro_protein8k.Rmd