LIS 4370 R Programming: Introducing protein8k

Over the past month or so I have been working with Dr. Alon Friedman to develop a package that can handle protein data. The result of these efforts is the protein8k package, which can be found on my GitHub (linked at the end of this post). In this post I will walk through and briefly demonstrate some of its major features.

To begin using the package, you need to install it and load it into your environment. For now, it can only be installed directly from GitHub, which requires the devtools package. The process looks like this.

#Install the package
devtools::install_github("SimonLiles/protein8k", build_vignettes = TRUE)

#Load protein8k package
library(protein8k)

This installs the package along with its vignettes, which then appear in the package documentation. To browse the vignettes, use the following command:

browseVignettes(package = "protein8k")
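
The install step above goes through devtools, so it fails if devtools itself is missing. A small check like the following can catch that early. This is only a sketch; needs_install() is a hypothetical helper written for this post, not part of protein8k or devtools.

```r
# Hypothetical helper: TRUE when a package cannot be found in the library
needs_install <- function(pkg) {
  !requireNamespace(pkg, quietly = TRUE)
}

# Warn before calling devtools::install_github() if devtools is absent
if (needs_install("devtools")) {
  message("devtools is missing; run install.packages(\"devtools\") first")
}
```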

Now that you have the package loaded in the environment, we can go through some of the major features. First is the ability to read in Protein Data Bank (PDB) files. These files describe the structure of a protein as well as the experiment used to map that structure. With protein8k, it is a simple task to turn a PDB-formatted file into an R object that can be manipulated.

The read.pdb() function takes the path to the PDB file, relative to the working directory, as its argument and returns a Protein object.

#Reading a PDB file
fileName <- "1aieH"

my_Protein <- read.pdb(fileName)
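
For readers who have not opened a PDB file before, the sketch below shows roughly what a reader for this format has to deal with. Each ATOM line is fixed-width, with fields at set column positions defined by the PDB format. This is only an illustration in base R, not protein8k's actual implementation. The helper parse_atom_line(), the sample coordinates, and the column names (other than residue_name, which is borrowed from the plotting call later in this post) are assumptions made for the example.

```r
# One ATOM record in the fixed-width PDB layout (illustrative values, not taken from 1AIE)
atom_line <- "ATOM      1  N   MET A   1      27.340  24.430   2.614  1.00  9.67           N"

# Hypothetical helper: slice the standard PDB column ranges out of one ATOM line
parse_atom_line <- function(line) {
  data.frame(
    serial       = as.integer(substr(line, 7, 11)),   # atom serial number
    atom_name    = trimws(substr(line, 13, 16)),      # atom name
    residue_name = trimws(substr(line, 18, 20)),      # three-letter residue code
    chain_id     = substr(line, 22, 22),              # chain identifier
    residue_seq  = as.integer(substr(line, 23, 26)),  # residue sequence number
    x            = as.numeric(substr(line, 31, 38)),  # coordinates in Angstroms
    y            = as.numeric(substr(line, 39, 46)),
    z            = as.numeric(substr(line, 47, 54)),
    stringsAsFactors = FALSE
  )
}

atom <- parse_atom_line(atom_line)
print(atom)
```

A full reader would apply this kind of slicing to every ATOM line and bind the rows together, which is presumably similar in spirit to the atomic record that a Protein object holds.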

Now that you have a Protein object, you will want to inspect it so that you know what you are working with. Simply use summary() as you would for any other R object, and it will print a summary of the Protein object's data to the console. For example, let's inspect the protein that was read in above.

#Inspecting a Protein object
summary(my_Protein)

This will give the following output.

S4 Object of class Protein
ID Code: 1AIE     Deposition Date: 1997-04-17 
Classification: P53 TETRAMERIZATION                      
Title: P53 TETRAMERIZATION DOMAIN CRYSTAL STRUCTURE 
Atomic Record Contains 590 rows

The above output is a good example of typical summary() output from a Protein object.
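
Since the output identifies the object as S4, it may help to see how summary() can be attached to an S4 class in general. The sketch below uses a made-up class, ToyProtein, with made-up slot names; it is not the Protein class from protein8k and its slots are assumptions for illustration only.

```r
library(methods)

# Hypothetical S4 class with a few slots loosely mirroring the summary output
setClass("ToyProtein", representation(
  id_code = "character",
  title   = "character",
  atoms   = "data.frame"
))

# Register a summary() method so the usual generic dispatches on the class
setMethod("summary", "ToyProtein", function(object, ...) {
  cat("S4 Object of class", class(object), "\n")
  cat("ID Code:", object@id_code, "\n")
  cat("Title:", object@title, "\n")
  cat("Atomic Record Contains", nrow(object@atoms), "rows\n")
  invisible(object)
})

toy <- new("ToyProtein",
           id_code = "TOY1",
           title   = "TOY EXAMPLE",
           atoms   = data.frame(x = c(0, 1.5), y = c(0, 0), z = c(0, 0)))

summary(toy)

# slotNames() lists the slots of any S4 object
slotNames(toy)
```

The same slotNames() call should work on my_Protein if you want to see what a real Protein object contains beyond the summary.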

In the general workflow with this package, the next step is to make a 3D plot of the protein's atomic record. The atomic record contains the location and other identifying information for every atom in the protein. Using plot3D(), you can create a 3D plot of this information. For example, with the protein data used above, the following code creates a visualization.

#Plotting the Protein Structure
plot3D(my_Protein, groups = residue_name, screen = list(z = -30, x = -60))
Figure 1: An example of plotting a Protein Object with protein8k.
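
The screen argument in the call above takes the same form as the one used by cloud() in the lattice package, where it sets the sequence of rotations (in degrees) applied before drawing. Whether plot3D() is built on lattice is an assumption here, but a toy example with cloud() gives a feel for what groups and screen control. The toy_atoms data frame is randomly generated and does not represent a real protein.

```r
library(lattice)

# Made-up coordinates with a grouping column named like the one in the plot3D() call
set.seed(1)
toy_atoms <- data.frame(
  x = rnorm(30),
  y = rnorm(30),
  z = rnorm(30),
  residue_name = rep(c("ALA", "GLY", "LEU"), each = 10)
)

# groups colours points by residue; screen rotates the view about the z and x axes
p <- cloud(z ~ x * y, data = toy_atoms,
           groups = residue_name,
           screen = list(z = -30, x = -60),
           auto.key = TRUE)
print(p)
```

Changing the values in screen is a quick way to look at the same point cloud from a different angle.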

It is also possible to model the structure of the protein, and create appropriate visualizations of this data. This is done with plotModels().

#Modeling the Atomic Record
plotModels(my_Protein)

This will produce the following visualization.

Figure 2: Models of a protein’s structure made using protein8k.

Plots such as these can tell you a lot about the overall structure of a protein.

The protein8k package is meant to help with visualizing biostatistical data such as PDB files. In the future, support for more data types and more functionality will be added, so that researchers and data scientists can produce high-quality analyses with ease and tell better stories with their data.

I hope you will enjoy trying out my package. This post is only a brief introduction. The package is still in early alpha, but I plan to do a full walkthrough once it reaches Version 1.0 and I can submit it to CRAN.

Links:

protein8k on GitHub: https://github.com/SimonLiles/protein8k

Walkthrough Script on GitHub: https://github.com/SimonLiles/LIS4370RProgramming/blob/main/FinalProject/walkthrough_protein8k.R

Download the Walkthrough project folder: https://github.com/SimonLiles/LIS4370RProgramming/blob/main/FinalProject/FinalProject.zip