LIS 4370 R Programming: Building my Own R Package

I recently started building my own R package as part of a project with my professor, Dr. Friedman. The project involves visual analytics of proteins and requires that we build our own R package. The project is still in its very early stages, so the package is also still in early alpha.

R packages have a very standardized structure, made up of specific files and folders that allow the package to work. One of the files that defines the package is the DESCRIPTION file. Below you can find the contents of the DESCRIPTION file for my package. You can also find a copy of it on my GitHub by following the link at the bottom of this post.

Package: protein8k
Type: Package
Title: Perform Analysis and Create Visualizations of Proteins
Version: 0.0.0.9000
Author: Simon Liles
Maintainer: Simon Liles <simon@sveoti.net>
Description: This package can be used to read Protein Data Bank (PDB)
    files, perform analysis, and visualize and model proteins in 3D. It can
    also create small .gif files of a protein for presentations.
Depends: R (>= 3.1.2)
Imports: pryr, lattice, methods
License: CC0
Encoding: UTF-8
LazyData: true
RoxygenNote: 7.1.1

This is all there is to a DESCRIPTION file. It is meant to be short; however, a lot of what I would call housekeeping happens in this file, and that housekeeping is what makes the package run smoothly in the R environment. So let's break down the contents line by line.
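Because the DESCRIPTION file is plain `Field: value` text (R calls this the DCF format), you can also inspect it from R. The sketch below uses base R's `read.dcf()` on a temporary copy of a few of the fields above; in a real package you would point it at the package's own `DESCRIPTION` path instead.

```r
# Write a few of the fields shown above to a temporary file so the
# example is self-contained; in practice, read the package's own DESCRIPTION.
desc_path <- tempfile(fileext = ".dcf")
writeLines(c(
  "Package: protein8k",
  "Type: Package",
  "Title: Perform Analysis and Create Visualizations of Proteins",
  "Version: 0.0.0.9000",
  "Imports: pryr, lattice, methods"
), desc_path)

# read.dcf() returns a character matrix: one row per record, one column per field.
fields <- read.dcf(desc_path)

package_name <- fields[1, "Package"]
package_title <- fields[1, "Title"]

# Comma-separated fields such as Imports can be split into a character vector.
imports <- trimws(strsplit(fields[1, "Imports"], ",")[[1]])

print(package_name)
print(package_title)
print(imports)
```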

The first three lines identify the package: Type: Package marks it as an ordinary package, the package itself is protein8k, and its title is “Perform Analysis and Create Visualizations of Proteins.” It might seem like the title should be the same as the package name, but that is not always the best choice; often you need a descriptive title so that users understand what the package does. When you browse the help files for this package, the title is shown at the top of the index page.

Line 4 holds the version number, which is useful for keeping track of updates. R and RStudio compare these numbers to tell whether a newer version of an installed package is available, and will update packages to their latest versions when you tell them to. The current version of my package is 0.0.0.9000. The first three numbers indicate major, minor, and patch releases respectively, and the fourth component, 9000, is a common convention for marking a package that is still in development (in this case, in alpha).
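R treats each dot-separated component of a version as a number, so versions can be compared directly. A small sketch using base R's `package_version()`; the versions other than 0.0.0.9000 are just illustrative next steps, not releases of protein8k.

```r
# package_version() parses version strings into comparable version objects.
current <- package_version("0.0.0.9000")
first_patch <- package_version("0.0.1")
next_dev <- package_version("0.0.0.9001")

# The components, in order: major, minor, patch, development suffix.
components <- unlist(unclass(current))

# Comparisons are made component by component, as numbers.
is_older_than_patch <- current < first_patch  # 0.0.0.x sorts before 0.0.1
is_older_than_next <- current < next_dev      # 9000 < 9001

print(components)
print(c(is_older_than_patch, is_older_than_next))
```

For an installed package, `utils::packageVersion()` returns the same kind of object, so checks like `packageVersion("lattice") >= "0.20"` work the same way.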

The next two lines name the author of the code and the person who maintains it. Sometimes these are different people: for example, the original author may have moved on to another project, and someone else may have come in to maintain the code and keep it up to date with its dependencies. The Maintainer line also carries an email address, since that is who users contact about the package.

Line 7 starts the description blurb, which is typically about a paragraph describing the package. If the description runs onto more lines, each continuation line must begin with whitespace; indenting continuation lines with 4 spaces is the usual convention and keeps the file readable.
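To see why the indentation matters, the sketch below parses two versions of a shortened Description field with base R's `read.dcf()`: one whose second line is indented and one whose second line is not. The helper `parse_dcf()` is a hypothetical name used only for this example.

```r
# A well-formed Description field: the continuation line starts with whitespace.
good_lines <- c(
  "Package: protein8k",
  "Description: This package can be used to read Protein Data Bank (PDB)",
  "    files, perform analysis, and visualize and model proteins in 3D."
)

# The same field with the continuation line missing its indentation.
bad_lines <- c(
  "Package: protein8k",
  "Description: This package can be used to read Protein Data Bank (PDB)",
  "files, perform analysis, and visualize and model proteins in 3D."
)

# Hypothetical helper: write lines to a temporary file and parse them as DCF.
parse_dcf <- function(lines) {
  path <- tempfile(fileext = ".dcf")
  writeLines(lines, path)
  read.dcf(path)
}

# The indented line is folded into the Description value.
good <- parse_dcf(good_lines)
description <- good[1, "Description"]

# The unindented line is neither a new "Field:" nor a continuation, so
# read.dcf() stops with a "malformed" error, which we capture here.
bad <- tryCatch(parse_dcf(bad_lines), error = function(e) e)

print(description)
print(conditionMessage(bad))
```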

Now we get to the Depends and Imports lines. Both allow the package to use other packages in its code. The difference is that packages listed under Depends are attached to the user's R environment (the search path) when my package is loaded, just as if the user had called library() on them. Packages listed under Imports, in contrast, only have to be installed: my package can use them internally, but they are not attached for the user. Imports is generally preferable because it puts fewer packages into the user's R environment, which makes the package a better citizen. In the example above, the only entry under Depends is R itself, which sets the minimum R version (3.1.2) the package requires. Meanwhile, installing this package will also make sure the pryr, lattice, and methods packages are installed (methods ships with R itself).
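The difference can be seen in a plain R session. The sketch below uses the base `tools` package as a stand-in, since it ships with R and is not attached by default (lattice may not be installed everywhere). `loadNamespace()` roughly mirrors what happens to a package listed under Imports, while `library()` mirrors Depends, which also attaches the package to the search path. It assumes a fresh session in which `tools` has not already been attached.

```r
# Imports-style: load the namespace without attaching it.
invisible(loadNamespace("tools"))
loaded_after_import <- isNamespaceLoaded("tools")
attached_after_import <- "package:tools" %in% search()

# Functions are still reachable with the pkg::fun syntax.
ext <- tools::file_ext("protein.pdb")

# Depends-style: attach the package to the search path, as library() does.
library(tools)
attached_after_library <- "package:tools" %in% search()

cat("Namespace loaded:", loaded_after_import, "\n")
cat("Attached after loadNamespace():", attached_after_import, "\n")
cat("Attached after library():", attached_after_library, "\n")
```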

Next is the License field, which states the terms under which others may use, modify, and redistribute the code. My package is listed as CC0 for now, which dedicates it to the public domain.

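The final lines are more technical and generally do not need to be edited by hand. Encoding gives the character encoding used for the DESCRIPTION file and the package's source files, LazyData: true means the package's datasets are only loaded into memory when they are first used, and the last line, RoxygenNote, records the version of roxygen2 that generated the documentation. For this package I am using roxygen2 to help document my work.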
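For readers who have not seen roxygen2 before: documentation is written as special `#'` comments directly above each function, and running `roxygen2::roxygenise()` (or `devtools::document()`) turns them into help files under `man/` and updates the RoxygenNote field. The function below, `count_atom_records()`, is a hypothetical example for illustration only and is not part of protein8k; it counts the ATOM records in the lines of a PDB file.

```r
#' Count ATOM records in a PDB file
#'
#' A hypothetical helper used only to illustrate roxygen2 comments.
#'
#' @param pdb_lines Character vector of lines from a PDB file,
#'   for example as returned by `readLines()`.
#' @return Integer count of lines that are ATOM records.
#' @export
#' @examples
#' count_atom_records(c("HEADER    EXAMPLE", "ATOM      1  N   ALA A   1"))
count_atom_records <- function(pdb_lines) {
  # In the PDB format, the record name occupies the first six columns,
  # so ATOM records start with "ATOM" followed by two spaces.
  sum(substr(pdb_lines, 1, 6) == "ATOM  ")
}

# A few made-up lines in PDB layout to exercise the function.
example_lines <- c(
  "HEADER    EXAMPLE",
  "ATOM      1  N   ALA A   1",
  "ATOM      2  CA  ALA A   1",
  "HETATM    3  O   HOH A 101",
  "END"
)
n_atoms <- count_atom_records(example_lines)
print(n_atoms)
```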

The DESCRIPTION file is deceptively simple to work with and understand, but it can become a mess fairly quickly if you do not take the time to understand it and its conventions.

Links:

GitHub: https://github.com/SimonLiles/protein8k/blob/master/DESCRIPTION