Every R package has some essential components:
Documentation, stored as .Rd files inside the man folder
Functions, stored as .R files inside the R folder
Package metadata, stored in a plain-text file named DESCRIPTION
A file named NAMESPACE that declares which functions the package exports to users (and which functions it imports from other packages)
Various optional files and folders, including CITATION, help (the installed form of man), doc (built vignettes) and data.
Let’s examine the contents of some of the packages we have already installed.
To find where the files for a given package are located, use the following syntax:
find.package("package_name")
For example, for the ggplot2 package:
find.package("ggplot2")
[1] "/Library/Frameworks/R.framework/Versions/4.0/Resources/library/ggplot2"
setwd("/Library/Frameworks/R.framework/Versions/4.0/Resources/library/ggplot2")
list.files()
[1] "CITATION" "data" "DESCRIPTION" "doc" "help"
[6] "html" "INDEX" "LICENSE" "Meta" "NAMESPACE"
[11] "NEWS.md" "R"
Now look inside the R folder:
setwd("R")
list.files()
[1] "ggplot2" "ggplot2.rdb" "ggplot2.rdx"
You can check the contents of the ggplot2 file using system("cat FILENAME"), but it does not contain any functions. The other two files are binary, meaning they are for R’s internal use only.
To check what functions are available within ggplot2, you can cat the NAMESPACE file:
setwd("../")
system("cat NAMESPACE")
# Generated by roxygen2: do not edit by hand
S3method("$",ggproto)
S3method("$",ggproto_parent)
S3method("$<-",uneval)
S3method("+",gg)
...
importFrom(stats,setNames)
importFrom(tibble,tibble)
importFrom(utils,.DollarNames)
As you can see, it is a very long list of exported functions, S3 methods and imports; we have truncated it above.
Feel free to explore what each folder of the package contains. Some files are binary and cannot be viewed.
Try another package of your choice to learn where and how its files are stored (3 min).
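The exploration above relies on setwd() and system("cat ..."). A more portable sketch, assuming only base R, gathers the same information without changing the working directory or calling a shell command (cat may not be available on Windows). The stats package is used here only because it ships with every R installation; substitute ggplot2 or any other package you have installed.

```r
# "stats" is only a stand-in: it ships with every R installation
pkg_dir <- find.package("stats")

# List the top-level files without calling setwd()
top_level <- list.files(pkg_dir)
print(top_level)

# Read NAMESPACE directly instead of using system("cat NAMESPACE"),
# which depends on a shell command that may be missing on Windows
ns_lines <- readLines(file.path(pkg_dir, "NAMESPACE"))
head(ns_lines)

# Ask R for the exported names instead of reading NAMESPACE by eye
exports <- getNamespaceExports("stats")
length(exports)
```

getNamespaceExports() returns only the exported names, which is usually what you want when asking which functions a package makes available.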
There are three main ways to publish your package, i.e. make it widely available for anyone to use:
CRAN: The Comprehensive R Archive Network is the most widely used repository for R packages. Anyone can submit a package for review and publication on CRAN; the process may take some time. At the time of writing, the repository has over 17,000 packages available.
Bioconductor: a specialized repository of packages relevant to biologists. Many genomic data analysis packages are now distributed through Bioconductor.
GitHub: Cutting-edge versions of packages can be obtained from GitHub. Often these packages have stable versions on CRAN, while more recent versions that are not yet released on CRAN can be installed from GitHub. We have already seen how to install R packages from GitHub on several occasions.
One of the motivations behind creating a package is to be able to use a given function with ease, instead of having to hunt down the relevant code to repeat the analysis. Let’s walk through a few functions to see how they work:
printnum <- function(n){
  n <- readline("How many numbers do you want to print? ")
  print(1:n)
}
printnum()
How many numbers do you want to print? 25
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25
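readline() always returns a character string. 1:n happens to work because the : operator converts "25" to a number, but a reply such as "ten" makes it fail with an unhelpful error. The sketch below is a hypothetical variant, printnum2() (our name, not part of the lesson), that converts and checks the input explicitly and only prompts when n is not supplied, so it can also be called as printnum2(5).

```r
# printnum2() is a hypothetical variant of printnum(): it uses n when
# supplied and only falls back to the interactive prompt when n is missing
printnum2 <- function(n) {
  if (missing(n)) {
    n <- readline("How many numbers do you want to print? ")
  }
  # Convert to a whole number; works for typed text and numeric arguments
  n <- suppressWarnings(as.integer(n))
  if (is.na(n) || n < 1) {
    stop("Please supply a positive whole number")
  }
  seq_len(n)
}

print(printnum2(5))
```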
f2c <- function(tempF){
  tempC <- (tempF - 32) * 5/9
  return(tempC)
}
f2c(32)
[1] 0
f2c(88)
[1] 31.11111
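Because R arithmetic is vectorized, f2c() also accepts a vector of temperatures and converts them all in one call, with no loop needed. A short sketch:

```r
f2c <- function(tempF){
  tempC <- (tempF - 32) * 5/9
  return(tempC)
}

# R arithmetic is vectorized: one call converts every element of the vector
f2c(c(32, 88, 212))   # approximately 0, 31.11 and 100

# round() tidies the result for display
round(f2c(88), 1)     # 31.1
```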
popdata <- function(allele1, allele2, popsize){
  allele1 <- readline("Provide name of the first allele - one alphabet only: ")
  allele2 <- readline("Provide name of the second allele: ")
  # readline() returns text, so convert the population size to a number
  popsize <- as.integer(readline("How many individuals in your population?: "))
  print("Possible diploid genotypes are:")
  print(paste0("First Homozygote: ", allele1, allele1))
  print(paste0("Heterozygote: ", allele1, allele2))
  print(paste0("Second Homozygote: ", allele2, allele2))
  homoz1 <- paste(allele1, allele1, sep="")
  hetz <- paste(allele1, allele2, sep="")
  homoz2 <- paste(allele2, allele2, sep="")
  pop <- sample(c(homoz1, hetz, homoz2), popsize, replace=TRUE)
  print("Your sampled population is stored in object 'pop' and also printed below:")
  print(pop)
}
popdata()
Provide name of the first allele - one alphabet only: A
Provide name of the second allele: D
How many individuals in your population?: 100
[1] "Possible diploid genotypes are:"
[1] "First Homozygote: AA"
[1] "Heterozygote: AD"
[1] "Second Homozygote: DD"
[1] "Your sampled population is stored in object 'pop' and also printed below:"
[1] "AD" "DD" "DD" "DD" "DD" "DD" "AD" "AA" "AD" "AA" "DD" "AA" "AD" "DD" "AD"
[16] "DD" "AA" "AA" "AA" "AA" "AD" "DD" "AD" "AD" "AD" "DD" "AA" "DD" "DD" "AA"
[31] "AD" "DD" "AD" "DD" "AD" "DD" "AD" "AD" "AD" "AA" "DD" "DD" "AA" "AA" "DD"
[46] "DD" "DD" "AA" "DD" "DD" "DD" "AA" "DD" "AD" "AA" "AA" "AD" "AA" "DD" "AD"
[61] "DD" "AD" "DD" "DD" "AA" "DD" "DD" "AD" "AD" "AA" "AA" "DD" "DD" "AA" "AA"
[76] "AD" "AD" "AA" "AD" "AA" "AA" "AD" "AA" "AD" "AD" "AD" "AA" "AA" "AA" "AA"
[91] "DD" "AD" "AD" "AD" "DD" "AA" "AA" "DD" "AA" "AD"
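popdata() reads everything through readline(), so it only works interactively and ignores any arguments passed to it. As a sketch, a hypothetical non-interactive variant, popdata2() (our name, not part of the lesson), takes the allele names and population size as arguments and returns the sample instead of prompting, which also makes it easy to test. set.seed() is used only to make the random draw reproducible.

```r
# popdata2() is a hypothetical, non-interactive variant of popdata():
# it uses the arguments it is given instead of prompting with readline()
popdata2 <- function(allele1, allele2, popsize) {
  homoz1 <- paste0(allele1, allele1)
  hetz   <- paste0(allele1, allele2)
  homoz2 <- paste0(allele2, allele2)

  # Each genotype is drawn with equal probability, as in popdata()
  sample(c(homoz1, hetz, homoz2), size = popsize, replace = TRUE)
}

set.seed(42)  # makes the random draw reproducible
pop <- popdata2("A", "D", 10)
print(pop)
table(pop)    # counts of each genotype
```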
You will need the following packages to build a package. Go ahead and install them unless you already have them:
devtools
roxygen2
setwd("~/Github")
devtools::create("popdata")
✔ Creating 'popdata/'
✔ Setting active project to '/Users/vikram/Dropbox/Github/popdata'
✔ Creating 'R/'
✔ Writing 'DESCRIPTION'
Package: popdata
Title: What the Package Does (One Line, Title Case)
Version: 0.0.0.9000
Authors@R (parsed):
    * First Last <first.last@example.com> [aut, cre] (YOUR-ORCID-ID)
Description: What the package does (one paragraph).
License: `use_mit_license()`, `use_gpl3_license()` or friends to
    pick a license
Encoding: UTF-8
LazyData: true
Roxygen: list(markdown = TRUE)
RoxygenNote: 7.1.1
✔ Writing 'NAMESPACE'
This creates a new folder named popdata in your current location. Let’s go inside the folder and check what’s there:
setwd("~/Github/popdata")
list.files()
[1] "DESCRIPTION" "NAMESPACE" "R"
Next, go to the R folder and create a new file there named popdata.R. From a terminal:
cd ~/Github/popdata/R
vim popdata.R
Add the contents of the popdata script from above to the file.
Because this function will be exposed to users (some functions are not; they work behind the scenes), we need to add an export tag to it. This line should appear directly above the function definition:
#' @export
Your popdata.R script should now look like this:
#' @export
popdata <- function(allele1, allele2, popsize){
  allele1 <- readline("Provide name of the first allele - one alphabet only: ")
  allele2 <- readline("Provide name of the second allele: ")
  # readline() returns text, so convert the population size to a number
  popsize <- as.integer(readline("How many individuals in your population?: "))
  print("Possible diploid genotypes are:")
  print(paste0("First Homozygote: ", allele1, allele1))
  print(paste0("Heterozygote: ", allele1, allele2))
  print(paste0("Second Homozygote: ", allele2, allele2))
  homoz1 <- paste(allele1, allele1, sep="")
  hetz <- paste(allele1, allele2, sep="")
  homoz2 <- paste(allele2, allele2, sep="")
  pop <- sample(c(homoz1, hetz, homoz2), popsize, replace=TRUE)
  print("Your sampled population is stored in object 'pop' and also printed below:")
  print(pop)
}
We need to add some useful information for the end user so they know exactly what this function does. When you open help for this function with ?popdata, this information will be printed to the screen.
The following lines should go at the top of the file, above the #' @export line:
#' popdata() function by FirstName LastName
#' This function takes input from the user on allele names and population size
#' It then prints out the genetic variation data
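roxygen2 treats the first paragraph of a comment block as the title and a paragraph after a blank #' line as the description; with no separate description, the help page shown further below simply repeats the title text. A sketch of a more complete header, assuming the same function body, might separate the two and document the arguments and return value with @param and @return (the title wording here is our own example):

```r
#' Simulate genotypes at a single diploid locus
#'
#' popdata() function by FirstName LastName. This function takes input from
#' the user on allele names and population size. It then prints out the
#' genetic variation data.
#'
#' @param allele1,allele2,popsize Currently unused: all three values are
#'   read interactively with readline(), overwriting anything passed in.
#' @return Invisibly, the character vector of sampled genotypes (the value
#'   returned by the final print() call).
#' @export
popdata <- function(allele1, allele2, popsize){
  allele1 <- readline("Provide name of the first allele - one alphabet only: ")
  allele2 <- readline("Provide name of the second allele: ")
  # readline() returns text, so convert the population size to a number
  popsize <- as.integer(readline("How many individuals in your population?: "))
  print("Possible diploid genotypes are:")
  print(paste0("First Homozygote: ", allele1, allele1))
  print(paste0("Heterozygote: ", allele1, allele2))
  print(paste0("Second Homozygote: ", allele2, allele2))
  homoz1 <- paste0(allele1, allele1)
  hetz <- paste0(allele1, allele2)
  homoz2 <- paste0(allele2, allele2)
  pop <- sample(c(homoz1, hetz, homoz2), popsize, replace = TRUE)
  print("Your sampled population is stored in object 'pop' and also printed below:")
  print(pop)
}
```

Running devtools::document() on a header like this would add Arguments and Value sections to the generated popdata.Rd.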
Once you have added this information to popdata.R, save and close the file.
Now we will ask devtools to generate documentation based on our input.
devtools::document()
Updating popdata documentation
ℹ Loading popdata
Writing NAMESPACE
Writing NAMESPACE
In the ~/Github/popdata/man folder, you will now see a popdata.Rd file, which contains the help documentation you just wrote.
Now install the package:
devtools::install()
✔ checking for file ‘/Users/vikram/Dropbox/Github/popdata/DESCRIPTION’ ...
─ preparing ‘popdata’:
✔ checking DESCRIPTION meta-information
─ checking for LF line-endings in source and make files and shell scripts
─ checking for empty or unneeded directories
─ building ‘popdata_0.0.0.9000.tar.gz’
Running /Library/Frameworks/R.framework/Resources/bin/R CMD INSTALL \
  /var/folders/8z/8vr45rz94t95_gn426z64f5w0000gn/T//Rtmpqd3Prl/popdata_0.0.0.9000.tar.gz \
  --install-tests
* installing to library ‘/Library/Frameworks/R.framework/Versions/4.0/Resources/library’
* installing *source* package ‘popdata’ ...
** using staged installation
** R
** byte-compile and prepare package for lazy loading
** help
*** installing help indices
** building package indices
** testing if installed package can be loaded from temporary location
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (popdata)
That’s it. Your package is now ready to be used.
Check its help page first:
?popdata
popdata package:popdata R Documentation
popdata() function by FirstName LastName This function takes input from
the user on allele names and population size It then prints out the
genetic variation data
Description:
popdata() function by FirstName LastName This function takes input
from the user on allele names and population size It then prints
out the genetic variation data
Usage:
popdata(allele1, allele2, popsize)
(END)
popdata()
Provide name of the first allele - one alphabet only: M
Provide name of the second allele: N
How many individuals in your population?: 100
[1] "Possible diploid genotypes are:"
[1] "First Homozygote: MM"
[1] "Heterozygote: MN"
[1] "Second Homozygote: NN"
[1] "Your sampled population is stored in object 'pop' and also printed below:"
[1] "NN" "NN" "MN" "MN" "MN" "MM" "NN" "NN" "MM" "NN" "MM" "MM" "MM" "MM" "MM"
[16] "MM" "MN" "MM" "MN" "MN" "MN" "MM" "NN" "MN" "MN" "NN" "NN" "NN" "MM" "MN"
[31] "MN" "MM" "NN" "MN" "MN" "MN" "NN" "MM" "MM" "MM" "MM" "NN" "MM" "NN" "NN"
[46] "MM" "NN" "MN" "NN" "NN" "NN" "MN" "MM" "NN" "MN" "MM" "MN" "MN" "NN" "MN"
[61] "NN" "NN" "NN" "NN" "MN" "MM" "NN" "MM" "MM" "MN" "MN" "MN" "MN" "MN" "MM"
[76] "MN" "MM" "MN" "MM" "MN" "MM" "MN" "NN" "MN" "MN" "MN" "NN" "MM" "MM" "NN"
[91] "MN" "MM" "MM" "MN" "NN" "MN" "MM" "MM" "NN" "MM"
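Note that although the Usage section above suggests you can pass arguments to the function in parentheses, this won’t actually work: popdata() overwrites them with the values typed at the prompts.
This part should now be a cakewalk for you, having gone through git and GitHub multiple times.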
Turn ~/Github/popdata into a git repository.
Create a new popdata repository on GitHub.com.
Configure your local repo, then add, commit and push files as usual.
Share your GitHub user name on the Slack channel so others can access your package.
Anyone who wishes to install your package can then run:
library(devtools)
devtools::install_github("YOUR_USER_NAME/popdata")
Create a new package called allfreq.
Modify the code from our popdata package so that it also calculates the frequencies of both alleles in the population.
Follow all the steps in #3 above and post your package’s GitHub URL on Slack.
The instructor will run through this exercise after your first attempt.
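One possible starting point for the exercise, offered as a sketch rather than the instructor’s solution: a hypothetical helper, allele_freqs(), that counts alleles in a genotype vector such as pop. A diploid population of N individuals carries 2N alleles, so the frequency of an allele is its count divided by 2N. The helper assumes one-letter allele names, as popdata() requests.

```r
# allele_freqs() is a hypothetical helper for the allfreq exercise.
# It assumes one-letter allele names, so every genotype has two characters.
allele_freqs <- function(pop, allele1, allele2) {
  # Split every genotype into its two alleles: "AD" -> "A", "D"
  alleles <- unlist(strsplit(pop, split = ""))
  # length(alleles) is 2N for a diploid population of N individuals
  freqs <- c(sum(alleles == allele1), sum(alleles == allele2)) / length(alleles)
  names(freqs) <- c(allele1, allele2)
  freqs
}

# A: 2 + 1 + 0 + 2 = 5 of 8 alleles (0.625); D: 3 of 8 (0.375)
allele_freqs(c("AA", "AD", "DD", "AA"), "A", "D")
```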