Introduction

Clear documentation of data objects allows users to quickly understand the structure, variables, and intended purpose of each dataset without having to inspect the raw contents. This is especially important in packages where data evolves over time or supports multiple analysis pipelines. Well-documented data reduces onboarding time, minimizes errors, and promotes consistent use and interpretation across workflows.

DPR2 can automatically generate documentation for .rda data objects stored in a package’s data/ folder. It creates an .R file for each object under R/ and, using roxygen2, a matching .Rd help file in man/. Descriptive details are derived from the object’s structure: for example, the number of rows and columns is reported for data frames and matrices, and the length is reported for vectors and lists.
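To make the idea of structure-based descriptions concrete, here is a minimal base R sketch. The helper describe_shape() is hypothetical and is not part of DPR2; it only mimics the kind of summary described above, and its wording is an assumption.

```r
# Hypothetical helper (not a DPR2 function): summarise an object's shape.
describe_shape <- function(x) {
  cls <- class(x)[1]
  if (is.data.frame(x) || is.matrix(x)) {
    # Two-dimensional objects: report rows and columns.
    sprintf("A %s with %d rows and %d columns", cls, nrow(x), ncol(x))
  } else {
    # Vectors and lists: report the length.
    sprintf("A %s with %d elements", cls, length(x))
  }
}

describe_shape(mtcars)
describe_shape(as.list(mtcars))
describe_shape(letters)
```

Dispatching on is.data.frame() and is.matrix() rather than on the class name keeps the sketch working for objects that carry more than one class.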

Below, we explain how this works, demonstrate its behavior on example objects, and cover important edge cases. Let’s begin with the directory skeleton of an example DPR2 package, TestPKG, before rendering. Note that the man/ and R/ folders do not exist yet.

TestPKG
├── data
├── datapackager.yml
├── DESCRIPTION
├── inst
│   ├── data_digest
│   ├── extdata
│   └── to_build
│       ├── objects
│       └── scripts
│           └── mtcars.R_
├── NAMESPACE
└── processing
    └── mtcars.R

We now walk through an example using the built-in mtcars dataset. This will show how DPR2 documents both a data frame and a structured list object.

Automated, simple documentation with DPR2

DPR2 automatically generates documentation for each .rda object in the data/ folder upon calling dpr_render(). The .R and .Rd files are created by examining the object’s structure, including its class, dimensions, and element names. Field descriptions are inferred from column names in data frames or element names in lists.
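As a rough illustration of how field entries could be derived from element names and classes, consider the base R sketch below. The helper field_items() is hypothetical and is not DPR2's implementation; it only reproduces the shape of the \item lines shown later in this article.

```r
# Hypothetical helper (not a DPR2 function): build one roxygen \item line
# per named column or element, using the first class of each element.
field_items <- function(x) {
  stopifnot(!is.null(names(x)))
  classes <- vapply(x, function(el) class(el)[1], character(1))
  sprintf("#'   \\item{%s}{%s}{}", names(x), unname(classes))
}

example_list <- list(raw_data = mtcars, correlation_matrix = cor(mtcars))
writeLines(field_items(example_list))
```

Taking only the first class keeps the output to a single label per field, even for objects such as matrices that report several classes in recent versions of R.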

DPR2 uses dpr_compare_data_digest() to compare the checksums of objects in the data/ directory with their recorded digests, so documentation is regenerated only when the underlying data has changed. By default, if an object’s checksum differs from its recorded digest, DPR2 overwrites the corresponding .R and .Rd files. If the checksum is unchanged, DPR2 skips re-documentation.
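The checksum idea can be tried out with base R alone. The sketch below uses tools::md5sum() on a file in a temporary directory. It is an assumption-laden illustration of change detection, not DPR2's own digest mechanism, and the directory name is made up.

```r
# Sketch only: base R checksums, not DPR2's digest mechanism.
data_dir <- file.path(tempdir(), "dpr2_checksum_demo")  # hypothetical folder
dir.create(data_dir, showWarnings = FALSE)
rda_file <- file.path(data_dir, "mtcars_df.rda")

mtcars_df <- mtcars
save(mtcars_df, file = rda_file)
recorded <- unname(tools::md5sum(rda_file))  # stands in for a recorded digest

# Nothing has changed yet, so the current checksum matches the recorded one.
unchanged <- identical(unname(tools::md5sum(rda_file)), recorded)

# Modify the object and save it again: the checksum no longer matches.
mtcars_df$efficiency <- mtcars$mpg / mtcars$hp
save(mtcars_df, file = rda_file)
changed <- !identical(unname(tools::md5sum(rda_file)), recorded)
```

Here unchanged corresponds to the case where re-documentation would be skipped, and changed to the case where the documentation files would be regenerated.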

Below we walk through an example showcasing DPR2’s default documentation behavior. We define and save two objects: an mtcars_df data frame with an added efficiency column, and an mtcars_list list containing the raw data, summary statistics, column classes, and a correlation matrix. Both are saved to the data/ folder using dpr_save().

library(DPR2)

data('mtcars')
mtcars_df <- mtcars
mtcars_df$efficiency <- mtcars$mpg / mtcars$hp

mtcars_list <- list(
  raw_data = mtcars,
  summary = summary(mtcars),
  column_classes = sapply(mtcars, class),
  correlation_matrix = cor(mtcars)
  )
dpr_save('mtcars_df')
dpr_save('mtcars_list')
str(mtcars_df)
#> 'data.frame':    32 obs. of  12 variables:
#>  $ mpg       : num  21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
#>  $ cyl       : num  6 6 4 6 8 6 8 4 4 6 ...
#>  $ disp      : num  160 160 108 258 360 ...
#>  $ hp        : num  110 110 93 110 175 105 245 62 95 123 ...
#>  $ drat      : num  3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
#>  $ wt        : num  2.62 2.88 2.32 3.21 3.44 ...
#>  $ qsec      : num  16.5 17 18.6 19.4 17 ...
#>  $ vs        : num  0 0 1 1 0 1 0 1 1 1 ...
#>  $ am        : num  1 1 1 0 0 0 0 0 0 0 ...
#>  $ gear      : num  4 4 4 3 3 3 3 4 4 4 ...
#>  $ carb      : num  4 4 1 1 2 1 4 2 2 4 ...
#>  $ efficiency: num  0.191 0.191 0.245 0.195 0.107 ...
str(mtcars_list)
#> List of 4
#>  $ raw_data          :'data.frame':  32 obs. of  11 variables:
#>   ..$ mpg : num [1:32] 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
#>   ..$ cyl : num [1:32] 6 6 4 6 8 6 8 4 4 6 ...
#>   ..$ disp: num [1:32] 160 160 108 258 360 ...
#>   ..$ hp  : num [1:32] 110 110 93 110 175 105 245 62 95 123 ...
#>   ..$ drat: num [1:32] 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
#>   ..$ wt  : num [1:32] 2.62 2.88 2.32 3.21 3.44 ...
#>   ..$ qsec: num [1:32] 16.5 17 18.6 19.4 17 ...
#>   ..$ vs  : num [1:32] 0 0 1 1 0 1 0 1 1 1 ...
#>   ..$ am  : num [1:32] 1 1 1 0 0 0 0 0 0 0 ...
#>   ..$ gear: num [1:32] 4 4 4 3 3 3 3 4 4 4 ...
#>   ..$ carb: num [1:32] 4 4 1 1 2 1 4 2 2 4 ...
#>  $ summary           : 'table' chr [1:6, 1:11] "Min.   :10.40  " "1st Qu.:15.43  " "Median :19.20  " "Mean   :20.09  " ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:6] "" "" "" "" ...
#>   .. ..$ : chr [1:11] "     mpg" "     cyl" "     disp" "      hp" ...
#>  $ column_classes    : Named chr [1:11] "numeric" "numeric" "numeric" "numeric" ...
#>   ..- attr(*, "names")= chr [1:11] "mpg" "cyl" "disp" "hp" ...
#>  $ correlation_matrix: num [1:11, 1:11] 1 -0.852 -0.848 -0.776 0.681 ...
#>   ..- attr(*, "dimnames")=List of 2
#>   .. ..$ : chr [1:11] "mpg" "cyl" "disp" "hp" ...
#>   .. ..$ : chr [1:11] "mpg" "cyl" "disp" "hp" ...

After calling dpr_render(), DPR2 generates .R files under R/ and .Rd files under man/.

TestPKG
├── data
│   ├── mtcars_df.rda
│   └── mtcars_list.rda
├── man
│   ├── mtcars_df.Rd
│   └── mtcars_list.Rd
├── processing
│   └── mtcars.R
└── R
    ├── mtcars_df.R
    └── mtcars_list.R

Below are the contents of the mtcars_df.R and mtcars_list.R files.

# This is data documentation created by DPR2. Do not delete this line.

#' mtcars_df
#'
#' A detailed description of the data
#'
#' @format A data.frame with 32 rows and 12 columns with the following fields:
#' \describe{
#'   \item{mpg}{numeric}{}
#'   \item{cyl}{numeric}{}
#'   \item{disp}{numeric}{}
#'   \item{hp}{numeric}{}
#'   \item{drat}{numeric}{}
#'   \item{wt}{numeric}{}
#'   \item{qsec}{numeric}{}
#'   \item{vs}{numeric}{}
#'   \item{am}{numeric}{}
#'   \item{gear}{numeric}{}
#'   \item{carb}{numeric}{}
#'   \item{efficiency}{numeric}{}
#' }
#' @source Generated from script _________________
#' @seealso
#' \link{}
"mtcars_df"

# This is data documentation created by DPR2. Do not delete this line.

#' mtcars_list
#'
#' A detailed description of the data
#'
#' @format A list with 4 elements:
#' \describe{
#'   \item{raw_data}{data.frame}{}
#'   \item{summary}{table}{}
#'   \item{column_classes}{character}{}
#'   \item{correlation_matrix}{matrix}{}
#' }
#' @source Generated from script _________________
#' @seealso
#' \link{}
"mtcars_list"

Users can now access the documentation with ?mtcars_df or ?mtcars_list, as with any standard R data package.

Advanced documentation with DPR2

Users can edit the .R file templates generated by DPR2 to add further details about their data objects, such as context, descriptions, and references. In the example below, we update the @source tag to link the object back to its processing script.

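# `path` is assumed to hold the root directory of the package (here, TestPKG).
doc_file <- file.path(path, "R", "mtcars_df.R")
mtcars_block <- readLines(doc_file)
# Replace the placeholder @source line with the name of the processing script.
mtcars_block[grepl("@source", mtcars_block, fixed = TRUE)] <- "#' @source Generated from script mtcars.R"
writeLines(mtcars_block, doc_file)

# Regenerate man/mtcars_df.Rd from the edited roxygen block.
roxygen2::roxygenize(path)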

We now display the updated .Rd help file for mtcars_df:

% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/mtcars_df.R
\docType{data}
\name{mtcars_df}
\alias{mtcars_df}
\title{mtcars_df}
\format{
A data.frame with 32 rows and 12 columns with the following fields:
\describe{
  \item{mpg}{numeric}{}
  \item{cyl}{numeric}{}
  \item{disp}{numeric}{}
  \item{hp}{numeric}{}
  \item{drat}{numeric}{}
  \item{wt}{numeric}{}
  \item{qsec}{numeric}{}
  \item{vs}{numeric}{}
  \item{am}{numeric}{}
  \item{gear}{numeric}{}
  \item{carb}{numeric}{}
  \item{efficiency}{numeric}{}
}
}
\source{
Generated from script mtcars.R
}
\usage{
mtcars_df
}
\description{
A detailed description of the data
}
\seealso{
\link{}
}
\keyword{datasets}

Since existing documentation files are overwritten when a checksum change is detected, users are encouraged to separately save any custom edits so they can be copied into the updated version if needed.
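One simple way to keep custom edits safe is to copy the edited .R file to a location outside the package before rendering again. The base R sketch below shows this. Every path in it is a hypothetical stand-in created under tempdir(), and the backup step is a suggestion rather than a DPR2 feature.

```r
# Sketch only: every path below is a hypothetical stand-in created in tempdir().
pkg <- file.path(tempdir(), "TestPKG_backup_demo")
backup_dir <- file.path(tempdir(), "TestPKG_doc_backup")
dir.create(file.path(pkg, "R"), recursive = TRUE, showWarnings = FALSE)
dir.create(backup_dir, showWarnings = FALSE)

# Stand-in for a documentation file that has been edited by hand.
doc_file <- file.path(pkg, "R", "mtcars_df.R")
writeLines(c("#' mtcars_df", "#' @source Generated from script mtcars.R"), doc_file)

# Copy the edited file outside the package before rendering again.
backup_file <- file.path(backup_dir, basename(doc_file))
copied <- file.copy(doc_file, backup_file, overwrite = TRUE)

# After rendering, the custom lines can be read back from the backup.
custom_source <- grep("@source", readLines(backup_file), value = TRUE, fixed = TRUE)
```

Keeping the backup outside the package directory avoids adding stray files to the package itself.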

Controlling what gets documented

As noted earlier, DPR2 uses dpr_compare_data_digest() to track changes to objects in the data/ directory and determine how documentation should be handled:

  • If an existing .rda file has changed, the corresponding .R and .Rd files are automatically overwritten.
  • If the .rda file is unchanged, DPR2 preserves existing documentation and does not overwrite .R or .Rd files.
  • If an object is no longer present in the data/ directory, DPR2 deletes the corresponding .R and .Rd files during rendering.

This behavior keeps the documentation in sync with the underlying data while avoiding unnecessary overwrites when no changes are detected.
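The three cases above amount to comparing two sets of named checksums. The sketch below expresses that comparison in base R. The function classify_objects() and the digest values are hypothetical and serve only to illustrate the logic; they do not reproduce what dpr_compare_data_digest() returns.

```r
# Hypothetical helper (not a DPR2 function): compare recorded and current
# checksums, both given as named character vectors keyed by object name.
classify_objects <- function(recorded, current) {
  common <- intersect(names(recorded), names(current))
  same <- recorded[common] == current[common]
  list(
    changed   = common[!same],                            # documentation overwritten
    unchanged = common[same],                             # documentation preserved
    removed   = setdiff(names(recorded), names(current)), # documentation deleted
    new       = setdiff(names(current), names(recorded))  # no recorded digest yet
  )
}

# Made-up digest values for illustration.
recorded <- c(mtcars_df = "aaa", mtcars_list = "bbb", old_obj = "ccc")
current  <- c(mtcars_df = "aaa", mtcars_list = "zzz", new_obj = "ddd")
status <- classify_objects(recorded, current)
str(status)
```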

If no new objects have been added and no existing objects have changed, DPR2 outputs a message indicating that no data documentation was generated. This lets users confirm that documentation is up to date without triggering unnecessary updates.

For example, after re-running dpr_render() with no changes to the saved objects, users will see the following:

## No new data object documentation created, as no objects have been modified and all objects are documented in `.R`.

In contrast, in the example below, mtcars_df was removed from the processing script. DPR2 detects that the object is no longer present and deletes both R/mtcars_df.R and man/mtcars_df.Rd the next time dpr_render() is called.

library(DPR2)
data('mtcars')
mtcars_list <- list(
  raw_data = mtcars,
  summary = summary(mtcars),
  column_classes = sapply(mtcars, class),
  correlation_matrix = cor(mtcars)
)
dpr_save('mtcars_list')

TestPKG
├── R
│   └── mtcars_list.R
├── data
│   └── mtcars_list.rda
├── man
│   └── mtcars_list.Rd
└── processing
    └── mtcars.R

Finally, note that DPR2 will not write data documentation for an .rda file that contains more than one object, or whose object name does not match the file name, because the mapping between file name and object name is ambiguous in both cases. DPR2 emits a warning to alert the user when this happens. To avoid the problem, write .rda files with dpr_save() rather than save().

library(DPR2)

data('mtcars')
mtcars_df <- mtcars
mtcars_df$efficiency <- mtcars$mpg / mtcars$hp

# The next line is not best practice: use dpr_save() instead.
save('mtcars_df', file=dpr_path('data', 'my_df.rda'))
dpr_save('mtcars')
## Warning in generate_all_docs(path = path): 'my_df.rda' does not match data
## object name 'mtcars_df'. Will skip writing documentation for it.
## No new data object documentation created, as no objects have been modified and all objects are documented in `.R`.
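The naming rule described above can also be checked by hand before rendering. The base R sketch below loads an .rda file into a fresh environment and tests whether it holds exactly one object named after the file. The helper rda_matches_name() and the demo directory are hypothetical; this is not how DPR2 performs its check.

```r
# Hypothetical helper (not a DPR2 function): TRUE when the .rda file holds
# exactly one object and that object is named after the file.
rda_matches_name <- function(rda_file) {
  env <- new.env()
  objects <- load(rda_file, envir = env)  # load() returns the object names
  expected <- tools::file_path_sans_ext(basename(rda_file))
  length(objects) == 1 && identical(objects, expected)
}

demo_dir <- file.path(tempdir(), "dpr2_name_demo")  # hypothetical folder
dir.create(demo_dir, showWarnings = FALSE)
mtcars_df <- mtcars

save(mtcars_df, file = file.path(demo_dir, "mtcars_df.rda"))  # names match
save(mtcars_df, file = file.path(demo_dir, "my_df.rda"))      # names differ

rda_matches_name(file.path(demo_dir, "mtcars_df.rda"))
rda_matches_name(file.path(demo_dir, "my_df.rda"))
```

Loading into a separate environment keeps the check from overwriting objects in the current workspace.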