Title: | Creates Assertion Tests |
---|---|
Description: | Offers a comprehensive set of assertion tests to help users validate the integrity of their data. These tests can be used to check for specific conditions or properties within a dataset and help ensure that data is accurate and reliable. The package is designed to make it easy to add quality control checks to data analysis workflows and to aid in identifying and correcting any errors or inconsistencies in data. |
Authors: | Tomer Iwan [aut, cre, cph] |
Maintainer: | Tomer Iwan <[email protected]> |
License: | MIT + file LICENSE |
Version: | 0.6.0 |
Built: | 2024-11-26 04:52:50 UTC |
Source: | https://github.com/cran/vvauditor |
This function asserts that the values in a specified column of a data frame are of Date type. It uses the checkmate::assert_date function to perform the assertion.
assert_date_named(column, df, prefix_column = NULL, ...)
Argument | Description |
---|---|
column | A character vector or string with the column name to be tested. |
df | The data frame that contains the column. |
prefix_column | A character string that will be prepended to the column name in the assertion message. Default is NULL. |
... | Additional parameters passed on to the underlying checkmate assertion function. |
None
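To illustrate what a named column-type assertion does, here is a minimal base-R sketch that checks that a column is of class Date and includes the optional prefix in its error message. The function name `assert_date_named_sketch` and its message wording are hypothetical; the package itself delegates to checkmate::assert_date, whose messages differ.

```r
# Hypothetical base-R sketch of a named Date-column assertion.
# Not the package's implementation (which delegates to checkmate::assert_date).
assert_date_named_sketch <- function(column, df, prefix_column = NULL) {
  # Name used in messages: the prefix (if any) is prepended to the column name
  var_name <- paste0(prefix_column, column)
  if (!column %in% names(df)) {
    stop("Column '", var_name, "' does not exist.", call. = FALSE)
  }
  if (!inherits(df[[column]], "Date")) {
    stop("Column '", var_name, "' must be of class Date, not ",
         class(df[[column]])[1], ".", call. = FALSE)
  }
  invisible(df)
}

df <- data.frame(order_date = as.Date(c("2024-01-01", "2024-02-01")),
                 amount = c(10, 20))
assert_date_named_sketch("order_date", df, prefix_column = "orders$")  # passes silently
try(assert_date_named_sketch("amount", df))  # error: the column is not of class Date
```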
This function asserts that the values in a specified column of a data frame are logical. It uses the checkmate::assert_logical function to perform the assertion.
assert_logical_named(column, df, prefix_column = NULL, ...)
Argument | Description |
---|---|
column | A character vector or string with the column name to be tested. |
df | The data frame that contains the column. |
prefix_column | A character string that will be prepended to the column name in the assertion message. Default is NULL. |
... | Additional parameters passed on to the underlying checkmate assertion function. |
None
```r
# Create a data frame
df <- data.frame(a = c(TRUE, FALSE, TRUE, FALSE), b = c(1, 2, 3, 4))
# Assert that the values in column "a" are logical
assert_logical_named("a", df)
```
This function asserts that there are no duplicate rows in the specified columns of a data frame. It groups the data frame by the specified columns, counts the rows in each group, and checks whether any group contains more than one row. If so, it prints an error message and stops execution (unless assertion_fail is set to "warn").
assert_no_duplicates_in_group(df, group_vars, assertion_fail = "stop")
Argument | Description |
---|---|
df | A data frame. |
group_vars | A character vector of column names. |
assertion_fail | A character string indicating the action to take if the assertion fails. Can be "stop" (default) or "warn". |
The input data frame.
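The grouping check can be sketched in base R with duplicated(): a row belongs to an offending group when its combination of group_vars values appears more than once. This is a hypothetical illustration (`assert_no_duplicates_in_group_sketch` is not part of the package) that uses duplicated() instead of explicit grouping and counting; both approaches flag the same groups.

```r
# Hypothetical sketch: flag rows whose group_vars combination occurs more than once.
assert_no_duplicates_in_group_sketch <- function(df, group_vars, assertion_fail = "stop") {
  keys <- df[group_vars]
  # TRUE for every row that shares its key combination with another row
  in_dup_group <- duplicated(keys) | duplicated(keys, fromLast = TRUE)
  if (any(in_dup_group)) {
    msg <- sprintf("%d row(s) share their combination of %s with another row.",
                   sum(in_dup_group), paste(group_vars, collapse = ", "))
    if (identical(assertion_fail, "warn")) {
      warning(msg, call. = FALSE)
    } else {
      stop(msg, call. = FALSE)
    }
  }
  invisible(df)
}

panel <- data.frame(id = c(1, 1, 2), year = c(2020, 2020, 2021))
# Stops with an error: the first two rows share id = 1 and year = 2020
try(assert_no_duplicates_in_group_sketch(panel, c("id", "year")))
```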
This function handles an assertion message according to the specified type: it either pushes the message to an AssertCollection, issues a warning, or stops execution with an error message.
assertion_message(message, assertion_fail = "stop")
Argument | Description |
---|---|
message | A character string representing the message to be asserted. |
assertion_fail | A character string indicating the action to take if the assertion fails. Can be an AssertCollection, "warning", or "stop" (default). |
None
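A minimal sketch of this dispatch, assuming the collection is a checkmate AssertCollection (whose push() method stores a message for later reporting). The name `assertion_message_sketch` is hypothetical, and the error for an unrecognised `assertion_fail` value is our own addition.

```r
# Hypothetical sketch: route a message according to `assertion_fail`.
assertion_message_sketch <- function(message, assertion_fail = "stop") {
  if (inherits(assertion_fail, "AssertCollection")) {
    # checkmate collections store messages for a later reportAssertions() call
    assertion_fail$push(message)
  } else if (identical(assertion_fail, "warning")) {
    warning(message, call. = FALSE)
  } else if (identical(assertion_fail, "stop")) {
    stop(message, call. = FALSE)
  } else {
    stop("`assertion_fail` must be an AssertCollection, \"warning\" or \"stop\".",
         call. = FALSE)
  }
  invisible(NULL)
}

# With checkmate installed, messages can be collected instead of raised:
# coll <- checkmate::makeAssertCollection()
# assertion_message_sketch("Column 'x' is missing.", coll)
# coll$getMessages()
```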
This function calculates the percentage of each category in a given data vector and returns the top 10 categories along with their percentages. If the data vector is of Date class, it is converted to POSIXct. If the sum of the percentages is not 100%, an "Other" category is added to make up the difference, but only if the number of unique values exceeds 10. If the data vector is of POSIXct class and the smallest percentage is less than 1%, the function returns "Not enough occurrences."
calculate_category_percentages(data_vector)
Argument | Description |
---|---|
data_vector | A vector of categorical data. |
A character string detailing the top 10 categories and their percentages, or a special message indicating not enough occurrences or unsupported data type.
```r
# Example with a character vector
data_vector <- c("cat", "dog", "bird", "cat", "dog", "cat", "other")
calculate_category_percentages(data_vector)
# Example with a Date vector
data_vector <- as.Date(c("2020-01-01", "2020-01-02", "2020-01-03"))
calculate_category_percentages(data_vector)
```
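The core computation can be sketched in base R: count each category, convert the counts to percentages, keep the ten most frequent categories and add an "Other" share when there are more. This hypothetical `category_percentages_sketch` leaves out the Date/POSIXct handling, and its output format is our own rather than the package's.

```r
# Hypothetical sketch of the percentage calculation (not the package source).
category_percentages_sketch <- function(data_vector, top_n = 10) {
  # Frequency of each category, most frequent first
  counts <- sort(table(data_vector), decreasing = TRUE)
  pct <- round(100 * as.numeric(counts) / sum(counts), 1)
  names(pct) <- names(counts)
  top <- head(pct, top_n)
  # Categories beyond the top_n are summarised as "Other"
  if (length(pct) > top_n) {
    top <- c(top, Other = round(100 - sum(top), 1))
  }
  paste0(names(top), " (", top, "%)", collapse = ", ")
}

category_percentages_sketch(c("cat", "dog", "bird", "cat", "dog", "cat", "other"))
```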
Check whether two dataframes have intersecting column names.
check_double_columns(x, y, connector = NULL)
Argument | Description |
---|---|
x | Data frame x. |
y | Data frame y. |
connector | The connector column(s), as a string or a character vector. |
A message reporting any overlap in column names between the two data frames.
Other tests: check_no_duplicates_in_group(), check_numeric_or_integer_type(), check_posixct_type(), duplicates_in_column(), test_all_equal()
```r
check_double_columns(mtcars, iris)
```
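The overlap check amounts to intersecting the two sets of column names. The sketch below assumes that the connector columns are the intended join keys and therefore leaves them out of the report; the name `check_double_columns_sketch` and the message wording are hypothetical.

```r
# Hypothetical sketch: report column names shared by x and y, excluding connector columns.
check_double_columns_sketch <- function(x, y, connector = NULL) {
  overlap <- setdiff(intersect(names(x), names(y)), connector)
  if (length(overlap) == 0) {
    message("No overlapping columns apart from the connector column(s).")
  } else {
    message("Overlapping columns: ", paste(overlap, collapse = ", "))
  }
  invisible(overlap)
}

orders <- data.frame(id = 1:2, date = Sys.Date(), amount = c(5, 7))
customers <- data.frame(id = 1:2, date = Sys.Date(), name = c("A", "B"))
check_double_columns_sketch(orders, customers, connector = "id")
# Overlapping columns: date
```

A column that appears in both inputs but is not a join key (here `date`) would be duplicated with suffixes after a merge, which is why it is worth flagging.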
This function checks whether there are any duplicate rows in the specified columns of a data frame. It prints the unique rows and returns a boolean indicating whether the number of rows in the original data frame equals the number of rows after duplicates are removed, i.e. TRUE when there are no duplicates.
check_duplicates(data, columns)
Argument | Description |
---|---|
data | A data frame. |
columns | A character vector of column names. |
A logical value indicating whether the number of rows in the original data frame is the same as the number of rows in the data frame with duplicate rows removed.
```r
# Create a data frame
df <- data.frame(a = c(1, 2, 3, 1), b = c(4, 5, 6, 4), c = c(7, 8, 9, 7))
# Check for duplicate rows in the first two columns
check_duplicates(df, c("a", "b"))
```
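The row-count comparison described above fits in a few lines of base R; `check_duplicates_sketch` is a hypothetical illustration, not the package source.

```r
# Hypothetical sketch of the duplicate check on selected columns.
check_duplicates_sketch <- function(data, columns) {
  # Keep only the selected columns and drop repeated rows
  unique_rows <- unique(data[columns])
  print(unique_rows)
  # TRUE when no rows were removed, i.e. there are no duplicates
  nrow(unique_rows) == nrow(data)
}

df <- data.frame(a = c(1, 2, 3, 1), b = c(4, 5, 6, 4), c = c(7, 8, 9, 7))
check_duplicates_sketch(df, c("a", "b"))  # FALSE: rows 1 and 4 match in a and b
```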
This function checks if there are any columns in the provided dataframe that contain only NA values. If such columns exist, their names are added to the provided collection.
check_na_columns(df, collection)
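Argument | Description |
---|---|
df | A dataframe. |
collection | An assertion collection (for example, one created with checkmate::makeAssertCollection()) to which the names of columns containing only NA values are added. |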
The updated collection.
```r
# Create a dataframe with some columns containing only NA values
df <- data.frame(a = c(1, NA, 3), b = c(NA, NA, NA), c = c(4, 5, 6))
collection <- checkmate::makeAssertCollection()
check_na_columns(df, collection)
```
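Finding such columns is a per-column test. The hypothetical helper `find_columns_where` below returns the names of the columns that satisfy a predicate, which covers both the all-NA case here and the all-zero case handled by check_zero_columns(). Unlike the package functions, this sketch does not push anything to a collection.

```r
# Hypothetical helper: names of the columns for which `predicate` returns TRUE
find_columns_where <- function(df, predicate) {
  hits <- vapply(df, predicate, logical(1))
  names(df)[hits]
}

df <- data.frame(a = c(1, NA, 3), b = c(NA, NA, NA), c = c(0, 0, 0))
na_cols <- find_columns_where(df, function(col) all(is.na(col)))          # "b"
zero_cols <- find_columns_where(df, function(col) isTRUE(all(col == 0)))  # "c"

# With checkmate, the result could then be reported through a collection:
# collection <- checkmate::makeAssertCollection()
# collection$push(paste("Columns with only NA values:", paste(na_cols, collapse = ", ")))
```

isTRUE() keeps the all-zero test from returning NA for a column that is entirely NA.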
This function checks if there are any duplicate rows in the provided dataframe. If there are duplicate rows, a message is added to the provided collection.
check_no_duplicate_rows(dataframe, collection, unique_columns = NULL)
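Argument | Description |
---|---|
dataframe | A dataframe. |
collection | An assertion collection (for example, one created with checkmate::makeAssertCollection()) to which a message is added if there are duplicate rows. |
unique_columns | If provided, the columns to check for uniqueness. Default is NULL. |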
The updated collection.
```r
# Create a dataframe with some duplicate rows
dataframe <- data.frame(a = c(1, 1, 2), b = c(2, 2, 3))
collection <- checkmate::makeAssertCollection()
check_no_duplicate_rows(dataframe, collection, c("a", "b"))
```
This function checks if there is exactly one row per group in the provided dataframe. If there are multiple rows per group, the assertion fails.
check_no_duplicates_in_group(dataframe, group_variables = NULL, assertion_fail = "stop")
Argument | Description |
---|---|
dataframe | The dataframe to be checked. |
group_variables | The group variables as a character vector. Default is NULL. |
assertion_fail | How the function reacts to a failure: "warning" only issues a warning; "stop" (default) stops execution and displays the message; passing an AssertCollection adds the failure message to that assertion collection. |
Other assertions: check_numeric_or_integer_type(), check_posixct_type()
Other tests: check_double_columns(), check_numeric_or_integer_type(), check_posixct_type(), duplicates_in_column(), test_all_equal()
```r
# Create a dataframe with some groups having more than one row
dataframe <- data.frame(a = c(1, 1, 2), b = c(2, 2, 3), c = c("x", "x", "y"))
# Check the uniqueness of rows per group
check_no_duplicates_in_group(dataframe)
```
This function checks if there are more than 0 rows in the provided dataframe. If there are 0 rows, a message is added to the provided collection.
check_non_zero_rows(dataframe, collection)
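Argument | Description |
---|---|
dataframe | A dataframe. |
collection | An assertion collection (for example, one created with checkmate::makeAssertCollection()) to which a message is added if there are 0 rows. |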
The updated collection.
```r
# Create an empty dataframe
dataframe <- data.frame()
collection <- checkmate::makeAssertCollection()
check_non_zero_rows(dataframe, collection)
```
This function checks whether the specified column in the provided dataframe has a numeric or integer type. It uses the checkmate::assert_numeric or checkmate::assert_integer function to perform the assertion, depending on the value of the field_type parameter.
check_numeric_or_integer_type(column_name, dataframe, column_prefix = NULL, field_type = "numeric", ...)
Argument | Description |
---|---|
column_name | A character vector or string with the column name to be tested. |
dataframe | The dataframe that contains the column. |
column_prefix | If provided, this text is prepended to the variable name in the assertion message. Default is NULL. |
field_type | Either "numeric" (default) or "integer". Use "integer" to check that the column has an integer type. |
... | The remaining parameters are passed to checkmate::assert_numeric or checkmate::assert_integer. |
Other assertions: check_no_duplicates_in_group(), check_posixct_type()
Other tests: check_double_columns(), check_no_duplicates_in_group(), check_posixct_type(), duplicates_in_column(), test_all_equal()
```r
# Create a dataframe with a numeric column
dataframe <- data.frame(a = c(1, 2, 3))
# Check the numeric type of the 'a' column
check_numeric_or_integer_type("a", dataframe)
```
This function checks if the specified column in the provided dataframe has a POSIXct type. It uses the checkmate::assert_posixct function to perform the assertion.
check_posixct_type(column_name, dataframe, column_prefix = NULL, ...)
Argument | Description |
---|---|
column_name | A character vector or string with the column name to be tested. |
dataframe | The dataframe that contains the column. |
column_prefix | Default is NULL. If provided, this text is prepended to the variable name in the assertion message. |
... | The remaining parameters are passed to the function assert_posixct. |
Other assertions: check_no_duplicates_in_group(), check_numeric_or_integer_type()
Other tests: check_double_columns(), check_no_duplicates_in_group(), check_numeric_or_integer_type(), duplicates_in_column(), test_all_equal()
```r
# Create a dataframe with a POSIXct column
dataframe <- data.frame(date = as.POSIXct("2023-10-04"))
# Check the POSIXct type of the 'date' column
check_posixct_type("date", dataframe)
```
This function prints the number of rows in a data frame. It is used to check that rows are not deleted or duplicated unless this is expected.
check_rows(df, name = NULL)
Argument | Description |
---|---|
df | The data frame whose rows are to be counted. |
name | The name of the data file, which is included in the printed message. |
A message printed to the console with the number of rows in the data.
```r
check_rows(mtcars)
```
This function checks if there are any columns in the provided dataframe that contain only 0 values. If such columns exist, their names are added to the provided collection.
check_zero_columns(dataframe, collection)
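Argument | Description |
---|---|
dataframe | A dataframe. |
collection | An assertion collection (for example, one created with checkmate::makeAssertCollection()) to which the names of columns containing only 0 values are added. |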
The updated collection.
```r
# Create a dataframe with some columns containing only 0 values
dataframe <- data.frame(a = c(0, 0, 0), b = c(1, 2, 3), c = c(0, 0, 0))
collection <- checkmate::makeAssertCollection()
check_zero_columns(dataframe, collection)
```
Counts the number of values greater than 1 in a vector. This function is used in the function Check_columns_for_double_rows to count duplicate values.
count_more_than_1(x)
Argument | Description |
---|---|
x | The vector to test. |
Number of values greater than 1.
```r
count_more_than_1(c(1, 1, 4))
```
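A one-line sketch of the count, followed by how such a count flags duplicates when it is applied to a frequency table. The name `count_more_than_1_sketch` is hypothetical, and whether the package ignores NA values is not documented; the `na.rm = TRUE` here is our assumption.

```r
# Hypothetical sketch: count entries strictly greater than 1, ignoring NA (assumption)
count_more_than_1_sketch <- function(x) {
  sum(x > 1, na.rm = TRUE)
}

count_more_than_1_sketch(c(1, 1, 4))  # 1
# On a frequency table, the result is the number of values that occur more than once
values <- c("a", "b", "b", "c", "c", "c")
count_more_than_1_sketch(table(values))  # 2 ("b" and "c")
```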
This function creates a summary statistics table for a dataframe, providing insights into the nature of the data contained within. It includes detailed statistics for each column, such as column types, missing value percentages, minimum and maximum values for numeric columns, patterns for character columns, uniqueness of identifiers, and distributions.
create_dataset_summary_table(df_input)
Argument | Description |
---|---|
df_input | A dataframe for which to create a summary statistics table. |
A tibble with comprehensive summary statistics for each column in the input dataframe.
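A much smaller base-R version of such a per-column summary is sketched below. It covers only the column type, the percentage of missing values and the number of distinct values; the name `summarise_columns_sketch` is hypothetical, and the package itself returns a tibble with more statistics.

```r
# Hypothetical sketch of a per-column summary (far less detailed than the package's)
summarise_columns_sketch <- function(df) {
  data.frame(
    column = names(df),
    # First class of each column, e.g. "numeric" or "factor"
    type = vapply(df, function(col) class(col)[1], character(1)),
    pct_missing = vapply(df, function(col) round(100 * mean(is.na(col)), 1), numeric(1)),
    n_distinct = vapply(df, function(col) length(unique(col)), integer(1)),
    row.names = NULL
  )
}

summarise_columns_sketch(iris)
```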
Deletes columns whose name is NA or empty.
drop_na_column_names(x)
Argument | Description |
---|---|
x | A dataframe. |
The dataframe without the columns whose names are NA or empty.
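A minimal base-R sketch of the same filtering; `drop_na_column_names_sketch` is a hypothetical name, and the package implementation may differ.

```r
# Hypothetical sketch: keep only columns with a real (non-NA, non-empty) name
drop_na_column_names_sketch <- function(x) {
  col_names <- names(x)
  # A column is dropped when its name is NA or the empty string
  drop <- is.na(col_names) | col_names == ""
  x[!drop]
}

df <- data.frame(a = 1, b = 2, c = 3)
names(df) <- c("a", NA, "")
drop_na_column_names_sketch(df)  # keeps only column "a"
```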
Searches for duplicates in a data frame column.
duplicates_in_column(df, col)
Argument | Description |
---|---|
df | Data frame. |
col | Column name. |
Rows containing duplicated values.
Other tests: check_double_columns(), check_no_duplicates_in_group(), check_numeric_or_integer_type(), check_posixct_type(), test_all_equal()
```r
duplicates_in_column(mtcars, "mpg")
```
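Returning every row whose value in the column occurs more than once can be sketched with duplicated() run in both directions, so that the first occurrence is included too. `duplicates_in_column_sketch` is a hypothetical illustration.

```r
# Hypothetical sketch: all rows whose value in `col` is not unique
duplicates_in_column_sketch <- function(df, col) {
  values <- df[[col]]
  # Forward and backward passes together mark every member of a repeated value
  is_dup <- duplicated(values) | duplicated(values, fromLast = TRUE)
  df[is_dup, , drop = FALSE]
}

df <- data.frame(id = c(1, 2, 2, 3, 3, 3), score = c(10, 20, 21, 30, 31, 32))
duplicates_in_column_sketch(df, "id")  # rows 2 to 6
```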
This function identifies common column names between multiple data frames. It takes a variable number of data frames as input and returns a character vector containing the common column names.
find_common_columns(...)
Argument | Description |
---|---|
... | A variable length list of data frames. |
A character vector of column names found in common between all data frames.
```r
df1 <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
df2 <- data.frame(a = c(7, 8, 9), b = c(10, 11, 12), c = c(13, 14, 15))
common_columns <- find_common_columns(df1, df2)
print(common_columns)
```
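Folding intersect() over the column names of all inputs with Reduce() gives this kind of result for any number of data frames. `find_common_columns_sketch` is a hypothetical name, and the package's implementation may differ.

```r
# Hypothetical sketch: column names present in every data frame passed in
find_common_columns_sketch <- function(...) {
  # Intersect the name vectors pairwise, left to right
  Reduce(intersect, lapply(list(...), names))
}

df1 <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))
df2 <- data.frame(a = c(7, 8, 9), b = c(10, 11, 12), c = c(13, 14, 15))
find_common_columns_sketch(df1, df2)  # "a" "b"
```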
Find the maximum numeric value in a vector, ignoring non-numeric values.
find_maximum_value(numeric_vector)
Argument | Description |
---|---|
numeric_vector | A vector from which to find the maximum numeric value. |
The maximum numeric value in the input vector, or NA if none exist.
```r
# Find the maximum of a numeric vector
find_maximum_value(c(3, 1, 4, 1, 5, 9)) # Returns 9
# Find the maximum of a mixed vector with non-numeric values
find_maximum_value(c(3, 1, 4, "two", 5, 9)) # Returns 9
# Attempt to find the maximum of a vector with only non-numeric values
find_maximum_value(c("one", "two", "three")) # Returns NA
```
Find the minimum numeric value in a vector, ignoring non-numeric values.
find_minimum_value(numeric_vector)
Argument | Description |
---|---|
numeric_vector | A vector from which to find the minimum numeric value. |
The minimum numeric value in the input vector, or NA if none exist.
```r
# Find the minimum of a numeric vector
find_minimum_value(c(3, 1, 4, 1, 5, 9)) # Returns 1
# Find the minimum of a mixed vector with non-numeric values
find_minimum_value(c(3, 1, 4, "two", 5, 9)) # Returns 1
# Attempt to find the minimum of a vector with only non-numeric values
find_minimum_value(c("one", "two", "three")) # Returns NA
```
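Both functions can be sketched by coercing the input to numbers and ignoring entries that cannot be converted. Note that c(3, 1, 4, "two", 5, 9) is already a character vector, because c() coerces every element to character; the sketch converts back with as.numeric() and suppresses the resulting coercion warnings. The helper names are hypothetical.

```r
# Hypothetical helper: convert to numeric, turning non-numeric entries into NA
as_numeric_quietly <- function(x) suppressWarnings(as.numeric(as.character(x)))

find_maximum_value_sketch <- function(numeric_vector) {
  nums <- as_numeric_quietly(numeric_vector)
  # NA when nothing could be converted, otherwise the largest converted value
  if (all(is.na(nums))) NA_real_ else max(nums, na.rm = TRUE)
}

find_minimum_value_sketch <- function(numeric_vector) {
  nums <- as_numeric_quietly(numeric_vector)
  if (all(is.na(nums))) NA_real_ else min(nums, na.rm = TRUE)
}

find_maximum_value_sketch(c(3, 1, 4, "two", 5, 9))  # 9
find_minimum_value_sketch(c(3, 1, 4, "two", 5, 9))  # 1
find_maximum_value_sketch(c("one", "two", "three"))  # NA
```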
Function to search for a pattern in R scripts.
find_pattern_r(pattern, path = ".", case.sensitive = TRUE, comments = FALSE)
Argument | Description |
---|---|
pattern | Pattern to search for. |
path | Directory to search in. |
case.sensitive | Whether the pattern is case sensitive. |
comments | Whether to search in commented lines. |
A dataframe containing R script paths.
This function computes summary statistics such as quartiles, mean, and standard deviation for a numeric vector.
get_distribution_statistics(data_vector)
Argument | Description |
---|---|
data_vector | A numeric vector for which to compute summary statistics. |
A character string describing the summary statistics of the input vector.
```r
# Compute summary statistics for a numeric vector
data_vector <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10)
get_distribution_statistics(data_vector)
```
Retrieve the class of the first element of a vector.
get_first_element_class(input_vector)
Argument | Description |
---|---|
input_vector | A vector whose first element's class is to be retrieved. |
The class of the first element of the input vector.
```r
# Get the class of the first element in a numeric vector
get_first_element_class(c(1, 2, 3)) # Returns "numeric"
# Get the class of the first element in a character vector
get_first_element_class(c("apple", "banana", "cherry")) # Returns "character"
```
A function to determine what kind of values are present in columns.
get_values(df, column)
Argument | Description |
---|---|
df | The dataframe. |
column | Column to get values from. |
The class of the column values.
```r
get_values(mtcars, "mpg")
```
This function identifies potential join pairs between two data frames based on the overlap between the distinct values in their columns. It returns a data frame showing the possible join pairs.
identify_join_pairs(..., similarity_cutoff = 0.2)
Argument | Description |
---|---|
... | Two data frames to compare. |
similarity_cutoff | The minimal percentage of overlap between the distinct values in the columns. Default is 0.2. |
A data frame showing candidate join pairs.
```r
identify_join_pairs(iris, iris3)
```
This function identifies outliers in a specified column of a data frame. It returns a tibble containing the unique values, their tally, and whether each value is an outlier.
identify_outliers(df, var)
Argument | Description |
---|---|
df | The data frame. |
var | The column to check for outliers. |
A tibble containing the unique values, tally, and whether each value is an outlier or not.
```r
df <- data.frame(a = c(1, 2, 3, 100, 101), b = c(4, 5, 6, 7, 8), c = c(7, 8, 9, 100, 101))
outliers <- identify_outliers(df, "a")
print(outliers)
```
Check if a column in a dataframe has unique values.
is_unique_column(column_name, data_frame)
Argument | Description |
---|---|
column_name | The name of the column to check for uniqueness. |
data_frame | A dataframe containing the column to check. |
TRUE if the column has unique values, FALSE otherwise.
```r
# Create a dataframe with a unique ID column
data_frame <- tibble::tibble(
  id = c(1, 2, 3, 4, 5),
  value = c("a", "b", "c", "d", "e")
)
is_unique_column("id", data_frame) # Returns TRUE
# Create a dataframe with duplicate values in the ID column
data_frame <- tibble::tibble(
  id = c(1, 2, 3, 4, 5, 1),
  value = c("a", "b", "c", "d", "e", "a")
)
is_unique_column("id", data_frame) # Returns FALSE
```
Print the complete cases of the data.
md_complete_cases(data, digits = 1)
Argument | Description |
---|---|
data | The data frame. |
digits | Number of digits used for rounding. Default: 1. |
A message with the number of rows, the number of rows with missing values and the percentage of complete rows.
```r
# Example code
md_complete_cases(iris)
# Introduce a missing value (NA keeps the column numeric)
iris$Sepal.Length[5] <- NA
md_complete_cases(iris)
```
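The underlying arithmetic can be sketched with stats::complete.cases(): count the rows without any missing value and express them as a percentage of all rows. The message wording and the name `md_complete_cases_sketch` are hypothetical.

```r
# Hypothetical sketch of the complete-case summary
md_complete_cases_sketch <- function(data, digits = 1) {
  n_rows <- nrow(data)
  # complete.cases() is TRUE for rows without any NA
  n_complete <- sum(stats::complete.cases(data))
  pct_complete <- round(100 * n_complete / n_rows, digits)
  message(n_rows, " rows, ", n_rows - n_complete, " with missing values, ",
          pct_complete, "% complete")
  invisible(pct_complete)
}

iris_na <- iris
iris_na$Sepal.Length[5] <- NA
md_complete_cases_sketch(iris_na)
# 150 rows, 1 with missing values, 99.3% complete
```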
This function constructs a regex pattern for matching the content of a parameter in a function. It uses the base::paste0 function to construct the regex pattern.
regex_content_parameter(parameter)
Argument | Description |
---|---|
parameter | The parameter whose value is to be searched in a function. |
A regex pattern as a character string.
```r
# Create a parameter name
parameter <- "my_parameter"
# Construct a regex pattern for matching the content of the parameter
pattern <- regex_content_parameter(parameter)
```
This function generates a regular expression for time based on the input format.
regex_time(format = "hh:mm")
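Argument | Description |
---|---|
format | The format of the time, for example "hh:mm", "h:m", "hh:mm:ss", "h:m:s", "hh:mm:ss AM/PM" or "h:m:s AM/PM" (the formats used in the examples). |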
A regular expression.
```r
regex_time("hh:mm")
regex_time("h:m")
regex_time("hh:mm:ss")
regex_time("h:m:s")
regex_time("hh:mm:ss AM/PM")
regex_time("h:m:s AM/PM")
```
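For orientation, a hand-written pattern for a strict 24-hour "hh:mm" time could look like the one below. It is our own illustration, assuming that "hh" means two-digit hours from 00 to 23; the pattern that regex_time() actually returns may differ, so inspect its output before relying on it.

```r
# Illustrative 24-hour "hh:mm" pattern (not necessarily regex_time()'s output):
# hours 00-23, a colon, minutes 00-59, anchored at both ends
hhmm_pattern <- "^([01][0-9]|2[0-3]):[0-5][0-9]$"
grepl(hhmm_pattern, c("09:30", "23:59", "24:00", "9:30"))
# TRUE TRUE FALSE FALSE
```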
This function generates a regular expression for a year or date based on the input format.
regex_year_date(format = "yyyy")
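Argument | Description |
---|---|
format | The format of the year or date, for example "yyyy", "yyyy-MM-dd", "yyyy/MM/dd", "yyyy.MM.dd", "yyyy-M-d", "yyyy/M/d", "yyyy.M.d", "yyyy-MM-dd HH:mm:ss", "yyyy/MM/dd HH:mm:ss", "yyyy-MM-dd HH:mm" or "yyyy/MM/dd HH:mm" (the formats used in the examples). |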
A regular expression.
```r
regex_year_date("yyyy")
regex_year_date("yyyy-MM-dd")
regex_year_date("yyyy/MM/dd")
regex_year_date("yyyy.MM.dd")
regex_year_date("yyyy-M-d")
regex_year_date("yyyy/M/d")
regex_year_date("yyyy.M.d")
regex_year_date("yyyy-MM-dd HH:mm:ss")
regex_year_date("yyyy/MM/dd HH:mm:ss")
regex_year_date("yyyy-MM-dd HH:mm")
regex_year_date("yyyy/MM/dd HH:mm")
```
This function removes duplicate values and NA values from the input. It first removes NA values using the na.omit function from the stats package, and then removes duplicate values from the result using the unique function.
remove_duplicates_and_na(input)
Argument | Description |
---|---|
input | A vector or data frame. |
A vector or data frame with duplicate values and NA values removed.
```r
# Create a vector with duplicate values and NA values
input <- c(1, 2, NA, 2, NA, 3, 4, 4, NA, 5)
# Remove duplicate values and NA values
output <- remove_duplicates_and_na(input)
print(output)
```
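The two steps map directly onto base R, as in this hypothetical `remove_duplicates_and_na_sketch`. For a data frame, na.omit() drops every row that contains an NA and unique() then drops repeated rows. For vectors, na.omit() attaches an "na.action" attribute, so the example calls as.vector() to show only the plain values.

```r
# Hypothetical sketch of the two-step cleaning
remove_duplicates_and_na_sketch <- function(input) {
  # Step 1: drop NA values (rows with any NA, for a data frame)
  without_na <- stats::na.omit(input)
  # Step 2: drop repeated values (repeated rows, for a data frame)
  unique(without_na)
}

input <- c(1, 2, NA, 2, NA, 3, 4, 4, NA, 5)
as.vector(remove_duplicates_and_na_sketch(input))  # 1 2 3 4 5

df <- data.frame(x = c(1, 1, NA, 2), y = c("a", "a", "b", "c"))
remove_duplicates_and_na_sketch(df)  # rows 1 and 4 remain
```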
retrieve_function_calls
retrieve_function_calls(script_name)
Argument | Description |
---|---|
script_name | The script to search for functions in. |
A dataframe.
Retrieves functions and their corresponding packages used in a given script.
retrieve_functions_and_packages(path)
Argument | Description |
---|---|
path | The complete path of the script. |
Used_functions
Retrieve packages that are loaded in a script.
retrieve_package_usage(script_name)
Argument | Description |
---|---|
script_name | The path to the R script. |
A dataframe.
retrieve_sourced_scripts
retrieve_sourced_scripts(script_name)
Argument | Description |
---|---|
script_name | The main script to search. |
A dataframe.
retrieve_string_assignments
retrieve_string_assignments(script_name)
Argument | Description |
---|---|
script_name | The script to search for objects in. |
A dataframe.
This function returns a message indicating whether an assertion test has passed or failed. An "assertion collection" from the checkmate package must be provided. The message can be returned as an error or a warning. For some assertions, only warnings are allowed, as an error would stop the script from running. This is done for the following assertions: percentage missing values, duplicates, subset, and set_equal.
return_assertions_message(collection, collection_name, fail = "stop", silent = FALSE, output_map = NULL)
Argument | Description |
---|---|
collection | An object of class "AssertCollection". |
collection_name | The name of the collection, which is mentioned in the messages. |
fail | Either "stop" (default) or "warning". With "stop", failing assertions raise an error and the script stops; with "warning", only a warning is issued. |
silent | If FALSE (default), the success message is printed to the console; if TRUE, it is not shown. |
output_map | The folder where the file is stored, for example "1. Read data". |
The message indicating whether the assertion test has passed or failed.
Detect string in file
str_detect_in_file(file, pattern, only_comments = FALSE, collapse = FALSE)
Argument | Description |
---|---|
file | Path to the file. |
pattern | Pattern to match. |
only_comments | Whether to search only in commented lines. Default: FALSE. |
collapse | If FALSE (default), the file is searched line by line. If TRUE, the lines are collapsed and the pattern is searched in the entire file at once; only_comments has no effect in that case. |
A boolean indicating whether the pattern exists in the file.
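A minimal base-R sketch of the documented behaviour: read the file, optionally keep only comment lines, or collapse everything into one string. The name `str_detect_in_file_sketch` is hypothetical, and treating a line as a comment only when its first non-blank character is "#" is our simplifying assumption (trailing comments after code are not considered).

```r
# Hypothetical sketch of pattern detection in a file
str_detect_in_file_sketch <- function(file, pattern, only_comments = FALSE, collapse = FALSE) {
  lines <- readLines(file, warn = FALSE)
  if (collapse) {
    # Search the whole file as one string; only_comments is ignored here
    return(grepl(pattern, paste(lines, collapse = "\n")))
  }
  if (only_comments) {
    # Assumption: a comment line starts with optional whitespace followed by "#"
    lines <- lines[grepl("^\\s*#", lines)]
  }
  any(grepl(pattern, lines))
}

tmp <- tempfile(fileext = ".R")
writeLines(c("# TODO: check joins", "x <- 1"), tmp)
str_detect_in_file_sketch(tmp, "TODO", only_comments = TRUE)  # TRUE
str_detect_in_file_sketch(tmp, "x <-", only_comments = TRUE)  # FALSE
```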
Test whether all values in a vector are equal.
test_all_equal(x, na.rm = FALSE)
Argument | Description |
---|---|
x | Vector to test. |
na.rm | Whether to exclude NAs from the test. Default: FALSE. |
A boolean result of the test.
Other tests: check_double_columns(), check_no_duplicates_in_group(), check_numeric_or_integer_type(), check_posixct_type(), duplicates_in_column()
```r
test_all_equal(c(5, 5, 5))
test_all_equal(c(5, 6, 3))
```
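The test reduces to checking that at most one distinct value remains, as in this hypothetical `test_all_equal_sketch`. With na.rm = FALSE an NA counts as a value of its own, so c(5, NA, 5) is not all equal. How the package treats an empty vector (or one that is entirely NA when na.rm = TRUE) is not documented; this sketch returns TRUE in that case.

```r
# Hypothetical sketch: are all values in x equal?
test_all_equal_sketch <- function(x, na.rm = FALSE) {
  if (na.rm) x <- x[!is.na(x)]
  # At most one distinct value left means all values are equal
  length(unique(x)) <= 1
}

test_all_equal_sketch(c(5, 5, 5))                 # TRUE
test_all_equal_sketch(c(5, 6, 3))                 # FALSE
test_all_equal_sketch(c(5, NA, 5))                # FALSE
test_all_equal_sketch(c(5, NA, 5), na.rm = TRUE)  # TRUE
```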
Check whether the given variable is a unique identifier. This function was adapted from https://edwinth.github.io/blog/unique_id/.
unique_id(x, ...)
Argument | Description |
---|---|
x | A vector or dataframe. |
... | Optional variables, e.g. the name of a column or a vector of names. |
A boolean indicating whether the variable is a unique identifier.
```r
unique_id(iris, Species)
mtcars$name <- rownames(mtcars)
unique_id(mtcars, name)
```