Tidy Function API
Tidy functions are the main functions used in a tidy(...) flow. These are the primary functions to use when wrangling data. They expect their input to be a flat list of items.
tidy
The main function that starts a tidy flow. Used to chain multiple tidy functions together and to smartly handle working with grouped data. The items it works with must be a flat list.
Parameters
items
The collection of data to work with, a flat array of items.
...tidyFns
Any number of functions can be supplied to tidy. They are called as if in a pipeline: the output of function 1 is the input to function 2.
The typical signature is (items: object[]) => object[], but groupBy may output something different if you specify an export option. If you do export from groupBy, it must be the last function called in the tidy flow.
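The pipeline behavior described above can be sketched in plain TypeScript. This is only an illustration of the data flow, not the library's implementation: pipeline, onlyPositive, and doubled are hypothetical names, and grouped data is not handled.

```typescript
type Item = Record<string, number>;
type TidyFn = (items: Item[]) => Item[];

// Hypothetical stand-in for tidy(): the output of each function
// becomes the input of the next one.
function pipeline(items: Item[], ...fns: TidyFn[]): Item[] {
  return fns.reduce((current, fn) => fn(current), items);
}

// Two hypothetical steps with the typical (items) => items shape.
const onlyPositive: TidyFn = (items) => items.filter((d) => d.value > 0);
const doubled: TidyFn = (items) => items.map((d) => ({ ...d, value: d.value * 2 }));

const pipelineResult = pipeline(
  [{ value: -1 }, { value: 2 }, { value: 3 }],
  onlyPositive,
  doubled
);
console.log(pipelineResult); // [ { value: 4 }, { value: 6 } ]
```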
addItems / addRows
Adds items to the end of a collection.
Parameters
itemsToAdd
The items to add to the collection or a function that resolves to items to add given the input set.
arrange / sort
Sorts items by the specified keys and comparators.
Parameters
comparators
A key or set of keys of the item to sort by, or comparator functions that return -1, 0, or 1 if a < b, a == b, or a > b respectively. You can mix and match keys and comparator functions when supplying an array.
For convenience, you can flip to descending order for a key by wrapping the key string with the desc(key: string) function. There is a corresponding asc(key: string) function for ascending order, but since ascending is the default, it is unnecessary to use.
You can also sort the values for a key in a pre-specified order with the fixedOrder function.
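To make the mixing of keys and comparators concrete, here is a minimal plain TypeScript sketch. It assumes numeric values only, and ascending, descending, and sortBy are hypothetical helpers written for this illustration; they are not the library's asc, desc, or arrange.

```typescript
type Item = Record<string, number>;
type Comparator = (a: Item, b: Item) => number;

// Hypothetical helpers in the spirit of asc(key) and desc(key).
const ascending = (key: string): Comparator => (a, b) =>
  a[key] < b[key] ? -1 : a[key] > b[key] ? 1 : 0;
const descending = (key: string): Comparator => (a, b) => ascending(key)(b, a);

// Plain key strings default to ascending order. Comparators are applied in
// order, so later ones only break ties left by earlier ones.
function sortBy(items: Item[], comparators: Array<string | Comparator>): Item[] {
  const fns = comparators.map((c) => (typeof c === 'string' ? ascending(c) : c));
  return items.slice().sort((a, b) => {
    for (const fn of fns) {
      const order = fn(a, b);
      if (order !== 0) return order;
    }
    return 0;
  });
}

const sortedItems = sortBy(
  [
    { year: 2021, value: 1 },
    { year: 2020, value: 1 },
    { year: 2020, value: 5 },
  ],
  ['year', descending('value')]
);
// year ascending, then value descending within each year:
// [ { year: 2020, value: 5 }, { year: 2020, value: 1 }, { year: 2021, value: 1 } ]
```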
complete
Completes a collection with missing combinations of data, which can be useful for zero-filling data. This is a convenience function that combines expand, leftJoin, and replaceNully.
Parameters
expandKeys
The keys to expand the collection to have all combinations of. This can be specified as a single key string, an array of key strings, or a key mapping object. The key mapping object maps from keys in the items to one of:
{ a: 'a' }: the key name itself. In this case, the values to use for the combinations will be derived from what is in the data currently.
{ a: [1, 2, 3, 4] }: an array of values denoting all possible values for this key, even if they do not occur in the data.
{ a: fullSeq('a') }: a function mapping from the items in the collection to an array of all possible values. This is typically used in combination with sequence helper functions like fullSeq.
replaceNullySpec
A map from key name to the value that nully values should be replaced with for that key. For example, given objects of the shape {a: number, b: string}, the replaceNullySpec may look like { a: -1, b: 'n/a' }. Note that you are not required to fill in values for all keys; any unspecified keys will keep their nully value.
count
Tallies the number of distinct values for the specified keys and adds the count as a new key (default n). Optionally sorts by the count. This is a convenience wrapper around groupBy, tally, and optionally arrange.
Parameters
groupKeys
The group keys to pass to groupBy. Either a key in the item or an accessor function that returns the grouping value, or an array combining these two options.
options
name = 'n': The name of the count value in the resulting items.
sort = false: Whether or not the resulting items should be sorted by the count key descending.
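The effect of the name and sort options can be sketched in plain TypeScript. This is a simplified illustration that assumes a single string group key; countSketch is a hypothetical function and does not reproduce how the library combines groupBy, tally, and arrange.

```typescript
type Item = Record<string, string | number>;

// Hypothetical sketch: count items per distinct value of one key.
function countSketch(
  items: Item[],
  groupKey: string,
  options: { name?: string; sort?: boolean } = {}
): Item[] {
  const countName = options.name === undefined ? 'n' : options.name;
  const counted: Item[] = [];
  for (const item of items) {
    const existing = counted.filter((d) => d[groupKey] === item[groupKey])[0];
    if (existing) {
      existing[countName] = Number(existing[countName]) + 1;
    } else {
      counted.push({ [groupKey]: item[groupKey], [countName]: 1 });
    }
  }
  // sort = true orders the result by the count, largest first.
  if (options.sort) {
    counted.sort((a, b) => Number(b[countName]) - Number(a[countName]));
  }
  return counted;
}

const kinds: Item[] = [{ kind: 'a' }, { kind: 'b' }, { kind: 'b' }];
const countedSorted = countSketch(kinds, 'kind', { sort: true });
// [ { kind: 'b', n: 2 }, { kind: 'a', n: 1 } ]
const countedRenamed = countSketch(kinds, 'kind', { name: 'total' });
// [ { kind: 'a', total: 1 }, { kind: 'b', total: 2 } ]
```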
debug
Logs the current state of the data to the console. For grouped data, each group will be output.
The data passes through unmodified.
Parameters
label?
A label to display along with the debugged output.
options?
limit = 10: When non-null, the output is limited to the first n items.
output = 'table': Switches between console.log and console.table as the output mechanism.
distinct
Removes items with duplicate values for the specified keys. If no keys are provided, strict equality is used for comparison. You may also think of this as reducing a dataset to just the unique values for the specified columns.
Parameters
keys
The set of keys or accessors to use to compare whether two items in a collection are equal.
expand
Expands a collection of items to include all combinations of the specified keys. Non-specified keys will be dropped.
Parameters
expandKeys
The keys to expand the collection to have all combinations of. This can be specified as a single key string, an array of key strings, or a key mapping object. The key mapping object maps from keys in the items to one of:
{ a: 'a' }: the key name itself. In this case, the values to use for the combinations will be derived from what is in the data currently.
{ a: [1, 2, 3, 4] }: an array of values denoting all possible values for this key, even if they do not occur in the data.
{ a: fullSeq('a') }: a function mapping from the items in the collection to an array of all possible values. This is typically used in combination with sequence helper functions like fullSeq.
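The idea of "all combinations" can be illustrated with a small plain TypeScript sketch. It builds the cartesian product of candidate values per key; combinations is a hypothetical helper, and the sketch covers only explicit value arrays and values derived from the data, not the function form.

```typescript
type Item = Record<string, string | number>;

// Hypothetical helper: every combination of the candidate values per key.
function combinations(valuesByKey: Record<string, Array<string | number>>): Item[] {
  let combos: Item[] = [{}];
  for (const key of Object.keys(valuesByKey)) {
    const next: Item[] = [];
    for (const combo of combos) {
      for (const value of valuesByKey[key]) {
        next.push({ ...combo, [key]: value });
      }
    }
    combos = next;
  }
  return combos;
}

const observed: Item[] = [
  { a: 1, b: 'x', c: 10 },
  { a: 2, b: 'y', c: 20 },
];

// For `a`, use the distinct values found in the data; for `b`, list all
// possible values explicitly. `c` is not specified, so it is dropped.
const distinctA = observed
  .map((d) => d.a)
  .filter((value, index, all) => all.indexOf(value) === index);
const expanded = combinations({ a: distinctA, b: ['x', 'y', 'z'] });
console.log(expanded.length); // 6
```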
fill
Fills values for the specified keys to match the last seen value in the collection.
Parameters
keys
The key or keys in the items to fill in. Only the specified keys will be affected.
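A minimal plain TypeScript sketch of this kind of fill follows. It assumes that only null or undefined values are treated as missing, which the text above does not state explicitly; fillDown is a hypothetical name.

```typescript
type Row = Record<string, string | number | null | undefined>;

// Hypothetical sketch: for each listed key, replace a missing value with
// the most recent non-missing value seen earlier in the collection.
function fillDown(items: Row[], keys: string[]): Row[] {
  const lastSeen: Row = {};
  return items.map((item) => {
    const filled: Row = { ...item };
    for (const key of keys) {
      if (filled[key] == null) {
        if (lastSeen[key] != null) filled[key] = lastSeen[key];
      } else {
        lastSeen[key] = filled[key];
      }
    }
    return filled;
  });
}

const filledRows = fillDown(
  [{ group: 'a', v: 1 }, { group: null, v: 2 }, { v: 3 }, { group: 'b', v: 4 }],
  ['group']
);
console.log(filledRows.map((d) => d.group)); // [ 'a', 'a', 'a', 'b' ]
```

Keys that are not listed (here v) are left untouched.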
filter
Filters out items from the collection based on the filter function, similar to Array.prototype.filter.
Parameters
filterFn
The predicate function to filter by: items are only kept if it returns true.
fullJoin
Performs a full join on two collections of items.
Parameters
itemsToJoin
The collection of items to join.
options
An options object with the following options:
by
The key (string) or keys (string[]) to join the two collections on. This form only works if both sets of data have the same column names. If you need to map more specifically, provide an object mapping from key in the original data set to key in the join dataset. Note that if by is not provided, overlapping columns will be autodetected and used.
groupBy
Restructures the data to be nested by the specified group keys, then runs a tidy flow on each of the leaf sets. Grouped data can be exported into different shapes via group export helpers, or if not specified, will be ungrouped back to a flat list of items.
See the groupBy docs for details.
innerJoin
Performs an inner join on two collections of items.
Parameters
itemsToJoin
The collection of items to join.
options
An options object with the following options:
by
The key (string) or keys (string[]) to join the two collections on. This form only works if both sets of data have the same column names. If you need to map more specifically, provide an object mapping from key in the original data set to key in the join dataset. Note that if by is not provided, overlapping columns will be autodetected and used.
leftJoin
Performs a left join on two collections of items.
Parameters
itemsToJoin
The collection of items to join.
options
An options object with the following options:
by
The key (string) or keys (string[]) to join the two collections on. This form only works if both sets of data have the same column names. If you need to map more specifically, provide an object mapping from key in the original data set to key in the join dataset. Note that if by is not provided, overlapping columns will be autodetected and used.
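The object form of by, which maps a key in the original data to a key in the joined data, can be sketched in plain TypeScript as follows. leftJoinSketch is a hypothetical function that simply merges matching rows; how the library treats unmatched rows and the joined key itself may differ.

```typescript
type Row = Record<string, string | number>;

// Hypothetical sketch of a left join. `by` maps a key in `items` to the
// corresponding key in `itemsToJoin`.
function leftJoinSketch(items: Row[], itemsToJoin: Row[], by: Record<string, string>): Row[] {
  const byKeys = Object.keys(by);
  const joined: Row[] = [];
  for (const item of items) {
    const matches = itemsToJoin.filter((other) =>
      byKeys.every((key) => item[key] === other[by[key]])
    );
    if (matches.length === 0) {
      joined.push({ ...item }); // left rows are kept even without a match
    } else {
      for (const match of matches) joined.push({ ...item, ...match });
    }
  }
  return joined;
}

const joinedRows = leftJoinSketch(
  [{ id: 1, label: 'a' }, { id: 2, label: 'b' }],
  [{ itemId: 1, score: 10 }],
  { id: 'itemId' }
);
// [ { id: 1, label: 'a', itemId: 1, score: 10 }, { id: 2, label: 'b' } ]
```

An inner join would drop the unmatched row instead of keeping it, and a full join would also keep rows that appear only in the joined collection.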
map
Maps items from one form to another, similar to Array.prototype.map.
Parameters
mapFn
Takes the current item and returns the new item.
mutate
Modifies items by adding new columns/keys or changing existing ones. This operation goes item by item. If you need to mutate with values across multiple items, use mutateWithSummary.
See item helpers for utility functions that help with common mutate operations.
Parameters
mutateSpec
A specification showing how to modify values on the items.
If the mutate value is a function, it will be passed one individual item at a time. All mutations specified happen on a single item before moving to the next item. For mutations that require computing across items, use mutateWithSummary.
If the mutate value is a single value, it will be assigned directly to each item.
If the mutate value is an array of values, it will be assigned directly to each item.
Note that the order of keys matters: later keys can reference values from previous keys.
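The key-order rule is easiest to see in a small example. The plain TypeScript sketch below assumes numeric values only and uses a hypothetical mutateSketch function to mimic the item-by-item behavior described above; it is not the library's implementation.

```typescript
type Item = Record<string, number>;
type MutateSpec = Record<string, number | ((item: Item) => number)>;

// Hypothetical sketch: all keys of the spec are applied to one item
// before moving on to the next item.
function mutateSketch(items: Item[], spec: MutateSpec): Item[] {
  return items.map((item) => {
    const next: Item = { ...item };
    for (const key of Object.keys(spec)) {
      const value = spec[key];
      // Functions see the values already written by earlier keys.
      next[key] = typeof value === 'function' ? value(next) : value;
    }
    return next;
  });
}

const mutatedItems = mutateSketch([{ a: 1 }, { a: 2 }], {
  b: (d) => d.a * 10,
  c: (d) => d.b + 1, // works because `b` is listed before `c`
  flag: 1, // a single value is assigned to every item
});
// [ { a: 1, b: 10, c: 11, flag: 1 }, { a: 2, b: 20, c: 21, flag: 1 } ]
```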
mutateWithSummary
Modifies items by adding new columns/keys or changing existing ones. This operation can look across multiple items to produce values, which allows summarizations to be added (e.g. totals). If you only need to mutate individual items, use mutate.
See vector helpers and summarizers for utility functions that help with common mutateWithSummary operations.
Parameters
mutateSummarySpec
A specification showing how to modify values on the items.
For each key specified in the mutateSummarySpec, a vector of mutated values is computed, then merged (immutably) back into the items before moving to the next key. If you want to mutate on a per-item basis, use mutate instead.
If the mutate value is a function, it will be passed the set of all items in the collection to run against, which allows efficient computing of things like means or totals across the entire dataset.
If the mutate value is a single value, it will be assigned directly to each item.
If the mutate value is an array of values, it will be assigned to the items using matching indices.
Note that the order of keys matters: later keys can reference values from previous keys.
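The three kinds of spec values (function, single value, array) can be sketched in plain TypeScript as below. The sketch assumes numeric values, and mutateWithSummarySketch is a hypothetical function that only mirrors the behavior described above.

```typescript
type Item = Record<string, number>;
type SummarySpec = Record<
  string,
  number | number[] | ((items: Item[]) => number | number[])
>;

// Hypothetical sketch: each key is computed across the whole collection and
// merged into new item objects before the next key is processed.
function mutateWithSummarySketch(items: Item[], spec: SummarySpec): Item[] {
  let current = items;
  for (const key of Object.keys(spec)) {
    const entry = spec[key];
    const computed = typeof entry === 'function' ? entry(current) : entry;
    // Arrays are matched to items by index; single values are repeated.
    current = current.map((item, i) => ({
      ...item,
      [key]: Array.isArray(computed) ? computed[i] : computed,
    }));
  }
  return current;
}

const withShare = mutateWithSummarySketch([{ value: 1 }, { value: 3 }], {
  total: (items) => items.reduce((sum, d) => sum + d.value, 0),
  share: (items) => items.map((d) => d.value / d.total), // uses `total` from above
});
// [ { value: 1, total: 4, share: 0.25 }, { value: 3, total: 4, share: 0.75 } ]
```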
rename
Renames keys in a collection.
Parameters
renameSpec
A mapping from the old key name to the new, renamed key, similar in style to destructuring an object.
replaceNully
Replaces nully values with what is specified in the spec on a per-key basis.
Parameters
replaceNullySpec
A map from key name to the value that nully values should be replaced with for that key. For example, given objects of the shape {a: number, b: string}, the replaceNullySpec may look like { a: -1, b: 'n/a' }. Note that you are not required to fill in values for all keys; any unspecified keys will keep their nully value.
select / pick
Selects subparts of items. This function can be used to re-order keys or to select subsets of keys (similar to pick and omit from lodash).
See selectors for convenient ways to specify keys to select.
Parameters
selectKeys
The keys, or functions that resolve to keys, to select from the object. If a key is prefixed with -, it will be removed from the object. If the first argument passed begins with -, an implicit everything selector will be called first.
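The rules for plain keys and keys prefixed with - can be sketched in plain TypeScript. selectSketch is a hypothetical function that handles string keys only (no selector functions) and is meant as an illustration of the rules above, not as the library's code.

```typescript
type Item = Record<string, unknown>;

// Hypothetical sketch covering plain keys and '-' prefixed keys.
function selectSketch(items: Item[], selectKeys: string[]): Item[] {
  const startsWithRemoval = selectKeys.length > 0 && selectKeys[0].charAt(0) === '-';
  return items.map((item) => {
    // A leading '-' key means: start from every key, then remove.
    let keys: string[] = startsWithRemoval ? Object.keys(item) : [];
    for (const selectKey of selectKeys) {
      if (selectKey.charAt(0) === '-') {
        const dropped = selectKey.slice(1);
        keys = keys.filter((k) => k !== dropped);
      } else if (keys.indexOf(selectKey) === -1) {
        keys.push(selectKey);
      }
    }
    const picked: Item = {};
    for (const key of keys) picked[key] = item[key];
    return picked;
  });
}

const reordered = selectSketch([{ a: 1, b: 2, c: 3 }], ['c', 'a']);
// [ { c: 3, a: 1 } ]
const withoutB = selectSketch([{ a: 1, b: 2, c: 3 }], ['-b']);
// [ { a: 1, c: 3 } ]
```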
slice
Selects a subset of the data, similar to Array.prototype.slice.
Parameters
start
The starting index to select from. The item at this index is included in the results.
end
The ending index before which to end selecting. The item at this index is not included in the results.
sliceHead
Selects the first N items in the collection.
Parameters
n
The number of items to select.
sliceTail
Selects the last N items in the collection.
Parameters
n
The number of items to select.
sliceMin
Selects the minimum N items in the collection ordered by some comparators, similar to arrange.
Parameters
n
The number of items to select.
orderBy
A key or set of keys of the item to sort by, or comparator functions that return -1, 0, or 1 if a < b, a == b, a > b respectively. See arrange for details.
sliceMax
Selects the maximum N items in the collection ordered by some comparators, similar to arrange.
Parameters
n
The number of items to select.
orderBy
A key or set of keys of the item to sort by, or comparator functions that return -1, 0, or 1 if a < b, a == b, a > b respectively. See arrange for details.
sliceSample
Selects a random sample of N items in the collection.
Parameters
n
The number of items to select.
options?
replace = false: If true, samples items with replacement, otherwise without. When sampling with replacement, you can sample more items than are available.
summarize
Takes a collection of items and reduces them to a single item, commonly used for computing averages or sums across a group or dataset.
Parameters
summarizeSpec
An object specifying how to compute the summarized values in the output. The output object matches the keys in this specification with their values set to the output of their respective functions in the spec. Typically the values make use of the provided summarizers, but can be anything.
Note that keys not specified will be dropped from the output unless the rest option is provided.
options
rest: When provided, all keys in the source objects that are not in the summarizeSpec will be resolved via the function that is provided. This is equivalent to specifying all of them in the summarizeSpec. Typically this is combined with first or last.
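The interplay between the spec and the rest option can be sketched in plain TypeScript. Everything here is hypothetical: sumOf and firstOf stand in for the provided summarizers, and the sketch assumes that rest is a function that takes a key and returns a summarizer for that key.

```typescript
type Item = Record<string, string | number>;
type Summarizer = (items: Item[]) => string | number;

// Hypothetical key-based summarizers in the spirit of the provided ones.
const sumOf = (key: string): Summarizer => (items) =>
  items.reduce((total, d) => total + Number(d[key]), 0);
const firstOf = (key: string): Summarizer => (items) => items[0][key];

// Hypothetical sketch: reduce the collection to a single summarized item.
function summarizeSketch(
  items: Item[],
  spec: Record<string, Summarizer>,
  options: { rest?: (key: string) => Summarizer } = {}
): Item[] {
  const summary: Item = {};
  for (const key of Object.keys(spec)) summary[key] = spec[key](items);
  // Without `rest`, keys that are missing from the spec are dropped.
  const rest = options.rest;
  if (rest && items.length > 0) {
    for (const key of Object.keys(items[0])) {
      if (!(key in spec)) summary[key] = rest(key)(items);
    }
  }
  return [summary];
}

const sales: Item[] = [
  { group: 'a', value: 1 },
  { group: 'a', value: 2 },
];
const summaryOnly = summarizeSketch(sales, { total: sumOf('value') });
// [ { total: 3 } ]
const summaryWithRest = summarizeSketch(sales, { total: sumOf('value') }, { rest: firstOf });
// [ { total: 3, group: 'a', value: 1 } ]
```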
summarizeAll
A simpler form of summarize where all keys in the data are summarized via the specified function.
Parameters
summaryFn
The function to apply to each key in the source data to create the summarized output.
summarizeAt
A simpler form of summarize where the specified keys are summarized via the same specified function. All other keys are dropped.
Parameters
keys
The keys on which the summary function will be applied. You can either provide the keys directly as strings or you can use selectors, or a combination of the two.
summaryFn
The function to apply to each key in the source data to create the summarized output.
summarizeIf
A simpler form of summarize where the summary function is called on keys whose values pass the specified predicate.
Parameters
predicateFn
A function that, given a vector of values for a key (e.g. items.map(item => item.value)), returns true if that key should be summarized.
summaryFn
The function to apply to each key in the source data to create the summarized output.
tally
A wrapper that summarizes the data with n, the number of items (per group if grouped).
Parameters
options
name = 'n': The name of the count value in the resulting items.
total
Convenience wrapper around summarize and mutate that appends a new summarized row to the end of the data. Typically used for computing totals.
Parameters
summarizeSpec
The same as summarize::summarizeSpec: an object specifying how to compute the summarized values in the output. In this case, the expectation is that the keys match the input data.
mutateSpec
A specification showing how to modify values on the items. See mutate for details. Can be useful for setting a field that indicates this is a total row (e.g. { id: '__total' }).
totalAll
A simpler form of total, but uses summarizeAll instead of summarize.
Parameters
summaryFn
The function to apply to each key in the source data to create the summarized output.
mutateSpec
A specification showing how to modify values on the items. See mutate for details. Can be useful for setting a field that indicates this is a total row (e.g. { id: '__total' }).
totalAt
A simpler form of total, but uses summarizeAt instead of summarize.
Parameters
keys
The keys on which the summary function will be applied. You can either provide the keys directly as strings or you can use selectors, or a combination of the two.
summaryFn
The function to apply to each key in the source data to create the summarized output.
mutateSpec
A specification showing how to modify values on the items. See mutate for details. Can be useful for setting a field that indicates this is a total row (e.g. { id: '__total' }).
totalIf
A simpler form of total, but uses summarizeIf instead of summarize.
Parameters
predicateFn
A function that, given a vector of values for a key (e.g. items.map(item => item.value)), returns true if that key should be summarized.
summaryFn
The function to apply to each key in the source data to create the summarized output.
mutateSpec
A specification showing how to modify values on the items. See mutate for details. Can be useful for setting a field that indicates this is a total row (e.g. { id: '__total' }).
transmute
The same as mutate, except all keys are dropped except those specified to be mutated.
Parameters
mutateSpec
A specification showing how to modify values on the items. See mutate for details.
when
Conditionally runs a tidy subflow based on the result of a boolean or predicate function.
Parameters
predicate
When true, or the function results in true, the subflow is run, otherwise the input items are passed through unmodified.
fns
Array of tidy functions to run on the input data when the predicate is true.
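The conditional behavior can be sketched in plain TypeScript. whenSketch and takeFirst are hypothetical names, and the sketch assumes a predicate function receives the current items; it only mirrors the pass-through rule described above.

```typescript
type Item = Record<string, unknown>;
type TidyFn = (items: Item[]) => Item[];

// Hypothetical sketch: run the subflow only when the predicate holds,
// otherwise hand back the input unchanged.
function whenSketch(
  predicate: boolean | ((items: Item[]) => boolean),
  fns: TidyFn[]
): TidyFn {
  return (items) => {
    const shouldRun = typeof predicate === 'function' ? predicate(items) : predicate;
    if (!shouldRun) return items;
    return fns.reduce((current, fn) => fn(current), items);
  };
}

const takeFirst: TidyFn = (items) => items.slice(0, 1);
const sample: Item[] = [{ id: 1 }, { id: 2 }, { id: 3 }];

const ranSubflow = whenSketch(true, [takeFirst])(sample);
console.log(ranSubflow.length); // 1
const skippedSubflow = whenSketch((items) => items.length > 5, [takeFirst])(sample);
console.log(skippedSubflow === sample); // true
```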