Check for common data quality issues affecting modeling functions
Source: R/check_model_data.R
Checks the primary dataset for NAs, NaNs, and Inf values, and verifies that each row is a unique choice occurrence.
Arguments
- dat
Primary data containing information on hauls or trips. Table in FishSET database contains the string 'MainDataTable'.
- project
Project name.
- uniqueID
Variable in dat containing unique occurrence identifier.
- latlon
Vector of names for variables with lat, lon coordinates to be checked if using 'lat-lon' as starting location.
- save.file
Logical, if TRUE and no data issues are identified, the dataset is saved to the FishSET database. Defaults to TRUE.
Details
It is best to check the data for NAs, NaNs, and Inf values, and to confirm
that each row is a unique choice occurrence, after data creation functions
have been run but before making the model design file (make_model_design).
Run these checks even if the data passed earlier data verification checks, as
data quality issues can arise when data are created or modified. Model
functions may fail or return inaccurate results if data quality issues exist.
The data will not be saved if any of these issues are present in the dataset.
If the data pass all tests and save.file is TRUE, the data are saved in the
FishSET database with the prefix ‘final’. The data index table is also
updated and saved.
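
The checks described above can be illustrated with a minimal base-R sketch. This is not the FishSET implementation: the helper find_data_issues and the example columns (haul_id, catch, cpue) are hypothetical names chosen for illustration, and the sketch only reports issues. It does not save anything to a database.

```r
# Illustrative sketch only; not the code used by check_model_data().
# `find_data_issues` is a hypothetical helper name.
find_data_issues <- function(dat, uniqueID) {
  num_cols <- names(dat)[vapply(dat, is.numeric, logical(1))]

  # Columns with missing values. anyNA() is TRUE for both NA and NaN,
  # so a column containing only NaN is listed here as well.
  has_na <- names(dat)[vapply(dat, anyNA, logical(1))]

  # NaN and Inf can only occur in numeric columns.
  has_nan <- num_cols[vapply(dat[num_cols],
                             function(x) any(is.nan(x)), logical(1))]
  has_inf <- num_cols[vapply(dat[num_cols],
                             function(x) any(is.infinite(x)), logical(1))]

  # Identifier values that appear on more than one row.
  ids <- dat[[uniqueID]]
  dup_ids <- unique(ids[duplicated(ids)])

  list(na = has_na, nan = has_nan, inf = has_inf, duplicate_ids = dup_ids)
}

# Small made-up dataset with one issue of each kind.
dat <- data.frame(
  haul_id = c(1, 2, 2, 4),
  catch   = c(10.5, NA, 3.2, Inf),
  cpue    = c(0.4, NaN, 1.1, 0.9)
)

issues <- find_data_issues(dat, uniqueID = "haul_id")
str(issues)
```

Here every element of the returned list is non-empty, so a dataset like this one would need to be cleaned before the model design file is built. A dataset that is ready for modeling would return an empty vector in each element.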