Create a list containing the likelihood function, parameters, and data to be passed to the model call function

Usage

make_model_design(
  project,
  catchID,
  likelihood = NULL,
  initparams = NULL,
  optimOpt = c(100, 1e-08, 1, 1),
  methodname = "BFGS",
  mod.name = NULL,
  vars1 = NULL,
  vars2 = NULL,
  priceCol = NULL,
  expectcatchmodels = list("all"),
  startloc = NULL,
  polyn = NULL,
  crs = NULL,
  outsample = FALSE,
  CV_dat = NULL
)

Arguments

project

String, name of project.

catchID

String, variable from dat that contains catch data.

likelihood

String, name of likelihood function. A description of the explanatory variables for each likelihood is provided below in the Details section. Information on likelihood-specific initial parameter specification can be found in the discretefish_subroutine() documentation.

logit_c: Conditional logit likelihood
logit_zonal: Zonal logit with area-specific constants procedure
logit_correction: Full information model with Dahl's correction function
epm_normal: Expected profit model with normal catch function
epm_weibull: Expected profit model with Weibull catch function
epm_lognormal: Expected profit model with lognormal catch function

initparams

Vector or list, initial parameter estimates for revenue/location-specific covariates, then cost/distance. The number of parameter estimates varies by likelihood function. See the Details section for more information. The initial parameters will be set to 1 if initparams == NULL. If initparams is a single numeric value, it will be used for each parameter. If using parameter estimates from a previous model, initparams should be the name of the model the parameter estimates should come from. Examples: initparams = 'epm_mod1', initparams = list('epm_mod1', 'epm_mod2').
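The defaulting rules above can be illustrated with a short base R sketch. The helper expand_initparams() and its n_params argument are hypothetical names introduced only for illustration. They are not part of FishSET and do not reproduce its internal code, and looking up estimates by model name is left out because it needs the FishSET database.

```r
# Hypothetical helper (not a FishSET function): expands an `initparams`
# value to a numeric vector of length `n_params`, following the rules
# described in the argument documentation.
expand_initparams <- function(initparams, n_params) {
  if (is.null(initparams)) {
    return(rep(1, n_params))            # NULL: every parameter starts at 1
  }
  if (is.numeric(initparams) && length(initparams) == 1) {
    return(rep(initparams, n_params))   # single value: reused for each parameter
  }
  if (is.numeric(initparams)) {
    stopifnot(length(initparams) == n_params)
    return(initparams)                  # full vector: used as supplied
  }
  stop("Model-name lookups (e.g. 'epm_mod1') require the FishSET database.")
}

expand_initparams(NULL, 3)
expand_initparams(0.5, 3)
expand_initparams(c(-0.5, 0.5), 2)
```

The required number of parameters depends on the likelihood and on vars1 and vars2, so n_params would have to be worked out from the Details section for the chosen model.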

optimOpt

Numeric vector, optimization options (max function evaluations, max iterations, (reltol) tolerance of x, trace). See stats::optim() for background on these options.

methodname

String, optimization method (see stats::optim() options). Defaults to "BFGS".
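For readers unfamiliar with stats::optim(), the following minimal sketch shows a "BFGS" run on a toy negative log-likelihood. The objective function and data are made up for illustration, and the control values simply mirror the numbers in the default optimOpt. How FishSET maps the elements of optimOpt onto optim()'s control list is not shown here and should not be inferred from this example.

```r
# Toy objective (illustrative only): negative log-likelihood of a normal
# sample with known standard deviation 1, as a function of the mean.
set.seed(1)
y <- rnorm(50, mean = 2)
negloglik <- function(mu) -sum(dnorm(y, mean = mu, sd = 1, log = TRUE))

# BFGS optimization with explicit control options from stats::optim():
# maxit  = maximum number of iterations
# reltol = relative convergence tolerance
# trace  = amount of progress output (0 = silent)
fit <- optim(
  par = 1, fn = negloglik, method = "BFGS",
  control = list(maxit = 100, reltol = 1e-08, trace = 0)
)

fit$par          # estimate of the mean, close to mean(y)
fit$convergence  # 0 indicates successful convergence
```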

mod.name

String, name of model run for model result output table.

vars1

Character string, additional ‘travel-distance’ variables to include in the model. These depend on the likelihood. See the Details section for how to specify for each likelihood function.

vars2

Character string, additional variables to include in the model. These depend on the likelihood. See the Details section for how to specify for each likelihood function. For likelihood = 'logit_c', vars2 should be the name of the gridded table saved to the FishSET Database, and should contain the string "GridTableWide". See format_grid() for details.

priceCol

Variable in dat containing price information. Required if specifying an expected profit model for the likelihood (epm_normal, epm_weibull, epm_lognormal).

expectcatchmodels

List, names of expected catch models to include in the model run. Defaults to all models. Each list item should be a character vector of expected catch models to include in a model. For example, list(c('recent', 'older'), c('user1')) would run one model with the 'recent' and 'older' expected catch matrices, and one model with just the user-defined expected catch matrix. Choices are "recent", "older", "oldest", "logbook", "all", and "individual". See create_expectations() for details on the different models. Option "all" will run all expected catch matrices jointly. Option "individual" will run the model for each expected catch matrix separately. The final option is to select one or more expected catch matrices to run jointly.
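As a rough illustration of how such a list translates into model runs, the sketch below expands an expectcatchmodels value into one entry per model. The helper expand_models() and the matrices vector are hypothetical and exist only for this example. FishSET's own handling may differ.

```r
# Assumed set of available expected catch matrices (illustrative only).
matrices <- c("recent", "older", "oldest", "logbook")

# Hypothetical helper (not a FishSET function): returns a list with one
# element per model, each naming the matrices that enter that model.
expand_models <- function(expectcatchmodels, matrices) {
  out <- list()
  for (spec in expectcatchmodels) {
    if (identical(spec, "all")) {
      out <- c(out, list(matrices))      # one model, all matrices jointly
    } else if (identical(spec, "individual")) {
      out <- c(out, as.list(matrices))   # one model per matrix
    } else {
      out <- c(out, list(spec))          # one model with the chosen set
    }
  }
  out
}

length(expand_models(list("all"), matrices))         # one joint model
length(expand_models(list("individual"), matrices))  # one model per matrix
expand_models(list(c("recent", "older"), "logbook"), matrices)
```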

startloc

Variable in dat identifying the location when choice of where to fish next was made. Required for logit_correction likelihood. Use the create_startingloc() function to create the starting location vector.

polyn

Numeric, correction polynomial degree. Required for logit_correction() likelihood.

crs

Coordinate reference system to be assigned when creating the distance matrix. Passed on to create_dist_matrix().

outsample

Logical, indicates whether the model design is for main data (FALSE) or out-of-sample data (TRUE). The default is outsample = FALSE.

CV_dat

Dataframe that contains training or testing data for k-fold cross validation. Defaults to CV_dat = NULL.

Value

Function creates the model matrix list that contains the data and modeling choices. The model design list is saved to the FishSET database and called by the discretefish_subroutine(). Alternative fishing options come from the Alternative Choice list, generated from the create_alternative_choice() function, and the expected catch matrices from the create_expectations() function. The distance from the starting point to alternative choices is calculated.

Model design list:

likelihood: Name of likelihood function
catch: Data corresponding to actual zonal catch
catchID: Character for the name of the variable with catch data
choice: Data corresponding to actual zonal choice
initparms: Initial parameter values
optimOpt: Optimization options
methodname: Optimization method
mod.name: Model name for referencing
vars1: Character vector for variables with 'travel-distance' variables
vars2: Character vector for additional variables
priceCol: Variable in dat with price information
mod.date: Date the model was designed
startingloc: Starting locations
scales: Scale vectors to put catch data, zonal data, and other data on same scale
distance: Data corresponding to distance
instances: Number of observations
alts: Number of alternative zones
epmDefaultPrice: Price data
dataZoneTrue: Vector of 0/1 indicating whether the data from that zone is to be included based on the minimum number of hauls
typeOfNecessary: Whether data is at haul or trip level
altChoiceType: Function choice. Set to distance
altChoiceUnits: Units of distance
occasion: The choice occasion
occasion_var: Character for variable with choice occasion
alt_choice: Alternative choice matrix
bCHeader: Variables to include in the model that do not vary by zone. Includes independent variables and interactions
gridVaryingVariables: Variables to include in the model that do vary by zone such as expected catch (from create_expectations() function)
startloc: Variable in dat identifying location when choice of where to fish next was made
polyn: Numeric, correction polynomial degree
spat: A spatial data file
spatID: Variable in spat that identifies areas or zones
crs: Coordinate reference system
gridVaryingVariables: Area-specific variables
expectcatchmodels: List of expected catch matrices

Details

Variable names details:

logit_c:
    vars1: "travel-distance variables" are alternative-invariant variables that are interacted with travel distance to form the cost portion of the likelihood. Each variable name therefore corresponds to data with dimensions (number of observations) by (unity), and returns a single parameter.
    vars2: "alternative-specific variables" vary across alternatives, e.g. catch rates. Each variable name therefore corresponds to data with dimensions (number of observations) by (number of alternatives), and returns a single parameter for each variable (e.g. the marginal utility from catch).

logit_zonal:
    vars1: "travel-distance variables" are alternative-invariant variables that are interacted with travel distance to form the cost portion of the likelihood. Each variable name therefore corresponds to data with dimensions (number of observations) by (unity), and returns a single parameter.
    vars2: "average-catch variables" are alternative-invariant variables, e.g. vessel gross tonnage. Each variable name therefore corresponds to data with dimensions (number of observations) by (unity), and returns (k-1) parameters where (k) equals the number of alternatives, as a normalization of parameters is needed as the probabilities sum to one. Interpretation is therefore relative to the first alternative.

epm_normal:
    vars1: "travel-distance variables" are alternative-invariant variables that are interacted with travel distance to form the cost portion of the likelihood. Each variable name therefore corresponds to data with dimensions (number of observations) by (unity), and returns a single parameter.
    vars2: "catch-function variables" are alternative-invariant variables that are interacted with zonal constants to form the catch portion of the likelihood. Each variable name therefore corresponds to data with dimensions (number of observations) by (unity), and returns (k) parameters where (k) equals the number of alternatives.

epm_lognormal:
    vars1: "travel-distance variables" are alternative-invariant variables that are interacted with travel distance to form the cost portion of the likelihood. Each variable name therefore corresponds to data with dimensions (number of observations) by (unity), and returns a single parameter.
    vars2: "catch-function variables" are alternative-invariant variables that are interacted with zonal constants to form the catch portion of the likelihood. Each variable name therefore corresponds to data with dimensions (number of observations) by (unity), and returns (k) parameters where (k) equals the number of alternatives.

epm_weibull:
    vars1: "travel-distance variables" are alternative-invariant variables that are interacted with travel distance to form the cost portion of the likelihood. Each variable name therefore corresponds to data with dimensions (number of observations) by (unity), and returns a single parameter.
    vars2: "catch-function variables" are alternative-invariant variables that are interacted with zonal constants to form the catch portion of the likelihood. Each variable name therefore corresponds to data with dimensions (number of observations) by (unity), and returns (k) parameters where (k) equals the number of alternatives.

logit_correction:
    vars1: "travel-distance variables" are alternative-invariant variables that are interacted with travel distance to form the cost portion of the likelihood. Each variable name therefore corresponds to data with dimensions (number of observations) by (unity), and returns a single parameter.
    vars2: "catch-function variables" are alternative-invariant variables that are interacted with zonal constants to form the catch portion of the likelihood. Each variable name therefore corresponds to data with dimensions (number of observations) by (unity), and returns (k) parameters where (k) equals the number of alternatives.
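
The dimension conventions for logit_c can be made concrete with a small base R sketch. It is an illustration under stated assumptions and is not FishSET's likelihood code. The data are simulated, the variable names (catch_rate, vessel_len) and coefficient values are invented, and the utility contains only one alternative-specific term and one travel-distance interaction.

```r
set.seed(42)
n <- 5   # number of observations
k <- 3   # number of alternatives

# Simulated inputs (illustrative only)
distance   <- matrix(runif(n * k, 1, 10), n, k)  # n x k travel distances
catch_rate <- matrix(runif(n * k), n, k)         # vars2-style: n x k, varies by alternative
vessel_len <- runif(n, 10, 30)                   # vars1-style: n x 1, alternative-invariant

# One parameter per variable (made-up values)
beta_catch <- 1.2     # marginal utility from catch
beta_dist  <- -0.05   # cost parameter on vessel_len * distance

# The length-n vector is recycled down each column, so row i of `distance`
# is multiplied by vessel_len[i].
utility <- beta_catch * catch_rate + beta_dist * (vessel_len * distance)

# Conditional logit choice probabilities: row-wise softmax of utility
prob <- exp(utility) / rowSums(exp(utility))

dim(prob)      # n x k
rowSums(prob)  # each row sums to 1
```

Each variable contributes exactly one parameter here, which matches the logit_c description. Under logit_zonal, by contrast, an alternative-invariant vars2 variable would need (k-1) parameters.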

Examples

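if (FALSE) {
# Design a zonal logit model for the "pollock" project with no additional
# variables and two initial parameter values.
make_model_design("pollock",
  catchID = "OFFICIAL_TOTAL_CATCH",
  likelihood = "logit_zonal",
  vars1 = NULL, vars2 = NULL, initparams = c(-0.5, 0.5),
  optimOpt = c(100000, 1.0e-08, 1, 1), methodname = "BFGS", mod.name = "logit4"
)
}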