Create your own likelihood function
Paul Carvalho
2024-11-13
Source: vignettes/likelihood-template.Rmd
Introduction
The FishSET R package currently features six statistical functions for maximum likelihood estimation, detailed in section 8.3 of the user manual. These functions offer a solid foundation for a variety of analyses. However, we recognize that these built-in functions may not suit your specific modeling needs. We encourage users with more specialized modeling requirements to develop their own likelihood functions that can be integrated into the FishSET R package. By doing so, users will not only be able to take advantage of other FishSET features such as policy simulation tools but also contribute to a richer toolkit for the FishSET user community.
Here we provide a template for developing a likelihood function that can be seamlessly integrated into the FishSET package. Feel free to reach out to our team at nmfs.fishset@noaa.gov with any questions about developing your own likelihood function.
Development guidelines
Fork the FishSET repo and clone the forked repository to your computer. More information on “forking” repositories and the GitHub workflow can be found here.
Open the FishSET_RPackage.Rproj file.
Create a new script for your likelihood function following the template below. Save it in the R folder.
Commit and push changes.
Review and test code.
If you have gone through the steps above but it has been a while since you worked on the code, pull changes from the original FishSET repo to your clone.
Submit a pull request.
Create a likelihood function
Give your likelihood function an informative name (replace likelihood_function in the template) and save the script with the same name. Only include code for the likelihood function in the R script.
The names and order of input arguments must match the template. These inputs are created within the make_model_design() and discretefish_subroutine() functions and are described in detail below.
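As a rough guide, the template skeleton might look like the sketch below. It assumes the argument order follows the order in which the inputs are described in the next section; check the built-in likelihood functions in the R folder to confirm before relying on it.

```r
# Sketch of the template skeleton (argument order is an assumption based on
# the order of the input descriptions in this vignette).
# Replace `likelihood_function` with an informative name.
likelihood_function <- function(starts3, dat, otherdat, alts,
                                project, expname, mod.name) {
  # Function body: compute the negative log-likelihood and store it in `ld`
}
```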
Input arguments
starts3: Numeric vector that contains starting parameter values. The length and order of parameter values depend on your model structure. For example, the order of starts3 for the conditional logit model is c([alternative-specific parameters], [travel-distance parameters]), and the length of each depends on the number of variables included in the model design.
dat: Numeric matrix that is generated in the shift_sort_x() function. Each row is one observation. The first column contains catch; the second column indicates the zone fished; columns 3 to (number of alternatives)^2 + 2 (for example, columns 3-18 if there are 4 alternatives/zones) contain a flattened identity matrix that has been shifted and sorted such that the zone selected is moved to the first column position of the matrix (see example below); and the last x columns, where x equals the number of alternatives/zones, contain distances from the starting location to each alternative (distances have also been shifted such that the distance to the zone selected is first, see example below).
Example of one row of the dat matrix with 3 alternatives/zones:
\[\begin{bmatrix} 17&2&0&0&1&1&0&0&0&1&0&5&10&15 \\ \end{bmatrix}\]
Here the catch is 17 and zone 2 was fished. Columns 3-11 hold the flattened identity matrix, which corresponds to
\[\begin{bmatrix} 0&0&1 \\ 1&0&0 \\ 0&1&0 \\ \end{bmatrix}\]
with zone 2 in the first position (because zone 2 was fished), followed by 3 and 1. Finally, the distances from the starting location to zones 2, 3, and 1 are 5, 10, and 15, respectively.
otherdat: A list that contains variables used in the model. For example, otherdat for conditional logit models contains intdat variables (as a list object) that interact with travel distance and griddat variables (as a list object) that vary across alternatives. The otherdat list is generated in the make_model_design() function. If your input variables are not compatible with the current version of make_model_design(), reach out to our team at nmfs.fishset@noaa.gov and we will do our best to update the function to accommodate your modeling requirements.
alts: Integer representing the total number of zones included in the model.
project: Name of project.
expname: Name of expected catch table(s). This input is used for conditional logit models.
mod.name: Name of the model, which is designated in the make_model_design() function.
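The column layout of dat described above can be unpacked with base R indexing. The sketch below rebuilds the single example row from the dat description (3 zones, catch of 17, zone 2 fished) and slices it into its parts. The variable names are illustrative only, and byrow = TRUE is used because it reproduces the 3 x 3 matrix shown in the example; the flattening order actually used by shift_sort_x() is an assumption here.

```r
# Example row from the dat description: 3 zones, catch 17, zone 2 fished
alts <- 3
dat <- matrix(c(17, 2, 0, 0, 1, 1, 0, 0, 0, 1, 0, 5, 10, 15), nrow = 1)

# Column positions implied by the description of dat
id_cols   <- 3:(alts^2 + 2)                    # flattened identity matrix
dist_cols <- (ncol(dat) - alts + 1):ncol(dat)  # last `alts` columns

catch     <- dat[, 1]                          # catch per observation
zone      <- dat[, 2]                          # zone fished per observation
distances <- dat[, dist_cols, drop = FALSE]    # chosen zone's distance is first

# Unflatten the identity matrix for the first observation
# (byrow = TRUE is an assumption that matches the example matrix)
id_mat <- matrix(dat[1, id_cols], nrow = alts, byrow = TRUE)
```

Using drop = FALSE keeps distances as a matrix even when dat has a single row, which avoids surprises when the same code runs on one observation and on many.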
Function body
Use base R functions to calculate the negative log-likelihood (nll) and document the code thoroughly.
Save the nll value to a variable named ld, and insert the following code at the end of your function to log the inputs and outputs of the function call:
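# Replace a NaN likelihood with the largest representable double
if (is.nan(ld)) {
  ld <- .Machine$double.xmax
}

# Collect the outputs and inputs of this call.
# Note: ldchoice must already be defined earlier in your function body.
ldsumglobalcheck <- ld
paramsglobalcheck <- starts3
LDGlobalCheck <- unlist(as.matrix(ldchoice))
LDGlobalCheck <- list(model = paste0(project, expname, mod.name),
                      ldsumglobalcheck = ldsumglobalcheck,
                      paramsglobalcheck = paramsglobalcheck,
                      LDGlobalCheck = LDGlobalCheck)

# Store the log in the first environment on the search path (the global environment)
pos <- 1
envir <- as.environment(pos)
assign("LDGlobalCheck", value = LDGlobalCheck, envir = envir)

return(ld)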
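To show how the pieces fit together, here is a minimal sketch of a complete likelihood function. It is a hypothetical example and not part of FishSET: a conditional logit with a single travel-distance coefficient and no other covariates, so otherdat is ignored. It assumes the dat layout described above (distances in the last alts columns, chosen zone first) and that ldchoice holds the per-observation negative log-likelihood contributions.

```r
# Hypothetical example, not part of FishSET: conditional logit with one
# travel-distance coefficient. `toy_distance_logit` is an illustrative name.
toy_distance_logit <- function(starts3, dat, otherdat, alts,
                               project, expname, mod.name) {
  beta <- starts3[1]

  # Distances are in the last `alts` columns; the chosen zone is in column 1
  dist <- dat[, (ncol(dat) - alts + 1):ncol(dat), drop = FALSE]
  v <- beta * dist  # utility of each zone for each observation

  # log(sum(exp(v))) per row, shifted by the row maximum for numerical stability
  vmax <- apply(v, 1, max)
  log_denom <- vmax + log(rowSums(exp(v - vmax)))

  # Per-observation negative log-likelihood (assumed meaning of ldchoice)
  ldchoice <- -(v[, 1] - log_denom)
  ld <- sum(ldchoice)

  # Logging block from the template
  if (is.nan(ld)) {
    ld <- .Machine$double.xmax
  }
  LDGlobalCheck <- list(model = paste0(project, expname, mod.name),
                        ldsumglobalcheck = ld,
                        paramsglobalcheck = starts3,
                        LDGlobalCheck = unlist(as.matrix(ldchoice)))
  assign("LDGlobalCheck", value = LDGlobalCheck, envir = as.environment(1))

  return(ld)
}

# Usage with the single example row of dat (3 zones)
example_dat <- matrix(c(17, 2, 0, 0, 1, 1, 0, 0, 0, 1, 0, 5, 10, 15), nrow = 1)
nll_zero <- toy_distance_logit(0, example_dat, otherdat = list(), alts = 3,
                               project = "demo", expname = "", mod.name = "toy")
# With beta = 0 all three zones are equally likely, so nll_zero equals log(3)
```

A real model would also use the variables in otherdat and more than one parameter; the point of the sketch is only the overall shape: unpack dat, compute ldchoice and ld, then run the logging block and return ld.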
Integrating your function with FishSET
Review code and test function
Before submitting a Pull Request, we ask contributors to thoroughly review their code and test their likelihood function, ideally on multiple datasets. Also verify that the function is well documented, and remove unnecessary or redundant code left behind during development. Please reach out to the FishSET team (nmfs.fishset@noaa.gov) if any questions come up during the review and testing process.
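One possible starting point for testing is a small sanity check like the sketch below. The helper check_likelihood_fn and the stand-in function dummy_nll are hypothetical names, not part of FishSET. The check only confirms that the argument names match the inputs described in this vignette, that a single non-missing number is returned, and that LDGlobalCheck was logged; it does not validate the statistical model itself.

```r
# Hypothetical helper, not part of FishSET: basic interface checks
check_likelihood_fn <- function(fn, starts3, dat, otherdat, alts) {
  expected_args <- c("starts3", "dat", "otherdat", "alts",
                     "project", "expname", "mod.name")
  stopifnot(identical(names(formals(fn)), expected_args))

  ld <- fn(starts3, dat, otherdat, alts,
           project = "test", expname = "", mod.name = "m1")

  # The function should return one non-missing numeric value
  stopifnot(is.numeric(ld), length(ld) == 1, !is.na(ld))
  # The logging block should have written LDGlobalCheck to the global environment
  stopifnot(exists("LDGlobalCheck", envir = globalenv()))
  invisible(ld)
}

# Stand-in likelihood used only to demonstrate the helper: every zone is
# treated as equally likely, so each observation contributes log(alts)
dummy_nll <- function(starts3, dat, otherdat, alts, project, expname, mod.name) {
  ldchoice <- rep(log(alts), nrow(dat))
  ld <- sum(ldchoice)
  assign("LDGlobalCheck",
         value = list(model = paste0(project, expname, mod.name),
                      ldsumglobalcheck = ld,
                      paramsglobalcheck = starts3,
                      LDGlobalCheck = ldchoice),
         envir = as.environment(1))
  return(ld)
}

test_dat <- matrix(c(17, 2, 0, 0, 1, 1, 0, 0, 0, 1, 0, 5, 10, 15), nrow = 1)
checked_ld <- check_likelihood_fn(dummy_nll, starts3 = 0, dat = test_dat,
                                  otherdat = list(), alts = 3)
```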
Create a pull request
Give the Pull Request the same name as the likelihood function you just created and provide a brief description. Once the FishSET team receives the Pull Request, we will do our best to review the function in a timely manner and will notify you whether we have merged it into the main package or need changes to the function before merging.
The Pull Request for your likelihood function should not change any other code in the FishSET package. If changes to the package outside of your function are necessary, please contact the FishSET development team (nmfs.fishset@noaa.gov) to resolve any issues.