Github Resources
CB Genetics Working Best Practices
Best practices for working on analyses in github:
All repositories with NOAA work should be housed in Github Enterprise under the NWFSC organization, NOT in your personal/NOAA github account. This ensures that code will be retained and remain accessible even if the author leaves the agency.
Work and commit in the main branch of a repo (see special cases for branching and forking below)
Commits
Perform commits daily at minimum when working on analysis
Make sure to push to origin (if working from github desktop) daily as well!
If working in a branch, commit daily to the branch, and determine how often to merge to main
Forking
Generally do not use!
Forking may be useful if contributors to the repository are outside of enterprise, but you must create and uphold good team standards on how often forks are merged back into the main repository.
Branching
Use sparingly!
Create a branch for a specific task, complete task, merge to main immediately.
Use when you are making a change that could “break” the main repo for some reason
Branches are easy to “get lost in” and can create confusion and complicate organization of your repo
Projects can be created within our CB-genetics team. Use them to organize tasks or discussions associated with one or multiple repositories.
CB Genetics Github Repository Guidelines
See How to: Create a github repository in enterprise for a full tutorial walk through of creating a github repository.
From the NMFS Github Guide, your repository should include a README, the government product DISCLAIMER, and LICENSE file. In addition, all CB Genetics github repositories should include the following components:
README
The README file should provide a detailed description of the repository. The contents of the README will vary depending on the project type and degree of completeness, but should include a few common elements.
Recommended README components:
Primary contacts: Include name and email of at least one primary contact
Citation/Reference/DOI: If the work is something that can be cited, then also include citation information and a DOI. This will be more common in repositories containing finished projects.
Objective: Briefly describe the objective of the project/analysis to provide context for your repository.
Methods: Include relevant information regarding the methods used in the analysis for this project repository. This can be broken up in sections if needed for a larger repository that houses multiple analyses.
Major Results: For projects in mostly finished states, add major takeaways from your analysis. Including figures is optional.
Disclaimer: Repositories and web content shared on GitHub should make it clear to the audience that no information should be considered or interpreted as official communication of NOAA. The simplest method for doing this is to include the disclaimer text as a footnote within the repository’s README.md.
LICENSE File
You should add an open license file to the GitHub repository before work begins. This establishes from the moment work starts that all work is open source (as required for publicly released federally funded work).
To add a license to your repository, navigate to the base level of your repository (so not in a subdirectory) and add a file called LICENSE. Under the file name, click Choose a license template, select Apache-2.0 license.
Gitignore
.gitignore files are used to tell Github to ignore some files in your working directory when pushing to Github remote. This is useful because Github cannot handle large files (such as fastq) and sometimes we don’t want every file in our working directory to be pushed up to Github.
A .gitignore file should be created when you first set up your repository - often we choose a template like the R template because we are using R code in our repository and the .gitignore file will ignore syncing the temporary files that are created when working in R.
If you do not see a .gitignore file in your repository directory:
- Check to see that it is not just a hidden file- On a mac, press Command + Shift + . to show hidden files
- If no .gitignore file exists, create one from a simple text file and save it as “.gitignore” in your main repository directory. You can use a template for a certain language from the list here or can create a useful .gitignore for your project with one or more coding languages here.
To edit your .gitignore for specific directories or file types, use the examples below:
To ignore a whole directory
**/directory_name
To ignore the files within a directory (the directory will show up in github remote but will be empty- if for some reason you want to preserve directory structure)
directory_name/**
To ignore a specific file extension, from anywhere in your repository directory (this will also ignore these file extensions when they exist in sub-diretories)
*.fastq
Assign CB-Genetics Team
Assigning the CB-Genetics team to your repository keeps our group’s work organized and easy to find within the NWFSC enterprise organization. Simply navigate to the NWFSC enterprise organization, click on the Teams tab, expand the CB team to find the CB-Genetics team within it, then click on Repositories.
To do this: Settings -> Collaborators and teams -> Manage access -> Add teams -> CB-Genetics (You choose permissions when establishing this access.)
For a guided tutorial, see How to: Create a github repository in enterprise
Add the nwfsc-cb-genetics Topic to Your Repository
Topics are used in Github like hashtags are used in social media. It is another way to categorize repositories and organize them. For example, the NWFSC Github Organization landing page has a link to all repositories with the nwfsc-cb-genetics topic.
The “nwfsc-cb-genetics” topic should be added to all repositories in our group.
To do this: navigate to the main page of the repository -> in the top right corner, to the right of “About”, clikc the gear symbol -> under “Topics” start to type nwfsc-cb-genetics, click the matching topic from the dropdown menu -> click “Save changes”
For a guided tutorial, see How to: Create a github repository in enterprise
NOAA Github Repository Guidelines
From the NMFS Github Guide, all repositories, regardless of purpose, must follow these general guidelines:
- PII and BII should never be shared (on purpose or inadvertently) on GitHub regardless of whether the repository is in a private or public repository.
- No sensitive information should be shared in repositories, including (but is not limited to) usernames, passwords, login information, port numbers, IP addresses, server names, Application Programming Interface (API) keys, Personally Identifiable Information (PII), Business Identifiable Information (BII), or confidential data.
- GitHub is not a back-up service nor is it a data repository with archiving. Other tools are designed for this purpose.
- Only scientific content that can be reasonably classified as FISMA Low should be shared on GitHub.
- Repositories that have code that interacts with APIs using IP addresses, usernames, passwords, secrets, or credentials must take steps to prevent committing of “secrets” to GitHub.