tfcb_2021

Software installation

We ask that you come to class with a laptop on which you can participate in coding activities. Please follow the instructions below to install the software required for this course (all of these programs either come pre-installed on your computer or are freely available for academic use). We’ll use the other files in this directory to test the software together in class. The tools you’ll need include:

Git

You can follow the instructions here to install Git.
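If you prefer to install Git from a terminal, the sketch below shows one common route per platform. It assumes macOS (where Apple’s Xcode Command Line Tools include Git) or WSL running its default Ubuntu distribution (where Git comes from apt). install_git is a hypothetical helper, not part of Git.

```shell
# install_git [OS]: hypothetical helper that installs Git with a common
# per-platform method. OS defaults to the output of `uname -s`.
install_git() {
  os="${1:-$(uname -s)}"
  case "$os" in
    Darwin)
      # macOS: Git ships with the Xcode Command Line Tools (opens an installer dialog).
      xcode-select --install ;;
    Linux)
      # WSL with Ubuntu (assumed distribution): install Git via apt.
      sudo apt-get update && sudo apt-get install -y git ;;
    *)
      echo "install_git: unsupported system '$os'; use the Git website instead" >&2
      return 1 ;;
  esac
}

# Usage, then confirm Git is available:
#   install_git && git --version
```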

Unix command line

For Windows users

Windows 10 includes a feature called Windows Subsystem for Linux (WSL) that lets you use Unix tools on your computer. Please install WSL following the instructions here.

For MacOSX users

Macintosh operating systems are built on Unix, so many of the tools you’ll need are pre-installed on your computer. You can access the command line through an application called Terminal. You can either search for this in Finder, or use the Go drop-down menu to locate it in the Utilities folder.
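Once you have a terminal open (Terminal on macOS, your WSL distribution on Windows), a small check like the one below reports which of this page’s command-line tools are already on your PATH. check_tool is a hypothetical helper; code only appears once VSCode’s command-line launcher is set up, and python and conda only after Anaconda is installed.

```shell
# check_tool NAME: hypothetical helper that reports whether NAME is on PATH.
# Returns 0 if found, 1 if missing.
check_tool() {
  if command -v "$1" >/dev/null 2>&1; then
    printf '%-8s found: %s\n' "$1" "$(command -v "$1")"
  else
    printf '%-8s MISSING\n' "$1"
    return 1
  fi
}

# Tools mentioned on this page; `|| true` keeps the loop going past a missing tool.
for tool in git ssh python conda code; do
  check_tool "$tool" || true
done
```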

Python

Please install Python using Anaconda, which includes Jupyter notebooks and most of the other packages we’ll use for the course, according to the following instructions:

Conda Environment

A conda environment is a directory that stores a specific collection of packages that you have installed. For this course, we ask that you create a custom conda environment named tfcb2021 with certain R packages so that everyone works with the same software. To do this:
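A minimal sketch of creating such an environment from the terminal follows. The package list is an assumption inferred from later steps on this page (IRkernel for running R in Jupyter, and tidyverse, whose ggplot2 package includes the mpg data with its hwy and displ columns); the course’s actual list may differ. create_course_env is a hypothetical helper name.

```shell
# Environment name used later on this page.
ENV_NAME=tfcb2021
# ASSUMED package list: R itself, the Jupyter R kernel, and tidyverse.
PACKAGES="r-base r-irkernel r-tidyverse"

create_course_env() {
  # $PACKAGES is left unquoted on purpose so each name becomes its own argument.
  conda create --yes --name "$ENV_NAME" --channel conda-forge $PACKAGES
}

# Usage, in an interactive terminal:
#   create_course_env
#   conda activate tfcb2021
```

Run conda activate by hand in an interactive terminal: it generally does not work inside a plain script unless conda’s shell hook has been loaded.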

Text Editor

Biological data is almost exclusively represented as text, and we will be writing code and documentation in text files. It’s useful to be able to open, examine, and edit text files using a light-weight text editor. Microsoft Word is not suitable for these types of files, and we do not recommend the text editors set as default on your computer, as these programs are often not optimized for working with code.

We will use VSCode as the source code and text editor for this class. Install VSCode on your computer and view the introductory video here before the first class to familiarize yourself with the user interface.

We will use the following features in VSCode as part of this class:

  1. Work with Git and GitHub for version control (see the Git section above).
  2. Edit Markdown files and preview rendered versions.
  3. Write Python code in native Jupyter notebooks.
  4. Use the integrated terminal.
  5. Connect to and work on remote hosts such as the Fred Hutch Rhino computing cluster.

Getting started with Terminal, Python, and R in VSCode

To get started on using VSCode:

  1. Open VSCode and click on ‘Extensions’ on the left-side menu.
  2. Install Python (by Microsoft), Markdown Preview Enhanced (by Yiyi Wang), and Remote - SSH (by Microsoft).
  3. Clone the tfcb_2021 GitHub repository in VSCode with View > Command Palette > Git: Clone, then enter https://github.com/FredHutch/tfcb_2021.git.
  4. Click on ‘Explorer’ on the left-side menu, and you should be able to open and manipulate all the files in the TFCB 2021 GitHub repository.
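Steps 2 and 3 can also be done from a terminal. The sketch below assumes the code command is on your PATH (on macOS, first run Shell Command: Install ‘code’ command in PATH from the Command Palette) and that the Marketplace IDs listed are the ones for the three extensions named above. setup_vscode is a hypothetical helper.

```shell
# ASSUMED Marketplace IDs for the extensions in step 2:
# Python (Microsoft), Markdown Preview Enhanced (Yiyi Wang), Remote - SSH (Microsoft).
EXTENSIONS="ms-python.python shd101wyy.markdown-preview-enhanced ms-vscode-remote.remote-ssh"
REPO_URL="https://github.com/FredHutch/tfcb_2021.git"

setup_vscode() {
  # Install each extension through the VSCode command-line interface.
  for ext in $EXTENSIONS; do
    code --install-extension "$ext" || return 1
  done
  # Clone the course repository into the current directory, then open it in VSCode.
  git clone "$REPO_URL" && code tfcb_2021
}
```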

To use Python in a Jupyter notebook:

  1. Open the test Jupyter notebook test_python.ipynb.
  2. In the upper right corner, you should see an icon for selecting a kernel. Select your Anaconda Python environment, which should look something like Python 3.x.x 64-bit (conda).
  3. Click the triangle to the left of the code cell to run it.
  4. You should see a 2 x 2 table filled with 1’s (and no error messages).
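If your Anaconda environment does not appear in the kernel list, a quick terminal check can confirm whether the python on your PATH comes from conda. The sketch relies on the conda-meta directory that conda creates in every environment prefix; in_conda_python is a hypothetical helper name.

```shell
# in_conda_python [PYTHON]: hypothetical helper; succeeds if PYTHON (default:
# python) belongs to a conda environment, detected via conda-meta/ in sys.prefix.
in_conda_python() {
  "${1:-python}" -c 'import os, sys; sys.exit(0 if os.path.isdir(os.path.join(sys.prefix, "conda-meta")) else 1)'
}

if in_conda_python; then
  echo "python on your PATH comes from conda"
else
  echo "python on your PATH is not from conda (or is missing)" >&2
fi
```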

To use R in a Jupyter notebook:

  1. Open the Terminal window in VSCode (Terminal > New Terminal) and activate the tfcb2021 environment you previously created with conda activate tfcb2021.
  2. Type which R in the terminal to find where the environment’s R interpreter is located (e.g., /usr/local/bin/R).
  3. Now switch back to the base environment with conda activate base. The base environment provides Jupyter, which step 5 needs.
  4. Type the full path from step 2 (e.g., /usr/local/bin/R) and press Enter. This starts the tfcb2021 environment’s R interpreter.
  5. At the R prompt (>), type IRkernel::installspec(). This installs the kernel specification (kernelspec) that lets Jupyter run R.
  6. Open the test Jupyter notebook test_R.ipynb.
  7. In the upper right corner, select R as the kernel this time.
  8. Run the code block.
  9. You should see a scatter plot of hwy vs. displ.
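Steps 1 to 5 can also be run non-interactively. This sketch assumes the tfcb2021 environment exists, that Rscript is installed next to R (as it is in conda’s r-base package), and that it is run with the base environment active so the jupyter command is available. Both helper names are hypothetical.

```shell
# rscript_from_r PATH_TO_R: print the Rscript that sits next to an R binary,
# e.g. /usr/local/bin/R -> /usr/local/bin/Rscript.
rscript_from_r() {
  printf '%s/Rscript\n' "$(dirname "$1")"
}

# register_r_kernel: hypothetical helper mirroring steps 1-5 without an
# interactive R session. Run it with the base environment active.
register_r_kernel() {
  # Steps 1-2: locate R inside tfcb2021 (conda run avoids switching environments).
  r_path=$(conda run -n tfcb2021 which R) || return 1
  # Steps 4-5: call installspec() with that environment's R.
  "$(rscript_from_r "$r_path")" -e 'IRkernel::installspec()' || return 1
  # An "ir" kernel should now appear in this list.
  jupyter kernelspec list
}
```

Using Rscript -e avoids typing the command at the interactive R prompt; IRkernel::installspec() has the same effect either way.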

Passwordless authentication for Rhino

It can get annoying to type your password every time you ssh into rhino. With an SSH key, you won’t have to enter your password each time. To set this up:

  1. Open a Terminal window on your local computer and type ssh-keygen -t rsa. (The -t rsa option creates an RSA key, matching the ~/.ssh/id_rsa file named below; recent OpenSSH versions otherwise default to a different key type.)
  2. When prompted with “Enter file in which to save the key (/Users/USERNAME/.ssh/id_rsa):”, simply press Enter to save the key in the default location.
  3. When prompted, enter a passphrase. A long, complex passphrase gives your key the best protection.
  4. You should now see that your public key has been saved. In the commands below, replace HUTCHID with your Fred Hutch username. To copy your key to snail, type: ssh-copy-id HUTCHID@snail.fhcrc.org. To copy your key to rhino (via snail), type: ssh-copy-id -o ProxyJump=HUTCHID@snail.fhcrc.org HUTCHID@rhino.fhcrc.org. Each command should prompt you for your password.
  5. Lastly, modify your ~/.ssh/config. To do this:
    • Check that you have the Remote - SSH extension installed. If not, follow the instructions here to install it.
    • In VSCode, click View > Command Palette > Remote-SSH: Open SSH Configuration File…
    • You should see a config file that looks like this:
      # Read more about SSH config files: https://linux.die.net/man/5/ssh_config
      Host alias
       HostName hostname
       User user
      
    • Delete the text and paste the text below in its place. Replace each “HUTCHID” with your Hutch username, and save.
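Host snail
    HostName snail.fhcrc.org
    User HUTCHID

Host rhino
    # UseKeychain is macOS-only; IgnoreUnknown stops other OpenSSH builds
    # (e.g. on Windows or WSL) from rejecting it as a bad option.
    IgnoreUnknown UseKeychain
    UseKeychain yes
    AddKeysToAgent yes
    IdentityFile ~/.ssh/id_rsa
    User HUTCHID
    HostName rhino.fhcrc.org
    # Reach rhino by tunneling through snail.
    ProxyCommand ssh HUTCHID@snail.fhcrc.org exec nc %h %p 2> /dev/null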

  6. Congratulations! You should now be able to connect to rhino with ssh rhino, without typing your password each time.

<!—

Software installation

We ask that you come to class with a laptop on which you can participate in coding activities. Please follow the instructions below to install the software required for this course (all of these programs either come pre-installed on your computer or are freely available for academic use). We’ll use the other files in this directory to test the software together in class. The tools you’ll need include:

Python

We will use Jupyter notebooks to record code, output, and text throughout the course. We recommend installing Python using Anaconda, which includes Jupyter notebooks and most of the other packages we’ll use for the course, according to the following instructions:

Text Editor

Biological data is almost exclusively represented as text, and we will be writing code and documentation in text files. It’s useful to be able to open, examine, and edit text files using a light-weight text editor. Microsoft Word is not suitable for these types of files, and we do not recommend the text editors set as default on your computer, as these programs are often not optimized for working with code.

We will use VSCode as the source code and text editor for this class. Install VSCode on your computer and view the introductory video here before the first class to familiarize yourself with the user interface.

We will use the following features in VSCode as part of this class:

  1. Work with Git and GitHub for version control (see more below).
  2. Edit Markdown files and preview rendered versions.
  3. Write Python code in native Jupyter notebooks.
  4. Use the integrated terminal.
  5. Connect to and work on remote hosts such as the Fred Hutch Rhino computing cluster.

Spreadsheet program

Spreadsheet programs are a useful way for us as humans to interact with data. The most common of these is Microsoft Excel. Commands may differ a bit between programs, but the general ideas for thinking about spreadsheets are the same. If you are working on a computer owned by Fred Hutch, Microsoft Office (including Excel) is available through the Self Service application. If you are working on a personal computer that doesn’t have a spreadsheet program, you can use a free, open source program called LibreOffice.

Install LibreOffice by going to the installation page. The version for your operating system should automatically be selected. Click Download Version X.X.X (whichever is the most recent version). You will go to a page that asks about a donation, but you don’t need to make one. Your download should begin automatically. Once the installer is downloaded, double click on it and LibreOffice should install.

Git

Git is version control software, which helps you keep track of changes made to files. GitHub is a website that hosts repositories of data and code tracked with Git, and is a mechanism for publishing and collaborating on project development. VSCode and GitHub work well together, and you will be able to do a lot of Git-related activities from within VSCode. Note that installing VSCode does not install Git itself; VSCode uses the Git already installed on your computer.

GitHub Account

If you do not already have one, please register for a GitHub account. Please note that your name and email will be publicly visible through GitHub by default, but more information on controlling privacy settings can be found here.

R

R and RStudio are separate downloads. R is the “engine”, while RStudio is an integrated development environment (IDE) that makes using R much more pleasant. R must be installed before RStudio. Follow the instructions below for your operating system to install them. If you are working on a computer owned by Fred Hutch, RStudio + R is available through the Self Service application.

Windows

MacOSX

Installing tidyverse

Unix command line (shell)

Windows

Windows 10 comes with a new feature called Windows Subsystem for Linux (WSL) that allows you to access Unix tools on your computer. Installation instructions can be found here.

Another option (for example, if you are not running Windows 10) is Git for Windows, which also installs the Git command-line tools. You can download it here and install it with the default options.

MacOSX

Macintosh operating systems are built on Unix, so many of the tools you’ll need are pre-installed on your computer. You can access the command line through an application called Terminal. You can either search for this in Finder, or use the Go drop-down menu to locate it in the Utilities folder.

Logging on to rhino

We’ll be using a computer cluster at Fred Hutch called rhino for the Unix classes. Please see these instructions for logging on to rhino, and note that there is an extra step to log in from off campus. –>