Plotting and Programming in Python

Running and Quitting

Overview

Teaching: 20 min
Exercises: 0 min
Questions
  • How can I run Python programs?

Objectives
  • Launch the JupyterLab server.

  • Create a new Python script.

  • Create a Jupyter notebook.

  • Shut down the JupyterLab server.

  • Understand the difference between a Python script and a Jupyter notebook.

  • Create Markdown cells in a notebook.

  • Create and run Python cells in a notebook.

Many software developers use an integrated development environment (IDE) or a text editor to create and edit their Python programs, which can then be executed through the IDE or directly from the command line. While this is a common approach, we are going to use the Jupyter Notebook via JupyterLab for the remainder of this workshop.

This has several advantages:

Each notebook contains one or more cells that contain code, text, or images.

Getting Started with JupyterLab

JupyterLab is an application with a web-based user interface from Project Jupyter that enables one to work with documents and activities such as Jupyter notebooks, text editors, terminals, and even custom components in a flexible, integrated, and extensible manner. JupyterLab requires a reasonably up-to-date browser (ideally a current version of Chrome, Safari, or Firefox); Internet Explorer versions 9 and below are not supported.

JupyterLab is included as part of the Anaconda Python distribution. If you have not already installed the Anaconda Python distribution, see the setup instructions.

Even though JupyterLab is a web-based application, JupyterLab runs locally on your machine and does not require an internet connection.

JupyterLab? What about Jupyter notebooks?

JupyterLab is the next stage in the evolution of the Jupyter Notebook. If you have prior experience working with Jupyter notebooks, then you will have a good idea of what to expect from JupyterLab.

Experienced users of Jupyter notebooks interested in a more detailed discussion of the similarities and differences between the JupyterLab and Jupyter notebook user interfaces can find more information in the JupyterLab user interface documentation.

Starting JupyterLab

You can start the JupyterLab server through the command line or through an application called Anaconda Navigator. Anaconda Navigator is included as part of the Anaconda Python distribution.

macOS - Command Line

To start the JupyterLab server you will need to access the command line through the Terminal. There are two ways to open Terminal on Mac.

  1. In your Applications folder, open Utilities and double-click on Terminal
  2. Press Command + spacebar to launch Spotlight. Type Terminal and then double-click the search result or hit Enter

After you have launched Terminal, type the command to launch the JupyterLab server.

$ jupyter lab

Windows Users - Command Line

To start the JupyterLab server you will need to access the Anaconda Prompt.

Press the Windows Logo Key and search for Anaconda Prompt, then click the result or press Enter.

After you have launched the Anaconda Prompt, type the command:

$ jupyter lab

Anaconda Navigator

To start a JupyterLab server from Anaconda Navigator you must first start Anaconda Navigator (detailed instructions are available for macOS, Windows, and Linux). You can search for Anaconda Navigator via Spotlight on macOS (Command + spacebar) or the Windows search function (Windows Logo Key), or you can open a terminal shell and run the anaconda-navigator executable from the command line.

After you have launched Anaconda Navigator, click the Launch button under JupyterLab. You may need to scroll down to find it.

Here is a screenshot of an Anaconda Navigator page similar to the one that should open on either macOS or Windows.

Anaconda Navigator landing page

And here is a screenshot of a JupyterLab landing page that should be similar to the one that opens in your default web browser after starting the JupyterLab server on either macOS or Windows.

JupyterLab landing page

The JupyterLab Interface

JupyterLab has many features found in traditional integrated development environments (IDEs) but is focused on providing flexible building blocks for interactive, exploratory computing.

The JupyterLab Interface consists of the Menu Bar, a collapsible Left Side Bar, and the Main Work Area, which contains tabs of documents and activities.

The Menu Bar at the top of JupyterLab has the top-level menus that expose various actions available in JupyterLab along with their keyboard shortcuts (where applicable). The following menus are included by default.

Kernels

The JupyterLab docs define kernels as “separate processes started by the server that run your code in different programming languages and environments.” When we open a Jupyter notebook, it starts a kernel (a process) that runs the code. In this lesson, we’ll be using the IPython kernel, which lets us run Python 3 code interactively.

Using other Jupyter kernels would let us write and execute code in other programming languages, such as R, Java, Julia, Ruby, JavaScript, or Fortran, in the same JupyterLab interface.

A screenshot of the default Menu Bar is provided below.

JupyterLab Menu Bar

The left sidebar contains a number of commonly used tabs, such as a file browser (showing the contents of the directory where the JupyterLab server was launched), a list of running kernels and terminals, the command palette, and a list of open tabs in the main work area. A screenshot of the default Left Side Bar is provided below.

JupyterLab Left Side Bar

The left sidebar can be collapsed or expanded by selecting “Show Left Sidebar” in the View menu or by clicking on the active sidebar tab.

Main Work Area

The main work area in JupyterLab enables you to arrange documents (notebooks, text files, etc.) and other activities (terminals, code consoles, etc.) into panels of tabs that can be resized or subdivided. A screenshot of the default Main Work Area is provided below.

JupyterLab Main Work Area

Drag a tab to the center of a tab panel to move the tab to the panel. Subdivide a tab panel by dragging a tab to the left, right, top, or bottom of the panel. The work area has a single current activity. The tab for the current activity is marked with a colored top border (blue by default).

Creating a Python script

Creating a Jupyter Notebook

To open a new notebook click the Python 3 icon under the Notebook header in the Launcher tab in the main work area. You can also create a new notebook by selecting New -> Notebook from the File menu in the Menu Bar.

Additional notes on Jupyter notebooks.

Below is a screenshot of a Jupyter notebook running inside JupyterLab. If you are interested in more details, then see the official notebook documentation.

Example Jupyter Notebook

How It’s Stored

  • The notebook file is stored in a format called JSON.
  • Just like a webpage, what’s saved looks different from what you see in your browser.
  • But this format allows Jupyter to mix source code, text, and images, all in one file.
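
As a rough illustration of the points above, the sketch below builds a heavily simplified notebook-like structure as a Python dictionary and converts it to JSON text with the standard json module. The key names follow the version 4 notebook format, but a real .ipynb file carries more metadata than shown here, so treat this as an approximation rather than a complete notebook.

```python
import json

# A simplified stand-in for a notebook: one text cell and one code cell.
notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# A title"]},
        {
            "cell_type": "code",
            "metadata": {},
            "source": ["print(6 * 7)"],
            "outputs": [],
            "execution_count": None,
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 5,
}

# This text is what would be saved to disk.
text = json.dumps(notebook, indent=1)
print(text)

# Reading the text back recovers the same structure.
restored = json.loads(text)
cell_types = [cell["cell_type"] for cell in restored["cells"]]
print(cell_types)
```

Opening a saved notebook in a plain text editor shows text of this kind rather than the rendered cells.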

Arranging Documents into Panels of Tabs

In the JupyterLab Main Work Area you can arrange documents into panels of tabs. Here is an example from the official documentation.

Multi-panel JupyterLab

First, create a text file, Python console, and terminal window and arrange them into three panels in the main work area. Next, create a notebook, terminal window, and text file and arrange them into three panels in the main work area. Finally, create your own combination of panels and tabs. What combination of panels and tabs do you think will be most useful for your workflow?

Solution

After creating the necessary tabs, you can drag one of the tabs to the center of a panel to move the tab to the panel; next you can subdivide a tab panel by dragging a tab to the left, right, top, or bottom of the panel.

Code vs. Text

Jupyter mixes code and text in different types of blocks, called cells. We often use the term “code” to mean “the source code of software written in a language such as Python”. A “code cell” in a Notebook is a cell that contains software; a “text cell” is one that contains ordinary prose written for human beings.

The Notebook has Command and Edit modes.

Command Vs. Edit

In the Jupyter notebook page are you currently in Command or Edit mode?
Switch between the modes. Use the shortcuts to generate a new cell. Use the shortcuts to delete a cell. Use the shortcuts to undo the last cell operation you performed.

Solution

  1. Command mode has a grey border and Edit mode has a blue border. Use Esc and Return to switch between modes.
  2. To generate a new cell, you need to be in Command mode (press Esc if your cell is blue). Type b or a.
  3. To delete a cell, you need to be in Command mode (press Esc if your cell is blue). Type x.
  4. To undo the last cell operation, you need to be in Command mode (press Esc if your cell is blue). Type z.

Use the keyboard and mouse to select and edit cells.

The Notebook will turn Markdown into pretty-printed documentation.

Markdown does most of what HTML does.

*   Use asterisks
*   to create
*   bullet lists.
  • Use asterisks
  • to create
  • bullet lists.
1.  Use numbers
1.  to create
1.  numbered lists.
  1. Use numbers
  2. to create
  3. numbered lists.
*  You can use indents
	*  To create sublists 
	*  of the same type
*  Or sublists
	1. Of different
	1. types
  • You can use indents
    • To create sublists
    • of the same type
  • Or sublists
    1. Of different
    2. types
# A Level-1 Heading

A Level-1 Heading

## A Level-2 Heading (etc.)

A Level-2 Heading (etc.)

Line breaks
don't matter.

But blank lines
create new paragraphs.

Line breaks don’t matter.

But blank lines create new paragraphs.

[Create links](http://software-carpentry.org) with `[...](...)`.
Or use [named links][data_carpentry].

[data_carpentry]: http://datacarpentry.org

Create links with [...](...). Or use named links.

Creating Lists in Markdown

Create a nested list in a Markdown cell in a notebook that looks like this:

  1. Get funding.
  2. Do work.
    • Design experiment.
    • Collect data.
    • Analyze.
  3. Write up.
  4. Publish.

Solution

This challenge integrates both the numbered list and bullet list. Note that the bullet list is indented 4 spaces so that it is in line with the text of the numbered list items.

1.  Get funding.
2.  Do work.
    *   Design experiment.
    *   Collect data.
    *   Analyze.
3.  Write up.
4.  Publish.

More Math

What is displayed when a Python cell in a notebook that contains several calculations is executed? For example, what happens when this cell is executed?

7 * 3
2 + 1

Solution

Both calculations are run, but the notebook displays only the result of the last one.

3

Change an Existing Cell from Code to Markdown

What happens if you write some Python in a code cell and then you switch it to a Markdown cell? For example, put the following in a code cell:

x = 6 * 7 + 12
print(x)

And then run it with Shift+Return to be sure that it works as a code cell. Now go back to the cell and use Esc then m to switch the cell to Markdown and “run” it with Shift+Return. What happened and how might this be useful?

Solution

The Python code gets treated like Markdown text. The lines appear as if they are part of one contiguous paragraph. This could be useful to temporarily turn on and off cells in notebooks that get used for multiple purposes.

x = 6 * 7 + 12 print(x)

Equations

Standard Markdown (such as we’re using for these notes) won’t render equations, but the Notebook will. Create a new Markdown cell and enter the following:

$\sum_{i=1}^{N} 2^{-i} \approx 1$

(It’s probably easier to copy and paste.) What does it display? What do you think the underscore, _, circumflex, ^, and dollar sign, $, do?

Solution

The notebook shows the equation as it would be rendered from LaTeX equation syntax. The dollar sign, $, is used to tell Markdown that the text in between is a LaTeX equation. If you’re not familiar with LaTeX, underscore, _, is used for subscripts and circumflex, ^, is used for superscripts. A pair of curly braces, { and }, is used to group text together so that the statement i=1 becomes the subscript and N becomes the superscript. Similarly, -i is in curly braces to make the whole statement the superscript for 2. \sum and \approx are LaTeX commands for “sum over” and “approximate” symbols.
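
To see why the equation uses an approximation sign rather than an equals sign, the following sketch adds up the terms in Python. The choice N = 10 is arbitrary, and the loop-like syntax inside sum has not been introduced in this lesson. The sum falls short of 1 by exactly 2 to the power -N.

```python
N = 10  # arbitrary choice for illustration

# Add up 2**-1 + 2**-2 + ... + 2**-N.
total = sum(2 ** -i for i in range(1, N + 1))
shortfall = 1 - total

print(total)
print(shortfall == 2 ** -N)
```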

Closing JupyterLab

$ jupyter lab

Closing JupyterLab

Practice closing and restarting the JupyterLab server.

Key Points

  • Python scripts are plain text files.

  • Use the Jupyter Notebook for editing and running Python.

  • The Notebook has Command and Edit modes.

  • Use the keyboard and mouse to select and edit cells.

  • The Notebook will turn Markdown into pretty-printed documentation.

  • Markdown does most of what HTML does.


Variables and Assignment

Overview

Teaching: 15 min
Exercises: 10 min
Questions
  • How can I store data in programs?

Objectives
  • Write programs that assign scalar values to variables and perform calculations with those values.

  • Correctly trace value changes in programs that use scalar assignment.

Use variables to store values.
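
The print call in the following section relies on two variables. A minimal sketch of the assignments it assumes, with the values inferred from the output shown there:

```python
# Values inferred from the output 'Ahmed is 42 years old'.
age = 42
first_name = 'Ahmed'
```

The = sign assigns the value on the right to the name on the left.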

Use print to display values.

print(first_name, 'is', age, 'years old')
Ahmed is 42 years old

Variables must be created before they are used.

print(last_name)
---------------------------------------------------------------------------
NameError                                 Traceback (most recent call last)
<ipython-input-1-c1fbb4e96102> in <module>()
----> 1 print(last_name)

NameError: name 'last_name' is not defined

Variables Persist Between Cells

Be aware that it is the order of execution of cells that is important in a Jupyter notebook, not the order in which they appear. Python will remember all the code that was run previously, including any variables you have defined, irrespective of the order in the notebook. Therefore if you define variables lower down the notebook and then (re)run cells further up, those defined further down will still be present. As an example, create two cells with the following content, in this order:

print(myval)
myval = 1

If you execute this in order, the first cell will give an error. However, if you run the first cell after the second cell it will print out 1. To prevent confusion, it can be helpful to use the Kernel -> Restart & Run All option which clears the interpreter and runs everything from a clean slate going top to bottom.
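
The same behaviour can be sketched in a single script. The try/except construct used here has not been introduced in this lesson; it is used only so that the example keeps running after the error.

```python
message = None

try:
    print(myval)              # 'first cell': myval does not exist yet
except NameError as error:
    message = str(error)
    print('Error:', message)

myval = 1                     # 'second cell': now myval is defined
print(myval)                  # re-running the 'first cell' now works
```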

Variables can be used in calculations.

age = age + 3
print('Age in three years:', age)
Age in three years: 45

Use an index to get a single character from a string.

A line of Python code, print(atom_name[0]), demonstrates that using the zero index will output just the initial letter, in this case 'h' for helium.

atom_name = 'helium'
print(atom_name[0])
h

Use a slice to get a substring.

atom_name = 'sodium'
print(atom_name[0:3])
sod

Use the built-in function len to find the length of a string.

print(len('helium'))
6
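
A slice atom_name[start:stop] includes the character at start but not the one at stop, so as long as both positions lie inside the string its length is stop - start. A small sketch combining the slice and len examples above (the names start, stop, and piece are chosen for illustration):

```python
atom_name = 'sodium'
start = 0
stop = 3

piece = atom_name[start:stop]   # characters at positions 0, 1, and 2
print(piece)
print(len(piece) == stop - start)
```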

Python is case-sensitive.
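
A minimal sketch of what case sensitivity means in practice: names that differ only in capitalisation are separate variables. The variable names and values are chosen for illustration.

```python
name = 'Ahmed'
Name = 'Walsh'   # a different variable from name

print(name, Name)
```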

Use meaningful variable names.

flabadab = 42
ewr_422_yY = 'Ahmed'
print(ewr_422_yY, 'is', flabadab, 'years old')

Predicting Values

What is the final value of position in the program below? (Try to predict the value without running the program, then check your prediction.)

initial = 'left'
position = initial
initial = 'right'
print(position)

Solution

left

The initial variable is assigned the value 'left'. In the second line, the position variable also receives the string value 'left'. In the third line, the initial variable is given the value 'right', but the position variable retains its string value of 'left'.

Challenge

If you assign a = 123, what happens if you try to get the second digit of a via a[1]?

Solution

Numbers are not strings or sequences and Python will raise an error if you try to perform an index operation on a number. In the next lesson on types and type conversion we will learn more about types and how to convert between different types. If you want the Nth digit of a number you can convert it into a string using the str built-in function and then perform an index operation on that string.

a = 123
print(a[1])
TypeError: 'int' object is not subscriptable
a = str(123)
print(a[1])
2

Choosing a Name

Which is a better variable name, m, min, or minutes? Why? Hint: think about which code you would rather inherit from someone who is leaving the lab:

  1. ts = m * 60 + s
  2. tot_sec = min * 60 + sec
  3. total_seconds = minutes * 60 + seconds

Solution

minutes is better because min might mean something like “minimum” (and actually is an existing built-in function in Python that we will cover later).

Slicing practice

What does the following program print?

atom_name = 'carbon'
print('atom_name[1:3] is:', atom_name[1:3])

Solution

atom_name[1:3] is: ar

Slicing concepts

Given the following string:

species_name = "Acacia buxifolia"

What would these expressions return?

  1. species_name[2:8]
  2. species_name[11:] (without a value after the colon)
  3. species_name[:4] (without a value before the colon)
  4. species_name[:] (just a colon)
  5. species_name[11:-3]
  6. species_name[-5:-3]
  7. What happens when you choose a stop value which is out of range? (i.e., try species_name[0:20] or species_name[:103])

Solutions

  1. species_name[2:8] returns the substring 'acia b'
  2. species_name[11:] returns the substring 'folia', from position 11 until the end
  3. species_name[:4] returns the substring 'Acac', from the start up to but not including position 4
  4. species_name[:] returns the entire string 'Acacia buxifolia'
  5. species_name[11:-3] returns the substring 'fo', from the 11th position to the third last position
  6. species_name[-5:-3] also returns the substring 'fo', from the fifth last position to the third last
  7. If a part of the slice is out of range, the operation does not fail. species_name[0:20] gives the same result as species_name[0:], and species_name[:103] gives the same result as species_name[:]
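
The answers above can be checked by running the expressions directly:

```python
species_name = "Acacia buxifolia"

print(species_name[2:8])
print(species_name[11:])
print(species_name[:4])
print(species_name[:])
print(species_name[11:-3])
print(species_name[-5:-3])

# Out-of-range stop values are treated as 'up to the end'.
print(species_name[0:20] == species_name[0:])
print(species_name[:103] == species_name[:])
```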

Key Points

  • Use variables to store values.

  • Use print to display values.

  • Variables persist between cells.

  • Variables must be created before they are used.

  • Variables can be used in calculations.

  • Use an index to get a single character from a string.

  • Use a slice to get a substring.

  • Use the built-in function len to find the length of a string.

  • Python is case-sensitive.

  • Use meaningful variable names.


Data Types and Type Conversion

Overview

Teaching: 10 min
Exercises: 10 min
Questions
  • What kinds of data do programs store?

  • How can I convert one type to another?

Objectives
  • Explain key differences between integers and floating point numbers.

  • Explain key differences between numbers and character strings.

  • Use built-in functions to convert between integers, floating point numbers, and strings.

Every value has a type.

Use the built-in function type to find the type of a value.

print(type(52))
<class 'int'>
fitness = 'average'
print(type(fitness))
<class 'str'>

In a notebook you can use the %whos command to find out information about variables which are set in the session.

%whos

Types control what operations (or methods) can be performed on a given value.

print(5 - 3)
2
print('hello' - 'h')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-2-67f5626a1e07> in <module>()
----> 1 print('hello' - 'h')

TypeError: unsupported operand type(s) for -: 'str' and 'str'

You can use the “+” and “*” operators on strings.

full_name = 'Ahmed' + ' ' + 'Walsh'
print(full_name)
Ahmed Walsh
separator = '=' * 10
print(separator)
==========

Strings have a length (but numbers don’t).

print(len(full_name))
11
print(len(52))
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-3-f769e8e8097d> in <module>()
----> 1 print(len(52))

TypeError: object of type 'int' has no len()

Must convert numbers to strings or vice versa when operating on them.

print(1 + '2')
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-4-fe4f54a023c6> in <module>()
----> 1 print(1 + '2')

TypeError: unsupported operand type(s) for +: 'int' and 'str'
print(1 + int('2'))
print(str(1) + '2')
3
12

Can mix integers and floats freely in operations.

print('half is', 1 / 2.0)
print('three squared is', 3.0 ** 2)
half is 0.5
three squared is 9.0

Variables only change value when something is assigned to them.

variable_one = 1
variable_two = 5 * variable_one
variable_one = 2
print('first is', variable_one, 'and second is', variable_two)
first is 2 and second is 5

Fractions

What type of value is 3.4? How can you find out?

Solution

It is a floating-point number (often abbreviated “float”). It is possible to find out by using the built-in function type().

print(type(3.4))
<class 'float'>

Automatic Type Conversion

What type of value is 3.25 + 4?

Solution

It is a float: integers are automatically converted to floats as necessary.

result = 3.25 + 4
print(result, 'is', type(result))
7.25 is <class 'float'>

Division Types

In Python 3, the // operator performs integer (whole-number) floor division, the / operator performs floating-point division, and the % (or modulo) operator calculates and returns the remainder from integer division:

print('5 // 3:', 5 // 3)
print('5 / 3:', 5 / 3)
print('5 % 3:', 5 % 3)
5 // 3: 1
5 / 3: 1.6666666666666667
5 % 3: 2

If num_subjects is the number of subjects taking part in a study, and num_per_survey is the number that can take part in a single survey, write an expression that calculates the number of surveys needed to reach everyone once.

Solution

We want the minimum number of surveys that reaches everyone once, which is the rounded-up value of num_subjects / num_per_survey. We can compute this by performing a floor division with // and adding 1. Before the division we need to subtract 1 from the number of subjects to deal with the case where num_subjects is evenly divisible by num_per_survey.

num_subjects = 600
num_per_survey = 42
num_surveys = (num_subjects - 1) // num_per_survey + 1

print(num_subjects, 'subjects,', num_per_survey, 'per survey:', num_surveys)
600 subjects, 42 per survey: 15
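
One way to gain confidence in the formula is to compare it with math.ceil from the standard library, which rounds up directly. This is only a sketch: the for loop has not been introduced yet, and the subject counts other than 600 are made up for illustration. 588 is an exact multiple of 42, which is the case the subtraction is meant to handle.

```python
import math

num_per_survey = 42
checks = []

for num_subjects in (600, 588, 1):   # 588 and 1 are illustrative values
    by_floor = (num_subjects - 1) // num_per_survey + 1
    by_ceil = math.ceil(num_subjects / num_per_survey)
    naive = num_subjects // num_per_survey + 1   # same idea without the subtraction
    checks.append((num_subjects, by_floor, by_ceil, naive))
    print(num_subjects, by_floor, by_ceil, naive)
```

For 588 subjects the version without the subtraction asks for 15 surveys where 14 are enough.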

Strings to Numbers

Where reasonable, float() will convert a string to a floating point number, and int() will convert a floating point number to an integer:

print("string to float:", float("3.4"))
print("float to int:", int(3.4))
string to float: 3.4
float to int: 3

If the conversion doesn’t make sense, however, an error message will occur.

print("string to float:", float("Hello world!"))
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-5-df3b790bf0a2> in <module>
----> 1 print("string to float:", float("Hello world!"))

ValueError: could not convert string to float: 'Hello world!'

Given this information, what do you expect the following program to do?

What does it actually do?

Why do you think it does that?

print("fractional string to int:", int("3.4"))

Solution

What do you expect this program to do? It would not be so unreasonable to expect the Python 3 int command to convert the string “3.4” to 3.4 and an additional type conversion to 3. After all, Python 3 performs a lot of other magic - isn’t that part of its charm?

int("3.4")
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-2-ec6729dfccdc> in <module>
----> 1 int("3.4")
ValueError: invalid literal for int() with base 10: '3.4'

However, Python 3 throws an error. Why? To be consistent, possibly. If you ask Python to perform two consecutive typecasts, you must convert it explicitly in code.

int(float("3.4"))
3

Arithmetic with Different Types

Which of the following will return the floating point number 2.0? Note: there may be more than one right answer.

first = 1.0
second = "1"
third = "1.1"
  1. first + float(second)
  2. float(second) + float(third)
  3. first + int(third)
  4. first + int(float(third))
  5. int(first) + int(float(third))
  6. 2.0 * second

Solution

Answer: 1 and 4
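
The sketch below evaluates all six options so the answer can be checked. The dictionary and the try/except construct are not covered in this lesson; they are used here only to collect the results and to keep going when an option raises an error.

```python
first = 1.0
second = "1"
third = "1.1"

results = {}
results[1] = first + float(second)
results[2] = float(second) + float(third)
try:
    results[3] = first + int(third)      # int() cannot parse "1.1"
except ValueError:
    results[3] = 'ValueError'
results[4] = first + int(float(third))
results[5] = int(first) + int(float(third))
try:
    results[6] = 2.0 * second            # a string cannot be multiplied by a float
except TypeError:
    results[6] = 'TypeError'

for number, value in results.items():
    print(number, repr(value))
```

Option 5 produces 2, an integer, rather than the floating point number 2.0.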

Key Points

  • Every value has a type.

  • Use the built-in function type to find the type of a value.

  • Types control what operations can be done on values.

  • Strings can be added and multiplied.

  • Strings have a length (but numbers don’t).

  • Must convert numbers to strings or vice versa when operating on them.

  • Can mix integers and floats freely in operations.

  • Variables only change value when something is assigned to them.


Built-in Functions and Help

Overview

Teaching: 15 min
Exercises: 10 min
Questions
  • How can I use built-in functions?

  • How can I find out what they do?

  • What kind of errors can occur in programs?

Objectives
  • Explain the purpose of functions.

  • Correctly call built-in Python functions.

  • Correctly nest calls to built-in functions.

  • Use help to display documentation for built-in functions.

  • Correctly describe situations in which SyntaxError and NameError occur.

Use comments to add documentation to programs.

# This sentence isn't executed by Python.
adjustment = 0.5   # Neither is this - anything after '#' is ignored.

A function may take zero or more arguments.

print('before')
print()
print('after')
before

after

Every function returns something.

result = print('example')
print('result of print is', result)
example
result of print is None

Commonly-used built-in functions include max, min, and round.

max_value = max(1, 2, 3)
print('maximum value is:', max_value)
min_value = min('a', 'A', '0')
print('minimum value is:', min_value)
maximum value is: 3
minimum value is: 0

Functions may only work for certain (combinations of) arguments.

print(max(1, 'a'))
TypeError                                 Traceback (most recent call last)
<ipython-input-52-3f049acf3762> in <module>
----> 1 print(max(1, 'a'))

TypeError: '>' not supported between instances of 'str' and 'int'

Functions may have default values for some arguments.

round(3.712)
4
round(3.712, 1)
3.7

Use the built-in function help to get help for a function.

help(round)
Help on built-in function round in module builtins:

round(number, ndigits=None)
    Round a number to a given precision in decimal digits.

    The return value is an integer if ndigits is omitted or None.  Otherwise
    the return value has the same type as the number.  ndigits may be negative.
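
The help text can be checked with a short sketch. The variable names and the value 1234.5678 are chosen for illustration; a negative ndigits rounds to the left of the decimal point.

```python
whole = round(3.712)              # ndigits omitted: the result is an integer
one_digit = round(3.712, 1)       # ndigits given: same type as the input
hundreds = round(1234.5678, -2)   # negative ndigits: round to the nearest hundred

print(whole, type(whole))
print(one_digit, type(one_digit))
print(hundreds, type(hundreds))
```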

The Jupyter Notebook has two ways to get help.

Functions attached to objects are called methods

my_string = 'Hello world!'  # creation of a string object

print(len(my_string))       # the len function takes a string as an argument and returns the length of the string

print(my_string.swapcase()) # calling the swapcase method on the my_string object

print(my_string.__len__())  # calling the internal __len__ method on the my_string object, used by len(my_string)

12
hELLO WORLD!
12
print(my_string.isupper())          # Not all the letters are uppercase
print(my_string.upper())            # This capitalizes all the letters

print(my_string.upper().isupper())  # Now all the letters are uppercase
False
HELLO WORLD!
True

Python reports a syntax error when it can’t understand the source of a program.

# Forgot to close the quote marks around the string.
name = 'Feng
  File "<ipython-input-56-f42768451d55>", line 2
    name = 'Feng
                ^
SyntaxError: EOL while scanning string literal
# An extra '=' in the assignment.
age = = 52
  File "<ipython-input-57-ccc3df3cf902>", line 2
    age = = 52
          ^
SyntaxError: invalid syntax
print ("hello world"
  File "<ipython-input-6-d1cc229bf815>", line 1
    print ("hello world"
                        ^
SyntaxError: unexpected EOF while parsing

Python reports a runtime error when something goes wrong while a program is executing.

age = 53
remaining = 100 - aege # mis-spelled 'age'
NameError                                 Traceback (most recent call last)
<ipython-input-59-1214fb6c55fc> in <module>
      1 age = 53
----> 2 remaining = 100 - aege # mis-spelled 'age'

NameError: name 'aege' is not defined

What Happens When

  1. Explain in simple terms the order of operations in the following program: when does the addition happen, when does the subtraction happen, when is each function called, etc.
  2. What is the final value of radiance?
radiance = 1.0
radiance = max(2.1, 2.0 + min(radiance, 1.1 * radiance - 0.5))

Solution

  1. Order of operations:
    1. 1.1 * radiance = 1.1
    2. 1.1 - 0.5 = 0.6
    3. min(radiance, 0.6) = 0.6
    4. 2.0 + 0.6 = 2.6
    5. max(2.1, 2.6) = 2.6
  2. At the end, radiance = 2.6
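
The order of operations can be made explicit by giving each intermediate result its own variable. The intermediate names below are chosen for illustration and are not part of the original program.

```python
radiance = 1.0

scaled = 1.1 * radiance            # step 1: multiplication
shifted = scaled - 0.5             # step 2: subtraction
smaller = min(radiance, shifted)   # step 3: inner function call
total = 2.0 + smaller              # step 4: addition
stepwise = max(2.1, total)         # step 5: outer function call

# The original one-line version gives the same value.
one_line = max(2.1, 2.0 + min(radiance, 1.1 * radiance - 0.5))
print(stepwise == one_line)
```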

Spot the Difference

  1. Predict what each of the print statements in the program below will print.
  2. Does max(len(rich), poor) run or produce an error message? If it runs, does its result make any sense?
easy_string = "abc"
print(max(easy_string))
rich = "gold"
poor = "tin"
print(max(rich, poor))
print(max(len(rich), len(poor)))

Solution

print(max(easy_string))
c
print(max(rich, poor))
tin
print(max(len(rich), len(poor)))
4

max(len(rich), poor) throws a TypeError. This turns into max(4, 'tin') and as we discussed earlier a string and integer cannot meaningfully be compared.

TypeError                                 Traceback (most recent call last)
<ipython-input-65-bc82ad05177a> in <module>
----> 1 max(len(rich), poor)

TypeError: '>' not supported between instances of 'str' and 'int'

Why Not?

Why is it that max and min do not return None when they are called with no arguments?

Solution

max and min raise a TypeError in this case because the required arguments were not supplied. If they just returned None, the error would be much harder to trace, as the None would likely be stored in a variable and used later in the program, only to cause a runtime error there.
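
A short sketch of this behaviour. The try/except construct has not been introduced in this lesson; it is used here only so the error can be shown without stopping the program.

```python
caught = False

try:
    max()                       # called with no arguments
except TypeError as error:
    caught = True
    print('TypeError:', error)

print(caught)
```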

Last Character of a String

If Python starts counting from zero, and len returns the number of characters in a string, what index expression will get the last character in the string name? (Note: we will see a simpler way to do this in a later episode.)

Solution

name[len(name) - 1]
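
A quick check of this expression, using 'helium' as an example value. Negative indices, which already appeared in the slicing exercise, count from the end of the string, so name[-1] gives the same character.

```python
name = 'helium'   # example value

last = name[len(name) - 1]
print(last)
print(last == name[-1])
```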

Explore the Python docs!

The official Python documentation is arguably the most complete source of information about the language. It is available in different languages and contains a lot of useful resources. The Built-in Functions page contains a catalogue of all of these functions, including the ones that we’ve covered in this lesson. Some of these are more advanced and unnecessary at the moment, but others are very simple and useful.

Key Points

  • Use comments to add documentation to programs.

  • A function may take zero or more arguments.

  • Commonly-used built-in functions include max, min, and round.

  • Functions may only work for certain (combinations of) arguments.

  • Functions may have default values for some arguments.

  • Use the built-in function help to get help for a function.

  • The Jupyter Notebook has two ways to get help.

  • Every function returns something.

  • Python reports a syntax error when it can’t understand the source of a program.

  • Python reports a runtime error when something goes wrong while a program is executing.

  • Fix syntax errors by reading the source code, and runtime errors by tracing the program’s execution.


Break

Overview

Teaching: 0 min
Exercises: 0 min
Questions
Objectives

Key Points


Lists

Overview

Teaching: 10 min
Exercises: 10 min
Questions
  • How can I store multiple values?

Objectives
  • Explain why programs need collections of values.

  • Write programs that create flat lists, index them, slice them, and modify them through assignment and method calls.

A list stores many values in a single structure.

pressures = [0.273, 0.275, 0.277, 0.275, 0.276]
print('pressures:', pressures)
print('length:', len(pressures))
pressures: [0.273, 0.275, 0.277, 0.275, 0.276]
length: 5

Use an item’s index to fetch it from a list.

print('zeroth item of pressures:', pressures[0])
print('fourth item of pressures:', pressures[4])
zeroth item of pressures: 0.273
fourth item of pressures: 0.276

Lists’ values can be replaced by assigning to them.

pressures[0] = 0.265
print('pressures is now:', pressures)
pressures is now: [0.265, 0.275, 0.277, 0.275, 0.276]

Appending items to a list lengthens it.

primes = [2, 3, 5]
print('primes is initially:', primes)
primes.append(7)
print('primes has become:', primes)
primes is initially: [2, 3, 5]
primes has become: [2, 3, 5, 7]
teen_primes = [11, 13, 17, 19]
middle_aged_primes = [37, 41, 43, 47]
print('primes is currently:', primes)
primes.extend(teen_primes)
print('primes has now become:', primes)
primes.append(middle_aged_primes)
print('primes has finally become:', primes)
primes is currently: [2, 3, 5, 7]
primes has now become: [2, 3, 5, 7, 11, 13, 17, 19]
primes has finally become: [2, 3, 5, 7, 11, 13, 17, 19, [37, 41, 43, 47]]

Note that while extend maintains the “flat” structure of the list, appending a list to a list nests it inside the original: the last element in primes is a list, not an integer.

Use del to remove items from a list entirely.

primes = [2, 3, 5, 7, 9]
print('primes before removing last item:', primes)
del primes[4]
print('primes after removing last item:', primes)
primes before removing last item: [2, 3, 5, 7, 9]
primes after removing last item: [2, 3, 5, 7]

The empty list contains no values.
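As a minimal sketch (the variable name `empty` is only illustrative), an empty list can be written as a pair of square brackets with nothing between them. It has length zero and is a common starting point for building a list with append:

```python
# An empty list is written as [] and contains no items.
empty = []
print('empty:', empty)
print('length:', len(empty))

# Items can then be added one at a time.
empty.append(0.273)
print('after append:', empty)
```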

Lists may contain values of different types.

goals = [1, 'Create lists.', 2, 'Extract items from lists.', 3, 'Modify lists.']

Character strings can be indexed like lists.

element = 'carbon'
print('zeroth character:', element[0])
print('third character:', element[3])
zeroth character: c
third character: b

Character strings are immutable.

element[0] = 'C'
TypeError: 'str' object does not support item assignment

Indexing beyond the end of the collection is an error.

print('99th element of element is:', element[99])
IndexError: string index out of range

Fill in the Blanks

Fill in the blanks so that the program below produces the output shown.

values = ____
values.____(1)
values.____(3)
values.____(5)
print('first time:', values)
values = values[____]
print('second time:', values)
first time: [1, 3, 5]
second time: [3, 5]

Solution

values = []
values.append(1)
values.append(3)
values.append(5)
print('first time:', values)
values = values[1:]
print('second time:', values)

How Large is a Slice?

If start and stop are both non-negative integers, how long is the list values[start:stop]?

Solution

The list values[start:stop] has up to stop - start elements. For example, values[1:4] has the 3 elements values[1], values[2], and values[3]. Why ‘up to’? As we saw in episode 2, if stop is greater than the total length of the list values, we will still get a list back but it will be shorter than expected.
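The 'up to' in this answer can be checked directly. The sketch below uses a small illustrative list (the numbers are arbitrary) and compares slice lengths when stop is inside the list, past its end, and before start:

```python
values = [10, 20, 30, 40, 50]

# stop is inside the list: the slice has stop - start items.
print(len(values[1:4]))    # 3

# stop is past the end: the slice is shorter than stop - start.
print(len(values[3:10]))   # 2

# stop is before start: the slice is empty.
print(len(values[4:2]))    # 0
```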

From Strings to Lists and Back

Given this:

print('string to list:', list('tin'))
print('list to string:', ''.join(['g', 'o', 'l', 'd']))
string to list: ['t', 'i', 'n']
list to string: gold
  1. What does list('some string') do?
  2. What does '-'.join(['x', 'y', 'z']) generate?

Solution

  1. list('some string') converts a string into a list containing all of its characters.
  2. join returns a string that concatenates the string elements of the list, placing the separator between each pair of neighbouring elements. This results in x-y-z. The separator is the string on which the join method is called.
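A short sketch of how the choice of separator changes the result (the variable names here are illustrative only):

```python
symbols = ['x', 'y', 'z']

# The string before .join is placed between neighbouring elements.
dashed = '-'.join(symbols)
spaced = ', '.join(symbols)
packed = ''.join(symbols)

print(dashed)
print(spaced)
print(packed)
```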

Working With the End

What does the following program print?

element = 'helium'
print(element[-1])
  1. How does Python interpret a negative index?
  2. If a list or string has N elements, what is the most negative index that can safely be used with it, and what location does that index represent?
  3. If values is a list, what does del values[-1] do?
  4. How can you display all elements but the last one without changing values? (Hint: you will need to combine slicing and negative indexing.)

Solution

The program prints m.

  1. Python interprets a negative index as starting from the end (as opposed to starting from the beginning). The last element is -1.
  2. The most negative index that can safely be used with a list of N elements is -N, which represents the first element.
  3. del values[-1] removes the last element from the list.
  4. values[:-1]

Stepping Through a List

What does the following program print?

element = 'fluorine'
print(element[::2])
print(element[::-1])
  1. If we write a slice as low:high:stride, what does stride do?
  2. What expression would select all of the even-numbered items from a collection?

Solution

The program prints

furn
eniroulf
  1. stride is the step size of the slice.
  2. The slice 1::2 selects all even-numbered items from a collection: it starts with element 1 (which is the second element, since indexing starts at 0), goes on until the end (since no end is given), and uses a step size of 2 (i.e., selects every second element).

Sort and Sorted

What do these two programs print? In simple terms, explain the difference between sorted(letters) and letters.sort().

# Program A
letters = list('gold')
result = sorted(letters)
print('letters is', letters, 'and result is', result)
# Program B
letters = list('gold')
result = letters.sort()
print('letters is', letters, 'and result is', result)

Solution

Program A prints

letters is ['g', 'o', 'l', 'd'] and result is ['d', 'g', 'l', 'o']

Program B prints

letters is ['d', 'g', 'l', 'o'] and result is None

sorted(letters) returns a sorted copy of the list letters (the original list letters remains unchanged), while letters.sort() sorts the list letters in-place and does not return anything.

Copying (or Not)

What do these two programs print? In simple terms, explain the difference between new = old and new = old[:].

# Program A
old = list('gold')
new = old      # simple assignment
new[0] = 'D'
print('new is', new, 'and old is', old)
# Program B
old = list('gold')
new = old[:]   # assigning a slice
new[0] = 'D'
print('new is', new, 'and old is', old)

Solution

Program A prints

new is ['D', 'o', 'l', 'd'] and old is ['D', 'o', 'l', 'd']

Program B prints

new is ['D', 'o', 'l', 'd'] and old is ['g', 'o', 'l', 'd']

new = old makes new a reference to the list old; new and old point towards the same object.

new = old[:] however creates a new list object new containing all elements from the list old; new and old are different objects.
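One way to confirm this is with Python's `is` operator, which has not been used in this lesson so far: it tests whether two names refer to the same object, whereas `==` compares contents. A minimal sketch, with `alias` and `duplicate` as illustrative names:

```python
old = list('gold')
alias = old          # a second name for the same list object
duplicate = old[:]   # a new list object holding the same items

print(alias is old)        # True: same object
print(duplicate is old)    # False: different objects
print(duplicate == old)    # True: equal contents
```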

Key Points

  • A list stores many values in a single structure.

  • Use an item’s index to fetch it from a list.

  • Lists’ values can be replaced by assigning to them.

  • Appending items to a list lengthens it.

  • Use del to remove items from a list entirely.

  • The empty list contains no values.

  • Lists may contain values of different types.

  • Character strings can be indexed like lists.

  • Character strings are immutable.

  • Indexing beyond the end of the collection is an error.


Libraries

Overview

Teaching: 10 min
Exercises: 10 min
Questions
  • How can I use software that other people have written?

  • How can I find out what that software does?

Objectives
  • Explain what software libraries are and why programmers create and use them.

  • Write programs that import and use modules from Python’s standard library.

  • Find and read documentation for the standard library interactively (in the interpreter) and online.

Most of the power of a programming language is in its libraries.

Libraries and modules

A library is a collection of modules, but the terms are often used interchangeably, especially since many libraries only consist of a single module, so don’t worry if you mix them.

A program must import a library module before using it.

import math

print('pi is', math.pi)
print('cos(pi) is', math.cos(math.pi))
pi is 3.141592653589793
cos(pi) is -1.0

Use help to learn about the contents of a library module.

help(math)
Help on module math:

NAME
    math

MODULE REFERENCE
    http://docs.python.org/3/library/math

    The following documentation is automatically generated from the Python
    source files.  It may be incomplete, incorrect or include features that
    are considered implementation detail and may vary between Python
    implementations.  When in doubt, consult the module reference at the
    location listed above.

DESCRIPTION
    This module is always available.  It provides access to the
    mathematical functions defined by the C standard.

FUNCTIONS
    acos(x, /)
        Return the arc cosine (measured in radians) of x.
⋮ ⋮ ⋮

Import specific items from a library module to shorten programs.

from math import cos, pi

print('cos(pi) is', cos(pi))
cos(pi) is -1.0

Create an alias for a library module when importing it to shorten programs.

import math as m

print('cos(pi) is', m.cos(m.pi))
cos(pi) is -1.0

Exploring the Math Module

  1. What function from the math module can you use to calculate a square root without using sqrt?
  2. Since the library contains this function, why does sqrt exist?

Solution

  1. Using help(math) we see that we’ve got pow(x,y) in addition to sqrt(x), so we could use pow(x, 0.5) to find a square root.
  2. The sqrt(x) function is arguably more readable than pow(x, 0.5) when implementing equations. Readability is a cornerstone of good programming, so it makes sense to provide a special function for this specific common case.

    Also, the design of Python’s math library has its origin in the C standard, which includes both sqrt(x) and pow(x,y), so a little bit of the history of programming is showing in Python’s function names.
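As a quick check of the first answer, this sketch computes the same square root both ways (the value 16.0 is an arbitrary choice):

```python
import math

value = 16.0

# Both calls compute the square root of value.
root_from_sqrt = math.sqrt(value)
root_from_pow = math.pow(value, 0.5)

print('sqrt:', root_from_sqrt)
print('pow: ', root_from_pow)
```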

Locating the Right Module

You want to select a random character from a string:

bases = 'ACTTGCTTGAC'
  1. Which standard library module could help you?
  2. Which function would you select from that module? Are there alternatives?
  3. Try to write a program that uses the function.

Solution

The random module seems like it could help.

The string has 11 characters, each having a positional index from 0 to 10. You could use the random.randrange or random.randint functions to get a random integer between 0 and 10, and then select the bases character at that index:

from random import randrange

random_index = randrange(len(bases))
print(bases[random_index])

or more compactly:

from random import randrange

print(bases[randrange(len(bases))])

Perhaps you found the random.sample function? It allows for slightly less typing but might be a bit harder to understand just by reading:

from random import sample

print(sample(bases, 1)[0])

Note that this function returns a list of values.

The simplest and shortest solution is the random.choice function that does exactly what we want:

from random import choice

print(choice(bases))

When Is Help Available?

When a colleague of yours types help(math), Python reports an error:

NameError: name 'math' is not defined

What has your colleague forgotten to do?

Solution

Importing the math module (import math)

Importing With Aliases

  1. Fill in the blanks so that the program below prints 90.0.
  2. Rewrite the program so that it uses import without as.
  3. Which form do you find easier to read?
import math as m
angle = ____.degrees(____.pi / 2)
print(____)

Solution

import math as m
angle = m.degrees(m.pi / 2)
print(angle)

can be written as

import math
angle = math.degrees(math.pi / 2)
print(angle)

Since you just wrote the code and are familiar with it, you might actually find the first version easier to read. But when trying to read a huge piece of code written by someone else, or when getting back to your own huge piece of code after several months, non-abbreviated names are often easier, except where there are clear abbreviation conventions.

Importing Specific Items

  1. Fill in the blanks so that the program below prints 90.0.
  2. Do you find this version easier to read than preceding ones?
  3. Why wouldn’t programmers always use this form of import?
____ math import ____, ____
angle = degrees(pi / 2)
print(angle)

Solution

from math import degrees, pi
angle = degrees(pi / 2)
print(angle)

Most likely you find this version easier to read since it’s less dense. The main reason not to use this form of import is to avoid name clashes. For instance, you wouldn’t import degrees this way if you also wanted to use the name degrees for a variable or function of your own. Or if you were to also import a function named degrees from another library.
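The name clash described above can be sketched as follows. The reassignment of degrees is a deliberately contrived example; the point is that the module-qualified form keeps working even after the short name has been reused:

```python
from math import degrees, pi

print(degrees(pi / 2))     # degrees is the function from math

# Reusing the name for our own value replaces the imported function.
degrees = 45
print(type(degrees))       # the name now refers to an int

# With a plain import, the function stays reachable under its full name.
import math
angle = math.degrees(math.pi / 2)
print(angle)
```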

Reading Error Messages

  1. Read the code below and try to identify what the errors are without running it.
  2. Run the code, and read the error message. What type of error is it?
from math import log
log(0)

Solution

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-1-d72e1d780bab> in <module>
      1 from math import log
----> 2 log(0)

ValueError: math domain error
  1. The logarithm of x is only defined for x > 0, so 0 is outside the domain of the function.
  2. You get an error of type ValueError, indicating that the function received an inappropriate argument value. The additional message “math domain error” makes it clearer what the problem is.

Key Points

  • Most of the power of a programming language is in its libraries.

  • A program must import a library module in order to use it.

  • Use help to learn about the contents of a library module.

  • Import specific items from a library to shorten programs.

  • Create an alias for a library when importing it to shorten programs.


Working with NumPy

Overview

Teaching: 10 min
Exercises: 10 min
Questions
  • What is NumPy and how do I use it?

Objectives
  • Import the NumPy library.

  • Create a NumPy array.

  • Apply functions to NumPy arrays.

Working with the NumPy library

import numpy as np

primes = np.array([2, 3, 5, 7, 11])
print(primes)
[ 2  3  5  7 11]

The NumPy array looks similar to a list, but let’s take a closer look:

print(type(primes))
print(len(primes))
print(primes.shape)
print(primes.dtype)
<class 'numpy.ndarray'>
5
(5,)
int64

Array functions

NumPy provides many functions, including its own versions of min and max:

print(np.min(primes))
print(np.max(primes))
print(np.mean(primes))
2
11
5.6

A NumPy array will have many methods available, including min, max and mean:

print(primes.min())
print(primes.max())
print(primes.mean())
2
11
5.6

NumPy functions can operate on all elements in an array. For example, what happens if we try to run the math.sin function on multiple items?

import math

sequence = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
math.sin(sequence)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_76280/1284448365.py in <module>
      2 
      3 sequence = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
----> 4 math.sin(sequence)

TypeError: must be real number, not list

The math.sin function can only process a single value.

The NumPy sin function can process multiple values:

sequence = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
np.sin(sequence)
array([ 0.        ,  0.84147098,  0.90929743,  0.14112001, -0.7568025 ,
       -0.95892427, -0.2794155 ,  0.6569866 ,  0.98935825,  0.41211849])

The sequence list is converted to a NumPy ndarray during this process.
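A small sketch to confirm this: the input remains an ordinary list, the result is an ndarray, and each element matches what math.sin returns for the corresponding single value. The name `result` is illustrative only.

```python
import math
import numpy as np

sequence = [0, 1, 2, 3]
result = np.sin(sequence)

print(type(sequence))    # the input is still a Python list
print(type(result))      # the result is a NumPy ndarray

# Each element agrees with math.sin applied to a single value.
print(result[1], math.sin(1))
```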

Multi-dimensional arrays

NumPy arrays can have multiple dimensions:

values = np.array([[0, 7, 2], [4, 4, 5]])
print(values)
print(values.shape)
[[0 7 2]
 [4 4 5]]
(2, 3)

The values array is two dimensional, with 2 rows and 3 columns.

Values in NumPy arrays with multiple dimensions have multiple indexes. The index of the value 5 in the array is [1, 2]. The row or y index comes first, followed by the column or x index:

print(values[1, 2]) 
5

Finding the median value

If we can find the mean value of the values array with:

print(values.mean())

Can we find the median value in a similar way? If not, is there another way to find the median value?

Solution

print(np.median(values)) 
4.0

The ndarray type does not have a median method, so values.median() does not work. However, the numpy library does include the median function, which can be applied to an array.

Applying functions along an axis

What is the difference between these commands and the results they return?

print(values.max())
print(values.max(axis=0))
print(values.max(axis=1))

Solution

7
[4 7 5]
[7 5]

The first command returns the maximum value from the whole array. The second command returns the maximum value from each column (axis=0). The third command returns the maximum value from each row (axis=1).

Data types

What is the data type of the values array, and how could the array be created with a different data type, e.g. np.float32?

Solution

values = np.array([[0, 7, 2], [4, 4, 5]])
print(values.dtype)
print(values)

values = np.array([[0, 7, 2], [4, 4, 5]], dtype=np.float32)
print(values.dtype)
print(values)
int64
[[0 7 2]
 [4 4 5]]
float32
[[0. 7. 2.]
 [4. 4. 5.]]

The dtype argument can be used to specify the data type when creating a NumPy array.

NaN values

If we create an array containing a NaN (not a number) value, how do we find the maximum value?

results = np.array([0.3, 7.2, np.nan, 4.5, 9.7])

Solution

print(results.max())
print(np.nanmax(results))
nan
9.7

NumPy includes functions, such as nanmax, which will ignore any NaN values in the input.
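nanmax is one of a family of NaN-aware functions. The sketch below uses two others, np.nanmin and np.nanmean, together with np.isnan to locate the missing value; it reuses the results array from the exercise.

```python
import numpy as np

results = np.array([0.3, 7.2, np.nan, 4.5, 9.7])

# np.isnan marks which elements are NaN.
print(np.isnan(results))

# The nan-prefixed functions skip NaN values.
lowest = np.nanmin(results)
average = np.nanmean(results)
print(lowest)
print(average)
```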

Key Points

  • NumPy provides many functions for working with numerical data.

  • The NumPy ndarray can be used to store numerical data with multiple dimensions.

  • The NumPy functions enable efficient processing of values in an ndarray.


Plotting

Overview

Teaching: 20 min
Exercises: 10 min
Questions
  • How can I plot my data?

Objectives
  • Use matplotlib to create various plots

matplotlib is the most widely used scientific plotting library in Python.

import matplotlib.pyplot as plt
time = [0, 1, 2, 3]
position = [0, 100, 200, 300]

plt.plot(time, position)
plt.xlabel('Time (hr)')
plt.ylabel('Position (km)')

Simple Position-Time Plot

Display All Open Figures

In our Jupyter Notebook example, running the cell should generate the figure directly below the code. The figure is also included in the Notebook document for future viewing. However, other Python environments like an interactive Python session started from a terminal or a Python script executed via the command line require an additional command to display the figure.

Instruct matplotlib to show a figure:

plt.show()

This command can also be used within a Notebook - for instance, to display multiple figures if several are created by a single cell.

Plotting data from NumPy arrays

Let’s generate some data using NumPy:

import numpy as np

x = np.arange(0, 10, 0.1)
sin_x = np.sin(x)
cos_x = np.cos(x)

The np.arange function will generate an array of numbers starting at 0 and stopping before 10, with an interval of 0.1.
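To see the 'stopping before' behaviour on a smaller scale, this sketch uses an illustrative range from 0 to 1 in steps of 0.25, chosen so that the values are short enough to read at a glance:

```python
import numpy as np

# Start at 0, stop before 1, step by 0.25.
steps = np.arange(0, 1, 0.25)

print(steps)
print('number of values:', len(steps))
print('last value:', steps[-1])   # the stop value 1 itself is not included
```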

We can plot the value of sin(x) and cos(x) on the same axes:

plt.plot(x, sin_x)
plt.plot(x, cos_x)

Sin and Cos Plot 1

We can set the colour of the lines using the c option to plot(), and we can add a legend to indicate which values belong to which series:

plt.plot(x, sin_x, c='teal', label='sin(x)')
plt.plot(x, cos_x, c='peru', label='cos(x)')
plt.legend(loc='lower left')

Sin and Cos Plot 2

Adding a Legend

Often when plotting multiple datasets on the same figure it is desirable to have a legend describing the data.

This can be done in matplotlib in two stages:

  • Provide a label for each dataset in the figure:
plt.plot(x, sin_x, label='sin(x)')
plt.plot(x, cos_x, label='cos(x)')
  • Instruct matplotlib to create the legend.
plt.legend()

By default matplotlib will attempt to place the legend in a suitable position. If you would rather specify a position, this can be done with the loc= argument, e.g. to place the legend in the upper left corner of the plot, specify loc='upper left'.

Matplotlib is capable of making many types of plots. We can create a scatter plot of the sin(x) values:

plt.figure(figsize=(10, 8))
plt.scatter(x, sin_x, c=x, s=x*3)
plt.xlabel('x', fontsize=16)
plt.ylabel('sin(x)', fontsize=16)
plt.title('sine plot', fontsize=18)
plt.tick_params(labelsize=14)
plt.colorbar()
plt.savefig('sin.png')

Sin Scatter Plot

Each plotting function in Matplotlib has its own set of arguments. The documentation for the scatter() function can be found in the Matplotlib documentation.

The plt.figure(figsize=(10, 8)) command is used to create a figure of the specified size. The default units for Matplotlib figures are inches. In this instance the figure size is set to 10 inches wide by 8 inches high, to avoid any axes labels being cut off when saving the plot to a file.

The c=x option sets the colour value of the scatter points based on the value of x. The s=x*3 option sets the size of the scatter points based on the values of x multiplied by 3.

A title is added to the plot using plt.title(). The font size is set using the fontsize argument for the title, x axis label and y axis label. To set the font size for the tick labels, the plt.tick_params() function is used, where the size is set using the labelsize option.

A colour scale is added using the function plt.colorbar().

Saving your plot to a file

If you are satisfied with the plot you see you may want to save it to a file, perhaps to include it in a publication. There is a function in the matplotlib.pyplot module that accomplishes this: savefig. Calling this function, e.g. with

plt.savefig('my_figure.png')

will save the current figure to the file my_figure.png. The file format will automatically be deduced from the file name extension (other formats are pdf, ps, eps and svg).

Note that functions in plt refer to a global figure variable and after a figure has been displayed to the screen (e.g. with plt.show) matplotlib will make this variable refer to a new empty figure. Therefore, make sure you call plt.savefig before the plot is displayed to the screen, otherwise you may find a file with an empty plot.
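The ordering described above can be sketched for a standalone script as follows. The matplotlib.use('Agg') line is an assumption added so that the sketch also runs on a machine without a display; remove it when working interactively.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt

plt.plot([0, 1, 2, 3], [0, 100, 200, 300])
plt.xlabel('Time (hr)')
plt.ylabel('Position (km)')

# Save first, while the current figure still holds the plot...
plt.savefig('my_figure.png')
# ...and only then display it (the Agg backend may just warn that it cannot show figures).
plt.show()
```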

Creating a figure containing multiple plots

In the above examples, Matplotlib is automatically creating the figure and axes for each plot, but there are various ways in which these elements can be manually created where required.

For example, the plt.subplots() function can be used to create a figure which contains multiple sets of axes.

If we wished to create a figure containing two plots, we could use the command:

fig, ax = plt.subplots(nrows=2, ncols=1)

Empty subplots

This function returns two values: the figure, which we have stored as fig, and the axes, which we have stored as ax.

If we print() the ax variable, we should see that this is a NumPy array containing the axes we have requested:

fig, ax = plt.subplots(nrows=2, ncols=1)
print(ax)
[<Axes: > <Axes: >]

We can access each set of axes from the ax variable, and create a plot within.

When plotting this way, we access the plotting functions (e.g. plot(), scatter()) as a method of the axes.

We could create a figure containing subplots of sin(x) and cos(x) using the following method:

# create the figure and axes:
fig, ax = plt.subplots(nrows=2, ncols=1)

# access the first set of axes:
ax0 = ax[0]
# plot sin(x) in the first axes:
ax0.plot(x, sin_x)
# set the plot title:
ax0.set_title('sin(x)')

# access the second set of axes:
ax1 = ax[1]
# plot cos(x) in the second axes:
ax1.plot(x, cos_x)
# set the plot title:
ax1.set_title('cos(x)')

# set the figure title:
fig.suptitle('plots of sin(x) and cos(x)')
# save the figure:
fig.savefig('sin_and_cos_plots.png')

Sin and cos subplots

Plotting 2d data

Matplotlib has various options available for plotting 2d data, such as the imshow, contourf and pcolormesh functions.

To test some of these, we will first use NumPy to generate some 2d data.

# create the x and y values, from -10 to 10, with a 0.1 increment, using the
# numpy arange function:
x = np.arange(-10, 10.1, 0.1)
y = np.arange(-10, 10.1, 0.1)

# create x and y coordinate grids using the numpy meshgrid function:
grid_x, grid_y = np.meshgrid(x, y)

Here, we use the NumPy meshgrid function, which creates coordinate grids from one-dimensional coordinate arrays. It is widely used in mathematical computations, plotting, and simulations, where grid-like data is essential.

This example provides a demonstration of how the meshgrid function works, and the output which it creates:

x_coords, y_coords = np.meshgrid([1, 2, 3], [6, 7, 8, 9])

print(x_coords)
print(y_coords)
[[1 2 3]
 [1 2 3]
 [1 2 3]
 [1 2 3]]

[[6 6 6]
 [7 7 7]
 [8 8 8]
 [9 9 9]]
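Because both grids have the same shape, they can be combined element by element. This sketch reuses the small meshgrid example and multiplies the grids together, so that every position holds the product of its x and y coordinates; the name `products` is illustrative.

```python
import numpy as np

x_coords, y_coords = np.meshgrid([1, 2, 3], [6, 7, 8, 9])

# Both grids have one row per y value and one column per x value.
print(x_coords.shape, y_coords.shape)

# Element-wise arithmetic pairs each x with each y.
products = x_coords * y_coords
print(products)
print(products[3, 2])   # x = 3 and y = 9, so 27
```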

We can again use the NumPy sin function to generate our 2d data:

# generate 2d values for plotting using the numpy sin function:
z = np.sin(grid_x * grid_y)

Once we have some 2d data, a quick way to take a look at the data is using Matplotlib’s imshow function:

plt.imshow(z)

2d sin imshow plot

The imshow function allows us to take a quick look at the data, but does not include the values for the x or y axes.

If we use Matplotlib’s contourf function, we can include the x and y values in our plot:

# Create filled contour plot using Matplotlib's contourf function:
plt.contourf(x, y, z)
# Add a colour bar:
plt.colorbar()

2d sin contourf plot

Making your plots accessible

Whenever you are generating plots to go into a paper or a presentation, there are a few things you can do to make sure that everyone can understand your plots.

  • Always make sure your text is large enough to read. Use the fontsize parameter in xlabel, ylabel, title, and legend, and tick_params with labelsize to increase the text size of the numbers on your axes.
  • Similarly, you should make your graph elements easy to see. Use s to increase the size of your scatterplot markers and linewidth to increase the sizes of your plot lines.
  • Using color (and nothing else) to distinguish between different plot elements will make your plots unreadable to anyone who is colorblind, or who happens to have a black-and-white office printer. For lines, the linestyle parameter lets you use different types of lines. For scatterplots, marker lets you change the shape of your points. If you’re unsure about your colors, you can use Coblis or Color Oracle to simulate what your plots would look like to those with colorblindness.
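The points above might be combined as in the following sketch, which distinguishes two lines by line style as well as colour and enlarges the text. The specific sizes, the y axis label and the output file name are illustrative choices, and the Agg backend line is an assumption so the sketch runs without a display.

```python
import matplotlib
matplotlib.use('Agg')  # assumption: non-interactive backend so this runs without a display
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(0, 10, 0.1)

plt.figure(figsize=(10, 8))
# Different line styles keep the series distinguishable without colour.
plt.plot(x, np.sin(x), linestyle='-', linewidth=3, label='sin(x)')
plt.plot(x, np.cos(x), linestyle='--', linewidth=3, label='cos(x)')

# Larger text for the labels, tick numbers and legend.
plt.xlabel('x', fontsize=16)
plt.ylabel('value', fontsize=16)
plt.tick_params(labelsize=14)
plt.legend(fontsize=14)
plt.savefig('accessible_lines.png')
```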

Colour scales

Try and recreate the scatter plot for cos(x), and see if you can change the colour scale to ‘jet’.

More information about Matplotlib colour maps can be found in the Matplotlib documentation.

Solution

plt.figure(figsize=(10, 8))
plt.scatter(x, cos_x, c=x, s=x*3, cmap='jet')
plt.xlabel('x', fontsize=16)
plt.ylabel('cos(x)', fontsize=16)
plt.title('Cosine plot', fontsize=18)
plt.tick_params(labelsize=14)
plt.colorbar()
plt.savefig('cos.png')

Cos Scatter Plot

More colour scales

See if you can create a plot of the 2d z data, using the pcolormesh function.

Select a suitable diverging colour map for the plot.

Solution

plt.figure(figsize=(10, 8))
plt.pcolormesh(x, y, z, cmap='RdBu')
plt.colorbar()
plt.savefig('2d_sin.png')

2d sin pcolormesh plot

Key Points

  • matplotlib is the most widely used scientific plotting library in Python.

  • Many styles of plot are available: see the Python Graph Gallery for more options.

  • Many sets of data can be plotted together.


Reading Tabular Data with Pandas

Overview

Teaching: 10 min
Exercises: 10 min
Questions
  • How can I read tabular data?

Objectives
  • Import the Pandas library.

  • Use Pandas to load a simple CSV data set.

  • Get some basic information about a Pandas DataFrame.

  • Plot the data in a Pandas DataFrame.

Use the Pandas library to do statistics on tabular data.

We are going to read some temperature data, collected by the NCAS weather station in Leeds.

import pandas as pd

data = pd.read_csv('data/temperature_2022-07.csv')
print(data)
         Date  Max Temperature  Average Temperature  Min Temperature
0   2022-07-01             19.3                 15.2             12.6
1   2022-07-02             20.3                 16.3             13.3
2   2022-07-03             20.4                 15.7             11.7
...
28  2022-07-29             22.7                 17.6             14.2
29  2022-07-30             20.7                 18.2             16.2
30  2022-07-31             22.7                 18.6             15.8

File Not Found

Our lessons store their data files in a data sub-directory, which is why the path to the file is data/temperature_2022-07.csv. If you forget to include data/, or if you include it but your copy of the file is somewhere else, you will get a runtime error that ends with a line like this:

FileNotFoundError: [Errno 2] No such file or directory: 'data/temperature_2022-07.csv'

Use index_col to specify that a column’s values should be used as row headings.

data = pd.read_csv('data/temperature_2022-07.csv', index_col='Date')
print(data)
            Max Temperature  Average Temperature  Min Temperature
Date                                                             
2022-07-01             19.3                 15.2             12.6
2022-07-02             20.3                 16.3             13.3
2022-07-03             20.4                 15.7             11.7
...
2022-07-29             22.7                 17.6             14.2
2022-07-30             20.7                 18.2             16.2
2022-07-31             22.7                 18.6             15.8
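The effect of index_col can be tried without the data file. The sketch below is an assumption-laden stand-in: it reads the first three rows shown above from an in-memory string via io.StringIO rather than from disk, and it uses .loc, which has not been introduced in this lesson, to look up a value by its row heading.

```python
import io
import pandas as pd

# Stand-in for the CSV file: the first three rows shown above.
csv_text = """Date,Max Temperature,Average Temperature,Min Temperature
2022-07-01,19.3,15.2,12.6
2022-07-02,20.3,16.3,13.3
2022-07-03,20.4,15.7,11.7
"""

default_index = pd.read_csv(io.StringIO(csv_text))
date_index = pd.read_csv(io.StringIO(csv_text), index_col='Date')

print(default_index.index)   # row headings are the numbers 0, 1, 2
print(date_index.index)      # row headings are the dates

# With dates as row headings, a value can be looked up by date and column.
print(date_index.loc['2022-07-02', 'Max Temperature'])
```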

Use the DataFrame.info() method to find out more about a DataFrame.

data.info()
<class 'pandas.core.frame.DataFrame'>
Index: 31 entries, 2022-07-01 to 2022-07-31
Data columns (total 3 columns):
 #   Column               Non-Null Count  Dtype  
---  ------               --------------  -----  
 0   Max Temperature      31 non-null     float64
 1   Average Temperature  31 non-null     float64
 2   Min Temperature      31 non-null     float64
dtypes: float64(3)
memory usage: 992.0+ bytes

The DataFrame.columns variable stores information about the DataFrame’s columns.

print(data.columns)
Index(['Max Temperature', 'Average Temperature', 'Min Temperature'], dtype='object')

Use DataFrame.T to transpose a DataFrame.

print(data.T)
Date                 2022-07-01  2022-07-02  2022-07-03  2022-07-04  \
Max Temperature            19.3        20.3        20.4        18.2   
Average Temperature        15.2        16.3        15.7        15.3   
Min Temperature            12.6        13.3        11.7        13.5   

Date                 2022-07-05  2022-07-06  2022-07-07  2022-07-08  \
Max Temperature            20.7        19.9        24.5        23.7   
Average Temperature        16.2        16.8        18.3        18.8   
Min Temperature            12.2        14.6        14.1        14.3   

Date                 2022-07-09  2022-07-10  ...  2022-07-22  2022-07-23  \
Max Temperature            24.5        28.3  ...        17.1        21.3   
Average Temperature        19.1        21.0  ...        15.7        18.2   
Min Temperature            14.8        13.6  ...        14.6        13.6   

Date                 2022-07-24  2022-07-25  2022-07-26  2022-07-27  \
Max Temperature            23.9        22.5        19.7        23.2   
Average Temperature        19.9        17.4        14.9        17.4   
Min Temperature            17.8        13.3        12.6        12.1   

Date                 2022-07-28  2022-07-29  2022-07-30  2022-07-31  
Max Temperature            20.7        22.7        20.7        22.7  
Average Temperature        16.6        17.6        18.2        18.6  
Min Temperature            13.5        14.2        16.2        15.8  

[3 rows x 31 columns]

Use DataFrame.describe() to get summary statistics about data.

DataFrame.describe() gets the summary statistics of only the columns that have numerical data. All other columns are ignored, unless you use the argument include='all'.

print(data.describe())
       Max Temperature  Average Temperature  Min Temperature
count        31.000000            31.000000        31.000000
mean         23.567742            18.816129        14.745161
std           5.056441             3.553880         2.533093
min          17.100000            14.900000        11.600000
25%          20.350000            16.500000        13.350000
50%          22.500000            18.200000        14.200000
75%          24.700000            19.500000        15.600000
max          39.300000            29.100000        21.700000
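
As a rough illustration of the include='all' argument, the sketch below builds a small, made-up DataFrame with one text column and one numeric column. The station and temperature columns are hypothetical and are not part of the lesson data.

```python
import pandas as pd

# Hypothetical example data: one text column and one numeric column.
example = pd.DataFrame({
    'station': ['north', 'south', 'north'],
    'temperature': [15.2, 16.3, 15.7],
})

# By default, only the numeric column is summarised.
numeric_summary = example.describe()
print(numeric_summary.columns.tolist())

# With include='all', the text column is summarised as well.
full_summary = example.describe(include='all')
print(full_summary.columns.tolist())
```

Statistics that do not apply to a column (for example, the mean of a text column) are shown as NaN in the full summary.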

Accessing values by column

To access the values in a particular column, put the column name in square brackets after the DataFrame, in a similar way to accessing a value in a list by index. To access the Average Temperature values:

print(data['Average Temperature'])
Date
2022-07-01    15.2
2022-07-02    16.3
2022-07-03    15.7
2022-07-04    15.3
2022-07-05    16.2
2022-07-06    16.8
2022-07-07    18.3
2022-07-08    18.8
2022-07-09    19.1
2022-07-10    21.0
2022-07-11    23.6
2022-07-12    22.6
2022-07-13    18.3
2022-07-14    16.5
2022-07-15    16.5
2022-07-16    18.6
2022-07-17    23.4
2022-07-18    28.7
2022-07-19    29.1
2022-07-20    21.7
2022-07-21    17.1
2022-07-22    15.7
2022-07-23    18.2
2022-07-24    19.9
2022-07-25    17.4
2022-07-26    14.9
2022-07-27    17.4
2022-07-28    16.6
2022-07-29    17.6
2022-07-30    18.2
2022-07-31    18.6
Name: Average Temperature, dtype: float64

The index column, which is Date in this example, cannot be accessed in this way. It is instead accessed using the index property of the DataFrame:

print(data.index)
Index(['2022-07-01', '2022-07-02', '2022-07-03', '2022-07-04', '2022-07-05',
       '2022-07-06', '2022-07-07', '2022-07-08', '2022-07-09', '2022-07-10',
       '2022-07-11', '2022-07-12', '2022-07-13', '2022-07-14', '2022-07-15',
       '2022-07-16', '2022-07-17', '2022-07-18', '2022-07-19', '2022-07-20',
       '2022-07-21', '2022-07-22', '2022-07-23', '2022-07-24', '2022-07-25',
       '2022-07-26', '2022-07-27', '2022-07-28', '2022-07-29', '2022-07-30',
       '2022-07-31'],
      dtype='object', name='Date')

Plotting a DataFrame

First, we will re-read the CSV file, telling Pandas to parse the ‘Date’ values and convert them into Pandas Timestamp objects:

data = pd.read_csv('data/temperature_2022-07.csv', index_col='Date', parse_dates=['Date'])
data.head()

What does the data.head() function do? What do you think data.tail() might do?

Pandas makes quick plotting of data very simple:

data.plot()

Temperature Plot 1

A specific column can be plotted, by using the y argument:

plt.style.use('ggplot')
data.plot(y='Average Temperature')
plt.ylabel('temperature (°C)')
plt.xlabel('date')

Temperature Plot 2

Note how we have changed the style of the plot using plt.style.use('ggplot').

Running plt.style.use('default') will switch back to using the default style.

More information about Matplotlib styles can be found in the Matplotlib documentation.

Plot types

The DataFrame plot() method can produce different kinds of plots, which can be specified using the kind= argument.

The Pandas documentation describes the available options.

How could you create a box plot of the data values?

Solution

data.plot(kind='box')

Temperature Box Plot

Reading Other Data

Read the data in temperature_2022-08.csv (which should be in the same directory as temperature_2022-07.csv) into a variable called more_data, display its summary statistics, and plot the values.

Solution

To read in a CSV, we use pd.read_csv and pass the filename 'data/temperature_2022-08.csv' to it. The summary statistics can be displayed with the DataFrame.describe() method. more_data.plot() will plot the values.

more_data = pd.read_csv('data/temperature_2022-08.csv', index_col='Date', parse_dates=['Date'])
print(more_data.describe())
more_data.plot()

Writing Data

As well as the read_csv function for reading data from a file, Pandas provides a to_csv function to write DataFrames to files. Applying what you’ve learned about reading from files, write one of your DataFrames to a file called processed.csv. You can use help to get information on how to use to_csv.

Solution

In order to write the DataFrame more_data to a file called processed.csv, execute the following command:

more_data.to_csv('processed.csv')

For help on to_csv, you could execute, for example:

help(more_data.to_csv)

Note that help(to_csv) throws an error! This is because to_csv is not a standalone function: it is a method of the DataFrame, so it has to be referred to through a DataFrame, as in more_data.to_csv.
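
The same distinction between a method and a standalone function can be seen with a built-in type. The sketch below uses the upper method of a string rather than a DataFrame, purely so that it runs without any data files; the variable names are illustrative.

```python
text = "hello"

# upper is a method of the string object, so it is reached through the object:
print(text.upper())

# It can also be reached through the type itself:
print(str.upper(text))

# The bare name upper is not defined anywhere, so using it raises a NameError:
try:
    upper
except NameError as error:
    message = str(error)
    print('NameError:', message)
```

In the same way, help(pd.DataFrame.to_csv) should also display the help text, assuming Pandas has been imported as pd.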

Key Points

  • Use the Pandas library to get basic statistics out of tabular data.

  • Use index_col to specify that a column’s values should be used as row headings.

  • Use DataFrame.info to find out more about a dataframe.

  • The DataFrame.columns variable stores information about the dataframe’s columns.

  • Use DataFrame.T to transpose a dataframe.

  • Use DataFrame.describe to get summary statistics about data.

  • Use DataFrame.plot to plot the data.


End Of First Session

Overview

Teaching: 45 min
Exercises: 0 min
Questions
Objectives

You may also wish to consider:

Key Points


For Loops

Overview

Teaching: 15 min
Exercises: 15 min
Questions
  • How can I make a program do many things?

Objectives
  • Explain what for loops are normally used for.

  • Trace the execution of a simple (unnested) loop and correctly state the values of variables in each iteration.

  • Write for loops that use the Accumulator pattern to aggregate values.

A for loop executes commands once for each value in a collection.

for number in [2, 3, 5]:
    print(number)

This for loop is equivalent to:

print(2)
print(3)
print(5)

And the for loop’s output is:

2
3
5

A for loop is made up of a collection, a loop variable, and a body.

for number in [2, 3, 5]:
    print(number)

The first line of the for loop must end with a colon, and the body must be indented.

for number in [2, 3, 5]:
print(number)
IndentationError: expected an indented block

Indentation is always meaningful in Python.

firstName = "Jon"
  lastName = "Smith"
  File "<ipython-input-7-f65f2962bf9c>", line 2
    lastName = "Smith"
    ^
IndentationError: unexpected indent

Loop variables can be called anything.

for kitten in [2, 3, 5]:
    print(kitten)

The body of a loop can contain many statements.

primes = [2, 3, 5]
for p in primes:
    squared = p ** 2
    cubed = p ** 3
    print(p, squared, cubed)
2 4 8
3 9 27
5 25 125

Use range to iterate over a sequence of numbers.

print('a range is not a list: range(0, 3)')
for number in range(0, 3):
    print(number)
a range is not a list: range(0, 3)
0
1
2
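
As a small sketch of the different ways range can be called, the example below converts each range to a list so that its contents can be printed. The variable names are illustrative only.

```python
# range(stop) starts at 0 and stops just before stop.
first_three = list(range(3))
print(first_three)    # [0, 1, 2]

# range(start, stop) starts at start and stops just before stop.
two_to_four = list(range(2, 5))
print(two_to_four)    # [2, 3, 4]

# range(start, stop, step) counts upwards in steps of step.
every_third = list(range(0, 10, 3))
print(every_third)    # [0, 3, 6, 9]
```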

The Accumulator pattern turns many values into one.

# Sum the first 10 integers.
total = 0
for number in range(10):
    total = total + (number + 1)
print(total)
55
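
The Accumulator pattern is not limited to sums. As a minimal sketch, using a made-up list of values, the same pattern can build up a product or a list of running totals:

```python
values = [4, 9, 2, 7]

# Accumulate a product: start from 1, the neutral value for multiplication.
product = 1
for value in values:
    product = product * value
print(product)       # 504

# Accumulate a list of running totals: start from an empty list.
running_totals = []
total = 0
for value in values:
    total = total + value
    running_totals.append(total)
print(running_totals)  # [4, 13, 15, 22]
```

In each case the accumulator variable is given a starting value before the loop and updated once per iteration.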

Classifying Errors

Is an indentation error a syntax error or a runtime error?

Solution

An IndentationError is a syntax error. Programs with syntax errors cannot be started. A program with a runtime error will start but an error will be thrown under certain conditions.
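
One way to see the difference is sketched below. The badly indented code is stored in a string (bad_source, a name chosen for this example) so that the example itself can still run; the built-in compile function then parses it without executing it.

```python
# IndentationError is a more specific kind of SyntaxError.
print(issubclass(IndentationError, SyntaxError))  # True

# Source code with a missing indent, stored as a string.
bad_source = "for number in [2, 3, 5]:\nprint(number)\n"

# compile() parses the code without running it, so the error is raised
# before any line of bad_source is executed.
try:
    compile(bad_source, "<example>", "exec")
    caught = None
except SyntaxError as error:
    caught = type(error).__name__
print(caught)  # IndentationError

# A runtime error only appears when the faulty line is actually reached.
try:
    print("this line runs first")
    result = 1 / 0
except ZeroDivisionError as error:
    runtime_message = str(error)
    print("runtime error:", runtime_message)
```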

Practice Accumulating

Fill in the blanks in each of the programs below to produce the indicated result.

# Total length of the strings in the list: ["red", "green", "blue"] => 12
total = 0
for word in ["red", "green", "blue"]:
    ____ = ____ + len(word)
print(total)

Solution

total = 0
for word in ["red", "green", "blue"]:
    total = total + len(word)
print(total)

# List of word lengths: ["red", "green", "blue"] => [3, 5, 4]
lengths = ____
for word in ["red", "green", "blue"]:
    lengths.____(____)
print(lengths)

Solution

lengths = []
for word in ["red", "green", "blue"]:
    lengths.append(len(word))
print(lengths)

# Concatenate all words: ["red", "green", "blue"] => "redgreenblue"
words = ["red", "green", "blue"]
result = ____
for ____ in ____:
    ____
print(result)

Solution

words = ["red", "green", "blue"]
result = ""
for word in words:
    result = result + word
print(result)

Identifying Variable Name Errors

  1. Read the code below and try to identify what the errors are without running it.
  2. Run the code and read the error message. What type of NameError do you think this is? Is it a string with no quotes, a misspelled variable, or a variable that should have been defined but was not?
  3. Fix the error.
  4. Repeat steps 2 and 3, until you have fixed all the errors.
for number in range(10):
    # use a if the number is a multiple of 3, otherwise use b
    if (Number % 3) == 0:
        message = message + a
    else:
        message = message + "b"
print(message)

Solution

  • Python variable names are case sensitive: number and Number refer to different variables.
  • The variable message needs to be initialized as an empty string.
  • We want to add the string "a" to message, not the undefined variable a.
message = ""
for number in range(10):
    # use a if the number is a multiple of 3, otherwise use b
    if (number % 3) == 0:
        message = message + "a"
    else:
        message = message + "b"
print(message)

Identifying Item Errors

  1. Read the code below and try to identify what the errors are without running it.
  2. Run the code, and read the error message. What type of error is it?
  3. Fix the error.
seasons = ['Spring', 'Summer', 'Fall', 'Winter']
print('My favorite season is ', seasons[4])

Solution

This list has 4 elements and the index to access the last element in the list is 3.

seasons = ['Spring', 'Summer', 'Fall', 'Winter']
print('My favorite season is ', seasons[3])
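
To avoid counting by hand, the last valid index can be worked out from the length of the list, or a negative index can be used. The sketch below also shows the IndexError being caught so that the example runs to the end; last_index and error_message are names chosen for this example.

```python
seasons = ['Spring', 'Summer', 'Fall', 'Winter']

# Valid indices run from 0 to len(seasons) - 1.
last_index = len(seasons) - 1
print(last_index, seasons[last_index])  # 3 Winter

# A negative index counts from the end, so -1 is always the last item.
print(seasons[-1])  # Winter

# An index equal to the length of the list is out of range.
try:
    seasons[len(seasons)]
except IndexError as error:
    error_message = str(error)
    print('IndexError:', error_message)
```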

Key Points

  • A for loop executes commands once for each value in a collection.

  • A for loop is made up of a collection, a loop variable, and a body.

  • The first line of the for loop must end with a colon, and the body must be indented.

  • Indentation is always meaningful in Python.

  • Loop variables can be called anything (but it is strongly advised to have a meaningful name to the looping variable).

  • The body of a loop can contain many statements.

  • Use range to iterate over a sequence of numbers.

  • The Accumulator pattern turns many values into one.


Conditionals

Overview

Teaching: 15 min
Exercises: 15 min
Questions
  • How can programs do different things for different data?

Objectives
  • Correctly write programs that use if and else statements and simple Boolean expressions (without logical operators).

  • Trace the execution of unnested conditionals and conditionals inside loops.

Use if statements to control whether or not a block of code is executed.

mass = 3.54
if mass > 3.0:
    print(mass, 'is large')

mass = 2.07
if mass > 3.0:
    print(mass, 'is large')
3.54 is large

Conditionals are often used inside loops.

masses = [3.54, 2.07, 9.22, 1.86, 1.71]
for m in masses:
    if m > 3.0:
        print(m, 'is large')
3.54 is large
9.22 is large

Use else to execute a block of code when an if condition is not true.

masses = [3.54, 2.07, 9.22, 1.86, 1.71]
for m in masses:
    if m > 3.0:
        print(m, 'is large')
    else:
        print(m, 'is small')
3.54 is large
2.07 is small
9.22 is large
1.86 is small
1.71 is small

Use elif to specify additional tests.

masses = [3.54, 2.07, 9.22, 1.86, 1.71]
for m in masses:
    if m > 9.0:
        print(m, 'is HUGE')
    elif m > 3.0:
        print(m, 'is large')
    else:
        print(m, 'is small')
3.54 is large
2.07 is small
9.22 is HUGE
1.86 is small
1.71 is small

Conditions are tested once, in order.

grade = 85
if grade >= 70:
    print('grade is C')
elif grade >= 80:
    print('grade is B')
elif grade >= 90:
    print('grade is A')
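grade is C

Python does not automatically go back and re-evaluate a condition if the values change after it has been tested.

velocity = 10.0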
if velocity > 20.0:
    print('moving too fast')
else:
    print('adjusting velocity')
    velocity = 50.0
adjusting velocity

Conditionals are often used inside a loop to evolve the values of variables.

velocity = 10.0
for i in range(5): # execute the loop 5 times
    print(i, ':', velocity)
    if velocity > 20.0:
        print('moving too fast')
        velocity = velocity - 5.0
    else:
        print('moving too slow')
        velocity = velocity + 10.0
print('final velocity:', velocity)
0 : 10.0
moving too slow
1 : 20.0
moving too slow
2 : 30.0
moving too fast
3 : 25.0
moving too fast
4 : 20.0
moving too slow
final velocity: 30.0
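
Returning to the grade example above: a grade of 85 is reported as C because the first condition that is true wins and the later ones are never tested. A possible fix, sketched below, is to test the most demanding condition first. The list of test grades and the 'below C' label are made up for this illustration.

```python
letters = []
for grade in [95, 85, 75, 50]:
    # Test the highest threshold first so that it is not hidden
    # behind a weaker condition.
    if grade >= 90:
        letter = 'A'
    elif grade >= 80:
        letter = 'B'
    elif grade >= 70:
        letter = 'C'
    else:
        letter = 'below C'
    letters.append(letter)
    print(grade, 'is', letter)
```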

Compound Relations Using and, or, and Parentheses

Often, you want some combination of things to be true. You can combine relations within a conditional using and and or. Continuing the example above, suppose you have

mass     = [ 3.54,  2.07,  9.22,  1.86,  1.71]
velocity = [10.00, 20.00, 30.00, 25.00, 20.00]

for i in range(5):
    if mass[i] > 5 and velocity[i] > 20:
        print("Fast heavy object.  Duck!")
    elif mass[i] > 2 and mass[i] <= 5 and velocity[i] <= 20:
        print("Normal traffic")
    elif mass[i] <= 2 and velocity[i] <= 20:
        print("Slow light object.  Ignore it")
    else:
        print("Whoa!  Something is up with the data.  Check it")

Just like with arithmetic, you can and should use parentheses whenever there is possible ambiguity. A good general rule is to always use parentheses when mixing and and or in the same condition. That is, instead of:

if mass[i] <= 2 or mass[i] >= 5 and velocity[i] > 20:

write one of these:

if (mass[i] <= 2 or mass[i] >= 5) and velocity[i] > 20:
if mass[i] <= 2 or (mass[i] >= 5 and velocity[i] > 20):

so it is perfectly clear to a reader (and to Python) what you really mean.
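
To see why the parentheses matter, the sketch below evaluates all three forms for a single, made-up object (a mass of 1.5 and a velocity of 10.0). Without parentheses, Python evaluates and before or.

```python
mass = 1.5
velocity = 10.0

# Without parentheses, and is evaluated before or.
unparenthesised = mass <= 2 or mass >= 5 and velocity > 20

# The two possible groupings, written out explicitly.
grouped_or_first = (mass <= 2 or mass >= 5) and velocity > 20
grouped_and_first = mass <= 2 or (mass >= 5 and velocity > 20)

print(unparenthesised)    # True
print(grouped_or_first)   # False
print(grouped_and_first)  # True
```

The unparenthesised condition behaves like the second grouping, which may or may not be what the author intended.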

Tracing Execution

What does this program print?

pressure = 71.9
if pressure > 50.0:
    pressure = 25.0
elif pressure <= 50.0:
    pressure = 0.0
print(pressure)

Solution

25.0

Trimming Values

Fill in the blanks so that this program creates a new list containing zeroes where the original list’s values were negative and ones where the original list’s values were zero or positive.

original = [-1.5, 0.2, 0.4, 0.0, -1.3, 0.4]
result = ____
for value in original:
    if ____:
        result.append(0)
    else:
        ____
print(result)
[0, 1, 1, 1, 0, 1]

Solution

original = [-1.5, 0.2, 0.4, 0.0, -1.3, 0.4]
result = []
for value in original:
    if value < 0.0:
        result.append(0)
    else:
        result.append(1)
print(result)

Processing Small Files

Modify this program so that it only processes files with fewer than 50 records.

import glob
import pandas as pd
for filename in glob.glob('data/*.csv'):
    contents = pd.read_csv(filename)
    ____:
        print(filename, len(contents))

Solution

import glob
import pandas as pd
for filename in glob.glob('data/*.csv'):
    contents = pd.read_csv(filename)
    if len(contents) < 50:
        print(filename, len(contents))

Initializing

Modify this program so that it finds the largest and smallest values in the list no matter what the range of values originally is.

values = [...some test data...]
smallest, largest = None, None
for v in values:
    if ____:
        smallest, largest = v, v
    ____:
        smallest = min(____, v)
        largest = max(____, v)
print(smallest, largest)

What are the advantages and disadvantages of using this method to find the range of the data?

Solution

values = [-2,1,65,78,-54,-24,100]
smallest, largest = None, None
for v in values:
    if smallest is None and largest is None:
        smallest, largest = v, v
    else:
        smallest = min(smallest, v)
        largest = max(largest, v)
print(smallest, largest)

If you wrote == None instead of is None, that works too, but Python programmers conventionally write is None because of the special way None works in the language.
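
One reason for preferring is None is that == can be redefined by an object, whereas is always checks whether two names refer to the very same object. The sketch below uses a deliberately contrived class, AlwaysEqual, invented for this example; classes are not otherwise covered in this lesson.

```python
class AlwaysEqual:
    """A contrived class whose == comparison always succeeds."""

    def __eq__(self, other):
        return True


thing = AlwaysEqual()

# == asks the object, which here claims to be equal to anything.
equal_to_none = (thing == None)
print(equal_to_none)  # True

# is checks identity, and thing is not the None object.
is_none = (thing is None)
print(is_none)        # False
```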

It can be argued that an advantage of using this method would be to make the code more readable. However, a disadvantage is that this code is not efficient because within each iteration of the for loop statement, there are two more loops that run over two numbers each (the min and max functions). It would be more efficient to iterate over each number just once:

values = [-2,1,65,78,-54,-24,100]
smallest, largest = None, None
for v in values:
    if smallest is None or v < smallest:
        smallest = v
    if largest is None or v > largest:
        largest = v
print(smallest, largest)

Now we have one loop, but four comparison tests. There are two ways we could improve it further: either use fewer comparisons in each iteration, or use two loops that each contain only one comparison test. The simplest solution is often the best:

values = [-2,1,65,78,-54,-24,100]
smallest = min(values)
largest = max(values)
print(smallest, largest)

Key Points

  • Use if statements to control whether or not a block of code is executed.

  • Conditionals are often used inside loops.

  • Use else to execute a block of code when an if condition is not true.

  • Use elif to specify additional tests.

  • Conditions are tested once, in order.


Writing Functions

Overview

Teaching: 15 min
Exercises: 15 min
Questions
  • How can I create my own functions?

Objectives
  • Explain and identify the difference between function definition and function call.

  • Write a function that takes a small, fixed number of arguments and produces a single result.

Break programs down into functions to make them easier to understand.

Define a function using def with a name, parameters, and a block of code.

def print_greeting():
    print('Hello!')

Defining a function does not run it.

print_greeting()
Hello!

Arguments in a function call are matched to its defined parameters.

def print_date(year, month, day):
    joined = str(year) + '/' + str(month) + '/' + str(day)
    print(joined)

print_date(1871, 3, 19)
1871/3/19

Alternatively, we can name the arguments when we call the function. This allows us to specify them in any order and makes the call easier to read; otherwise, someone reading the code might forget whether, for example, the second argument is the month or the day.

print_date(month=3, day=19, year=1871)
1871/3/19

Functions may return a result to their caller using return.

def average(values):
    if len(values) == 0:
        return None
    return sum(values) / len(values)
a = average([1, 3, 4])
print('average of values:', a)
average of values: 2.6666666666666665
print('average of empty list:', average([]))
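average of empty list: None

Every function returns something: a function that does not explicitly return a value automatically returns None.

result = print_date(1871, 3, 19)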
print('result of call is:', result)
1871/3/19
result of call is: None

Adding helpful information

Helpful information can be added to a function using a docstring.

After the def line of a function, textual information explaining what the function does can be added using a multi-line string.

Multi-line strings start and end with three quotation marks, """:

def average(values):
    """
    Return the average of a set of values
    """
    if len(values) == 0:
        return None
    return sum(values) / len(values)

help(average)
Help on function average in module __main__:

average(values)
    Return the average of a set of values
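
The docstring is stored on the function itself, which is how help is able to display it. The sketch below reads it directly through the __doc__ attribute and compares it with a function that only has an ordinary comment; no_docstring is a name made up for this example.

```python
def average(values):
    """
    Return the average of a set of values
    """
    if len(values) == 0:
        return None
    return sum(values) / len(values)


# The docstring is available as the __doc__ attribute of the function.
print(average.__doc__.strip())


def no_docstring(values):
    # An ordinary comment is not a docstring and is not shown by help().
    return values


print(no_docstring.__doc__)  # None
```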

Identifying Syntax Errors

  1. Read the code below and try to identify what the errors are without running it.
  2. Run the code and read the error message. Is it a SyntaxError or an IndentationError?
  3. Fix the error.
  4. Repeat steps 2 and 3 until you have fixed all the errors.
def another_function
  print("Syntax errors are annoying.")
   print("But at least python tells us about them!")
  print("So they are usually not too hard to fix.")

Solution

def another_function():
  print("Syntax errors are annoying.")
  print("But at least Python tells us about them!")
  print("So they are usually not too hard to fix.")

Definition and Use

What does the following program print?

def report(pressure):
    print('pressure is', pressure)

print('calling', report, 22.5)

Solution

calling <function report at 0x7fd128ff1bf8> 22.5

A function call always needs parentheses; otherwise you get the function object itself, which is printed together with its memory address. So, if we wanted to call the function named report, and give it the value 22.5 to report on, we could write our function call as follows:

print("calling")
report(22.5)
calling
pressure is 22.5

Order of Operations

  1. What’s wrong in this example?

     result = print_time(11, 37, 59)
    
     def print_time(hour, minute, second):
         time_string = str(hour) + ':' + str(minute) + ':' + str(second)
         print(time_string)
    
  2. After fixing the problem above, explain why running this example code:

     result = print_time(11, 37, 59)
     print('result of call is:', result)
    

    gives this output:

     11:37:59
     result of call is: None
    
  3. Why is the result of the call None?

Solution

  1. The problem with the example is that the function print_time() is defined after the call to the function is made. Python doesn’t know how to resolve the name print_time since it hasn’t been defined yet, and will raise a NameError, e.g. NameError: name 'print_time' is not defined

  2. The first line of output, 11:37:59, is printed by the first line of code, result = print_time(11, 37, 59), which also binds the value returned by print_time to the variable result. The second line is from the print call, which prints the contents of the result variable.

  3. print_time() does not explicitly return a value, so it automatically returns None.

Encapsulation

Fill in the blanks to create a function that takes a single filename as an argument, loads the data in the file named by the argument, and returns the minimum value in that data.

import pandas as pd

def min_in_data(____):
    data = ____
    return ____

Solution

import pandas as pd

def min_in_data(filename):
    data = pd.read_csv(filename)
    return data.min()

Find the First

Fill in the blanks to create a function that takes a list of numbers as an argument and returns the first negative value in the list. What does your function do if the list is empty? What if the list has no negative numbers?

def first_negative(values):
    for v in ____:
        if ____:
            return ____

Solution

def first_negative(values):
    for v in values:
        if v < 0:
            return v

If an empty list or a list with no negative values is passed to this function, it returns None:

my_list = []
print(first_negative(my_list))
None
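
Because the function may return None, code that calls it usually needs to check for that case. A minimal sketch, using three made-up test lists:

```python
def first_negative(values):
    for v in values:
        if v < 0:
            return v


for numbers in [[3, -1, -4], [1, 2, 3], []]:
    result = first_negative(numbers)
    # Check for None before using the result as a number.
    if result is None:
        print(numbers, 'has no negative values')
    else:
        print(numbers, 'first negative value is', result)
```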

Calling by Name

Earlier we saw this function:

def print_date(year, month, day):
    joined = str(year) + '/' + str(month) + '/' + str(day)
    print(joined)

We saw that we can call the function using named arguments, like this:

print_date(day=1, month=2, year=2003)
  1. What does print_date(day=1, month=2, year=2003) print?
  2. When have you seen a function call like this before?
  3. When and why is it useful to call functions this way?

Solution

  1. 2003/2/1
  2. We saw examples of using named arguments when working with the pandas library. For example, when reading in a dataset using data = pd.read_csv('data/temperature_2022-07.csv', index_col='Date') the last argument index_col is a named argument.
  3. Using named arguments can make code more readable since one can see from the function call what name the different arguments have inside the function. It can also reduce the chances of passing arguments in the wrong order, since by using named arguments the order doesn’t matter.
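
The different calling styles can be compared side by side. The sketch below uses format_date, a hypothetical variant of print_date that returns the string instead of printing it, so that the results can be stored and compared.

```python
def format_date(year, month, day):
    """Return the date as a year/month/day string."""
    return str(year) + '/' + str(month) + '/' + str(day)


# Positional arguments are matched to parameters by their order.
by_position = format_date(2003, 2, 1)

# Named arguments are matched by name, so their order does not matter.
by_name = format_date(day=1, month=2, year=2003)

# The two styles can be mixed, as long as positional arguments come first.
mixed = format_date(2003, day=1, month=2)

print(by_position, by_name, mixed)
```

All three calls produce the same string, 2003/2/1.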

Key Points

  • Break programs down into functions to make them easier to understand.

  • Define a function using def with a name, parameters, and a block of code.

  • Defining a function does not run it.

  • Arguments in a function call are matched to its defined parameters.

  • Functions may return a result to their caller using return.


Break

Overview

Teaching: 0 min
Exercises: 0 min
Questions
Objectives

Key Points


Variable Scope

Overview

Teaching: 5 min
Exercises: 5 min
Questions
  • How do function calls actually work?

  • How can I determine where errors occurred?

Objectives
  • Identify local and global variables.

  • Identify parameters as local variables.

  • Read a traceback and determine the file, function, and line number on which the error occurred, the type of error, and the error message.

The scope of a variable is the part of a program that can ‘see’ that variable.

pressure = 103.9

def adjust(t):
    temperature = t * 1.43 / pressure
    return temperature
print('adjusted:', adjust(0.9))
print('temperature after call:', temperature)
adjusted: 0.01238691049085659
Traceback (most recent call last):
  File "/Users/swcarpentry/foo.py", line 8, in <module>
    print('temperature after call:', temperature)
NameError: name 'temperature' is not defined
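
In this example pressure is a global variable, while t and temperature are local to adjust and only exist while the function is running. The sketch below repeats the example but catches the NameError so that the code runs to the end; adjusted and scope_error are names added for this illustration.

```python
pressure = 103.9  # global: visible everywhere after this line


def adjust(t):
    # t and temperature are local: they exist only inside adjust.
    temperature = t * 1.43 / pressure
    return temperature


adjusted = adjust(0.9)
print('adjusted:', adjusted)

# Outside the function, the local name temperature is not defined.
try:
    print('temperature after call:', temperature)
except NameError as error:
    scope_error = str(error)
    print('NameError:', scope_error)
```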

Reading Error Messages

Read the traceback below, and identify the following:

  1. How many levels does the traceback have?
  2. What is the file name where the error occurred?
  3. What is the function name where the error occurred?
  4. On which line number in this function did the error occur?
  5. What is the type of error?
  6. What is the error message?
---------------------------------------------------------------------------
KeyError                                  Traceback (most recent call last)
<ipython-input-2-e4c4cbafeeb5> in <module>()
      1 import errors_02
----> 2 errors_02.print_friday_message()

/Users/ghopper/thesis/code/errors_02.py in print_friday_message()
     13
     14 def print_friday_message():
---> 15     print_message("Friday")

/Users/ghopper/thesis/code/errors_02.py in print_message(day)
      9         "sunday": "Aw, the weekend is almost over."
     10     }
---> 11     print(messages[day])
     12
     13

KeyError: 'Friday'

Solution

  1. Three levels.
  2. errors_02.py
  3. print_message
  4. Line 11
  5. KeyError. These errors occur when we are trying to look up a key that does not exist (usually in a data structure such as a dictionary). We can find more information about the KeyError and other built-in exceptions in the Python docs.
  6. KeyError: 'Friday'
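
A KeyError like this one can be reproduced and handled in a few lines. The messages dictionary below is made up for this sketch (the full contents of errors_02.py are not shown in the traceback); it assumes, as in the traceback, that the keys are lower case while the lookup uses a capitalised day name.

```python
# Hypothetical messages, keyed by lower-case day names.
messages = {
    "friday": "The weekend is nearly here.",
    "sunday": "Aw, the weekend is almost over.",
}
day = "Friday"

# Keys are case sensitive, so "Friday" is not found.
try:
    print(messages[day])
except KeyError as error:
    missing_key = error.args[0]
    print('KeyError:', missing_key)

# The in operator checks whether a key exists before it is used.
print(day in messages)          # False
print(day.lower() in messages)  # True

# get returns a fallback value instead of raising an error.
fallback = messages.get(day, "no message for this day")
print(fallback)
```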

Key Points

  • The scope of a variable is the part of a program that can ‘see’ that variable.


Dictionaries

Overview

Teaching: 20 min
Exercises: 15 min
Questions
  • How can I store key-value data?

Objectives
  • Store data in a Python dictionary

  • Extract data from a Python dictionary

Python provides a data type called a dictionary, which is similar to a list in that it is a collection of objects.

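Dictionaries and lists share the following characteristics: both are mutable (they can be changed after they are created), and both can be nested (a list or dictionary can contain other lists or dictionaries).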

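Dictionaries differ from lists primarily in how elements are accessed: list elements are accessed by their position in the list (their index), whereas dictionary elements are accessed by their key.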

This section aims to provide a good sense of when a dictionary is the appropriate data type to use, and how to do so.

Creating a dictionary

We can define a dictionary by enclosing a comma-separated list of key-value pairs in curly braces ({}). A colon (:) separates each key from its associated value:

person = {
    'name': 'Ahmed',
    'age': 42
}

Once we have defined a dictionary, we can inspect its type and values:

print(type(person))
print(person)
<class 'dict'>
{'name': 'Ahmed', 'age': 42}

A value is retrieved from a dictionary by specifying its corresponding key in square brackets ([]):

print(person['name'])
print(person['age'])
Ahmed
42

Updating a dictionary

Let’s create a dictionary, containing the details for multiple individuals, where the details for each individual are also stored as dictionaries:

people = {
    'Ahmed': {
        'age': 42
    },
    'Cheryl': {
        'age': 33
    }
}

print(people)
{'Ahmed': {'age': 42}, 'Cheryl': {'age': 33}}

We can add a new entry to a dictionary by defining a new key:

people['Susan'] = {'age': 25}

print(people)
{'Ahmed': {'age': 42}, 'Cheryl': {'age': 33}, 'Susan': {'age': 25}}

An existing value in a dictionary can be updated by redefining the value associated with an existing key:

people['Cheryl']['age'] = 34

print(people)
{'Ahmed': {'age': 42}, 'Cheryl': {'age': 34}, 'Susan': {'age': 25}}

The available keys in a dictionary can be accessed using the dictionary’s .keys() method:

print(people.keys())
dict_keys(['Ahmed', 'Cheryl', 'Susan'])
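
Alongside keys(), dictionaries also provide values() and items(), and the in operator tests whether a key is present. A short sketch using the people dictionary from above (the ages list is added for this illustration):

```python
people = {'Ahmed': {'age': 42}, 'Cheryl': {'age': 34}, 'Susan': {'age': 25}}

# values() gives the values without their keys.
print(list(people.values()))

# items() gives each key together with its value.
ages = []
for name, details in people.items():
    print(name, 'is', details['age'])
    ages.append(details['age'])

# in checks whether a key exists in the dictionary.
print('Susan' in people)  # True
print('Omar' in people)   # False
```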

Using a dictionary to store data

In our data/ directory, we have three files containing temperature data for June, July and August 2022:

  • temperature_2022-06.csv
  • temperature_2022-07.csv
  • temperature_2022-08.csv

We can create a dictionary to store this information:

temp_files = {
    'June': 'data/temperature_2022-06.csv',
    'July': 'data/temperature_2022-07.csv',
    'August': 'data/temperature_2022-08.csv'
}

print(temp_files)
{'June': 'data/temperature_2022-06.csv', 'July': 'data/temperature_2022-07.csv', 'August': 'data/temperature_2022-08.csv'}

If we would like to create a single figure containing three plots, one for each month, we can do this by accessing values from the dictionary:

# Create the subplot axes, with a single row and three columns:
fig, axs = plt.subplots(nrows=1, ncols=3)

# Set the figure size:
fig.set_figwidth(24)
fig.set_figheight(8)

# Plot June data
# access the axes for this plot:
ax = axs[0]
# Set the month name:
month = 'June'
# Read the data for this month using Pandas:
data = pd.read_csv(temp_files[month], index_col='Date', parse_dates=['Date'])
# Plot the data using the Pandas DataFrame plot function:
data.plot(ax=ax)
# Set the plot title:
ax.set_title(month)

# Plot July data
# access the axes for this plot:
ax = axs[1]
# Set the month name:
month = 'July'
# Read the data for this month using Pandas:
data = pd.read_csv(temp_files[month], index_col='Date', parse_dates=['Date'])
# Plot the data using the Pandas DataFrame plot function:
data.plot(ax=ax)
# Set the plot title:
ax.set_title(month)

# Plot August data
# access the axes for this plot:
ax = axs[2]
# Set the month name:
month = 'August'
# Read the data for this month using Pandas:
data = pd.read_csv(temp_files[month], index_col='Date', parse_dates=['Date'])
# Plot the data using the Pandas DataFrame plot function:
data.plot(ax=ax)
# Set the plot title:
ax.set_title(month)

Temperature Plot 1

The three plots in the figure do not have the same y axis limits, which makes it difficult to compare the values.

We can fix this by setting the y axis limits, using the set_ylim method of each set of axes:

# Create the subplot axes, with a single row and three columns:
fig, axs = plt.subplots(nrows=1, ncols=3)

# Set the figure size:
fig.set_figwidth(24)
fig.set_figheight(8)

# Plot June data
# access the axes for this plot:
ax = axs[0]
# Set the month name:
month = 'June'
# Read the data for this month using Pandas:
data = pd.read_csv(temp_files[month], index_col='Date', parse_dates=['Date'])
# Plot the data using the Pandas DataFrame plot function:
data.plot(ax=ax)
# Set the y axis limits:
ax.set_ylim((0, 40))
# Set the plot title:
ax.set_title(month)

# Plot July data
# access the axes for this plot:
ax = axs[1]
# Set the month name:
month = 'July'
# Read the data for this month using Pandas:
data = pd.read_csv(temp_files[month], index_col='Date', parse_dates=['Date'])
# Plot the data using the Pandas DataFrame plot function:
data.plot(ax=ax)
# Set the y axis limits:
ax.set_ylim((0, 40))
# Set the plot title:
ax.set_title(month)

# Plot August data
# access the axes for this plot:
ax = axs[2]
# Set the month name:
month = 'August'
# Read the data for this month using Pandas:
data = pd.read_csv(temp_files[month], index_col='Date', parse_dates=['Date'])
# Plot the data using the Pandas DataFrame plot function:
data.plot(ax=ax)
# Set the y axis limits:
ax.set_ylim((0, 40))
# Set the plot title:
ax.set_title(month)

Temperature Plot 2

Plotting in a loop

Can you recreate the figure above by looping through the values in the dictionary? You may wish to use the enumerate function.

Solution

# Create the subplot axes, with a single row and three columns:
fig, axs = plt.subplots(nrows=1, ncols=3)

# Set the figure size:
fig.set_figwidth(24)
fig.set_figheight(8)

# Loop through dictionary keys, using the enumerate function:
for index, month in enumerate(temp_files.keys()):
    # Access the axes for this plot:
    ax = axs[index]
    # The loop variable month already holds the dictionary key name.
    # Read the data for this month using Pandas:
    data = pd.read_csv(temp_files[month], index_col='Date', parse_dates=['Date'])
    # Plot the data using the Pandas DataFrame plot function:
    data.plot(ax=ax)
    # Set the y axis limits:
    ax.set_ylim((0, 40))
    # Set the plot title:
    ax.set_title(month)

Temperature Plot 2
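
The way enumerate pairs a counter with each dictionary entry can be seen without any plotting. The sketch below loops over items() rather than keys(), so that the filename is available directly; the rows list is added only to record what the loop produces.

```python
temp_files = {
    'June': 'data/temperature_2022-06.csv',
    'July': 'data/temperature_2022-07.csv',
    'August': 'data/temperature_2022-08.csv'
}

rows = []
# enumerate supplies the index; items() supplies each key and its value.
for index, (month, filename) in enumerate(temp_files.items()):
    print(index, month, filename)
    rows.append((index, month, filename))
```

Dictionaries keep their entries in the order in which they were added, so the index 0, 1, 2 lines up with June, July and August.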

Key Points

  • Python dictionaries are one of the most versatile and efficient data types in Python.


Working With Geospatial Data

Overview

Teaching: 0 min
Exercises: 0 min
Questions
  • How can I read in and plot geospatial data?

Objectives
  • Read in data from NetCDF files

  • Plot data on a map

Installing additional libraries with conda

The conda command line tool is used to manage packages and environments within an Anaconda installation.

From a terminal, or Anaconda prompt, the libraries we will be using can be installed with:

conda install -y -c conda-forge iris cartopy

After installing these packages, it may be necessary to open a new terminal and launch a new JupyterLab session for things to work correctly.
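One way to check that the installation worked is to ask Python whether it can find the packages, without importing them. The sketch below uses only the standard library; the helper name check_installed is our own and is not part of conda, Iris or Cartopy.

```python
import importlib.util


def check_installed(module_names):
    """Return a dictionary mapping each module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None
            for name in module_names}


# 'iris' and 'cartopy' are the names used to import the installed packages:
status = check_installed(['iris', 'cartopy'])
for name, found in status.items():
    print(name, 'is installed' if found else 'was NOT found')
```

If a package is reported as not found, restarting the terminal and JupyterLab session as described above is a sensible first step.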

Reading in a NetCDF file using Iris is done using the load() or load_cube() functions.

We have been provided some NetCDF files containing ERA5 global temperature data from the ECMWF.

The file data/era5_mean_temp_1981-2010.nc contains the global mean temperature data for 1981 to 2010. The file data/era5_mean_annual_temp_2018-2022.nc contains global annual mean temperatures for the years 2018 to 2022.

The files we will be looking at contain a single variable, so we can load data from our first file with the load_cube() function:

import iris

hist_temp = iris.load_cube('data/era5_mean_temp_1981-2010.nc')
print(hist_temp)
2 metre temperature / (K)           (latitude: 181; longitude: 360)
    Dimension coordinates:
        latitude                             x               -
        longitude                            -               x
    Scalar coordinates:
        expver                      1
        time                        1995-12-17 00:00:00, bound=(1981-01-01 00:00:00, 2010-12-01 00:00:00)
    Cell methods:
        0                           time: mean
    Attributes:
        Conventions                 'CF-1.7'
        history                     '2023-12-10 09:47:57 GMT by grib_to_netcdf-2.25.1: /opt/ecmwf/mars-client/bin/grib_to_netcdf.bin ...'

The hist_temp object is an iris cube. This object contains the variable data as well as related metadata.

print(type(hist_temp))
<class 'iris.cube.Cube'>

Loading data from files containing multiple variables

The Iris load() function produces a list of cubes, one for each variable in the file, which may look similar to this:

cubes = iris.load('data/20191201.nc')
print(cubes)
0: 2 metre temperature / (K)           (time: 4; latitude: 256; longitude: 512)
1: air_pressure_at_mean_sea_level / (Pa) (time: 4; latitude: 256; longitude: 512)

We can see there are two variables in the file, 2 metre temperature and air_pressure_at_mean_sea_level.

The variable cubes is an Iris cube list, and the variables in the list can be accessed by the index value:

print(cubes[0])
2 metre temperature / (K)           (time: 4; latitude: 256; longitude: 512)
    Dimension coordinates:
        time                             x            -               -
        latitude                         -            x               -
        longitude
...

We can load a single variable from a file, by passing the variable name to Iris.

A variable in a NetCDF file can have more than one name, for example a standard name, a long name and a variable name. The name to use when loading a particular variable with Iris is the one shown when we printed the cubes variable, so to load the 2m temperature variable, we use the name 2 metre temperature:

temp = iris.load_cube('data/20191201.nc', '2 metre temperature')
print(temp)
2 metre temperature / (K)           (time: 4; latitude: 256; longitude: 512)
    Dimension coordinates:
        time                             x            -               -
        latitude                         -            x               -
        longitude
...

The data values for an Iris cube can be found in the data property:

print(hist_temp.data)
print(type(hist_temp.data))
[[258.93490373 258.93490373 258.93490373 ... 258.93490373 258.93490373
  258.93490373]
 [259.1283197  259.13209851 259.13575203 ... 259.12782856 259.12798893
  259.12817938]
 [259.32918805 259.33184424 259.33421477 ... 259.31327094 259.31432339
  259.32189605]
 ...
 [228.46974617 228.43380738 228.39738245 ... 228.53184095 228.5038557
  228.48709162]
 [228.06460655 228.0546333  228.0449006  ... 228.10377788 228.09062722
  228.07752166]
 [227.69386212 227.69386212 227.69386212 ... 227.69386212 227.69386212
  227.69386212]]
<class 'numpy.ma.core.MaskedArray'>

We can see that Iris stores the data in a NumPy array. The type of array used is a MaskedArray, which means that values can be masked out. For example, if there was only data for points over land, the points in ocean areas could be masked out.
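To illustrate what masking does, here is a small sketch using made-up values on a tiny grid. The temperature values and the choice of masked points are assumptions for illustration only and do not come from the ERA5 files.

```python
import numpy as np

# Made-up temperature values (K) on a tiny 2 x 3 grid:
temps = np.array([[280.0, 285.0, 290.0],
                  [275.0, 270.0, 265.0]])

# Hypothetical mask: True marks points to hide (for example ocean points):
ocean = np.array([[False, True, False],
                  [True, False, False]])

masked_temps = np.ma.masked_array(temps, mask=ocean)

# Masked points are left out of calculations such as the mean:
print(masked_temps)
print('number of unmasked points:', masked_temps.count())
print('mean of unmasked points:', masked_temps.mean())
```

Only the four unmasked values contribute to the mean, which is 276.25 here.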

As the data is a NumPy array, we can find out some more information about the shape of the array, and the values it contains:

print(hist_temp.data.shape)
print(hist_temp.data.min())
print(hist_temp.data.max())
print(hist_temp.data.mean())
(181, 360)
219.82352664773725
306.46696949834524
278.2130725081754

The time, latitude and longitude information for the data can be accessed using the coord() method of the cube:

print(hist_temp.coord('time'))
print(hist_temp.coord('latitude'))
print(hist_temp.coord('longitude'))
DimCoord :  time / (hours since 1900-01-01 00:00:00.0, standard calendar)
    points: [1995-12-17 00:00:00]
    bounds: [[1981-01-01 00:00:00, 2010-12-01 00:00:00]]
    shape: (1,)  bounds(1, 2)
    dtype: int32
    standard_name: 'time'
    long_name: 'time'
    var_name: 'time'
DimCoord :  latitude / (degrees)
    points: [ 90.,  89., ..., -89., -90.]
    shape: (181,)
...

We would like to plot the data to see what it looks like, so we will extract the values we need:

lons = hist_temp.coord('longitude').points
lats = hist_temp.coord('latitude').points
hist_temp_data = hist_temp.data
print(lons.shape)
print(lats.shape)
print(hist_temp_data.shape)
(360,)
(181,)
(181, 360)

We have 360 longitude values, 181 latitude values and 360*181 temperature values.
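The order of the dimensions matters when plotting: each row of the data corresponds to one latitude and each column to one longitude. The sketch below builds synthetic coordinates with the same sizes as the lesson data to show this. The coordinate values themselves are an assumption, and the real longitude values in the file may differ.

```python
import numpy as np

# Synthetic coordinates with the same sizes as the lesson data
# (the values themselves are an assumption):
lons = np.arange(0, 360, 1)          # 360 longitude points
lats = np.linspace(90, -90, 181)     # 181 latitude points, north to south
data = np.zeros((lats.size, lons.size))

# Rows correspond to latitudes and columns to longitudes:
print(data.shape == (lats.size, lons.size))
print('total number of values:', data.size)
```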

There are various ways to plot 2D data, and we will use the pcolormesh function to plot the temperature data:

plt.pcolormesh(lons, lats, hist_temp_data, cmap='coolwarm')
plt.colorbar()

Temperature Plot 1

Using Cartopy to plot data on a map

Now we have loaded some geospatial data, we can use the Cartopy package to plot the data on a map.

Cartopy can plot data in various projections. We will create some axes for our plot using the PlateCarree projection, and add the coastlines to the plot:

import cartopy

map_projection = cartopy.crs.PlateCarree()

map_axes = plt.axes(projection=map_projection)
map_axes.add_feature(cartopy.feature.COASTLINE)
map_axes.gridlines(draw_labels=True)

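There are several other basic features which can be added to a map using Cartopy, such as country borders (cartopy.feature.BORDERS), land (cartopy.feature.LAND), oceans (cartopy.feature.OCEAN), lakes (cartopy.feature.LAKES) and rivers (cartopy.feature.RIVERS).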

Cartopy Plot 1

We have added gridlines to the map with the map_axes.gridlines() method.

Next we will add our temperature data to the map. When working with Matplotlib axes as we are here, the plotting functions become a method of the axes, so rather than plt.pcolormesh(), we will use map_axes.pcolormesh():

# Set map projection:
map_projection = cartopy.crs.PlateCarree()
# Create the plot axes:
map_axes = plt.axes(projection=map_projection)
# Add gridlines to the map:
map_axes.gridlines(draw_labels=True)
# Plot the temperature data:
temp_plot = map_axes.pcolormesh(lons, lats, hist_temp_data, cmap='coolwarm')
# Add coastlines to the map:
map_axes.add_feature(cartopy.feature.COASTLINE)
# Add a colour scale:
cbar = plt.colorbar(temp_plot, orientation='horizontal', fraction=0.05)
# Set the colour bar label:
cbar.set_label('temperature (K)')
# Set the plot title:
map_axes.set_title('Temperature 1981-2010')

Temperature Plot 2

There is now quite a lot going on to create the plot, and we can see how adding comments helps to keep track of what is being done.

Comparing historical data with recent annual data

We have annual temperature data for the years 2018-2022 in the file data/era5_mean_annual_temp_2018-2022.nc, which we would like to compare with the historical data. We can load this data using Iris:

ann_temp = iris.load_cube('data/era5_mean_annual_temp_2018-2022.nc')
print(ann_temp)
print(ann_temp.coord('time'))
2 metre temperature / (K)           (time: 5; latitude: 181; longitude: 360)
    Dimension coordinates:
        time                             x            -               -
        latitude                         -            x               -
        longitude                        -            -               x
    Scalar coordinates:
        expver                      1
    Cell methods:
        0                           time: mean
    Attributes:
        Conventions                 'CF-1.7'
        history                     '2023-12-10 09:47:57 GMT by grib_to_netcdf-2.25.1: /opt/ecmwf/mars-client/bin/grib_to_netcdf.bin ...'
DimCoord :  time / (hours since 1900-01-01 00:00:00.0, standard calendar)
    points: [
        2018-06-17 00:00:00, 2019-06-17 00:00:00, 2020-06-16 12:00:00,
        2021-06-17 00:00:00, 2022-06-17 00:00:00]
    bounds: [
        [2018-01-01 00:00:00, 2018-12-01 00:00:00],
        [2019-01-01 00:00:00, 2019-12-01 00:00:00],
        [2020-01-01 00:00:00, 2020-12-01 00:00:00],
        [2021-01-01 00:00:00, 2021-12-01 00:00:00],
        [2022-01-01 00:00:00, 2022-12-01 00:00:00]]
    shape: (5,)  bounds(5, 2)
    dtype: int32
    standard_name: 'time'
    long_name: 'time'
    var_name: 'time'

We can see that this annual data has an additional time dimension, i.e. there are 360*181 gridded temperature values at each time step.

print('shape of annual temp data:', ann_temp.data.shape)
print('shape of data for first time step:', ann_temp.data[0].shape)
shape of annual temp data: (5, 181, 360)
shape of data for first time step: (181, 360)
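Selecting one time step leaves a 2D array with the same shape as the historical data, so the two can be subtracted point by point. The sketch below shows this with synthetic NumPy arrays of the same shapes. All of the values are made up, and the names hist and ann are stand-ins for the lesson data.

```python
import numpy as np

# Synthetic stand-ins for the lesson data (all values are made up):
hist = np.full((181, 360), 278.0)       # one 2D field
ann = np.full((5, 181, 360), 279.0)     # five time steps on the same grid
ann[0] = 278.5                          # give the first time step its own value

# Difference for a single time step:
diff_first = ann[0] - hist
print(diff_first.shape, diff_first.mean())

# Difference for every time step at once, using NumPy broadcasting:
diff_all = ann - hist
print(diff_all.shape)
```

Subtracting the whole 3D array works because NumPy repeats the 2D field along the time dimension, which gives one difference field per year.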

We would like to compare the temperature for each year in the ann_temp data to the historical data.

To avoid having to repeat code, we will create a function to plot the data:

def plot_temp_diff(ann_data, hist_data, year):
    """
    plot the difference between the annual data and the historical data
    """
    # Calculate the temperature difference:
    temp_diff = ann_data - hist_data
    # Set map projection:
    map_projection = cartopy.crs.PlateCarree()
    # Create the plot axes:
    map_axes = plt.axes(projection=map_projection)
    # Add gridlines to the map:
    map_axes.gridlines(draw_labels=True)
    # Plot the temperature data:
    temp_plot = map_axes.pcolormesh(lons, lats, temp_diff, cmap='coolwarm')
    # Add coastlines to the map:
    map_axes.add_feature(cartopy.feature.COASTLINE)
    # Add a colour scale:
    cbar = plt.colorbar(temp_plot, orientation='horizontal', fraction=0.05)
    # Set the colour bar label:
    cbar.set_label('temperature difference')
    # Set the plot title:
    map_axes.set_title(f'Temperature difference {year}, 1981-2010')
    # Display the plot:
    plt.show()

Once we have created the function, we can use this to plot the difference in the temperature data for a single year.

The data for 2018 is the first time step in the annual data, so to plot the difference between the historical data and the data for 2018:

plot_temp_diff(ann_temp.data[0], hist_temp_data, 2018)

Temperature Plot 3

To plot the differences for all years, we can loop through the data.

We can do this using the built-in enumerate function, which loops through a collection of items, and at each step of the loop provides the index and value, for example:

fruits = ['apples', 'bananas', 'raspberries']
colours = ['green', 'yellow', 'red']

for index, fruit in enumerate(fruits):
    colour = colours[index]
    print(fruit, 'are', colour)
apples are green
bananas are yellow
raspberries are red
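As an aside, when the index is only needed to look up the matching item in a second list, the built-in zip function is an alternative to enumerate. This is a small sketch using the same example lists; the lines list is only there to keep the results.

```python
fruits = ['apples', 'bananas', 'raspberries']
colours = ['green', 'yellow', 'red']

# zip pairs up the items from both lists, so no index is needed:
lines = []
for fruit, colour in zip(fruits, colours):
    lines.append(f'{fruit} are {colour}')
    print(lines[-1])
```

This prints the same three lines as the enumerate example.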

We will define a range of years for which we have data, and use the enumerate function to loop through this list, and plot the corresponding data.

# get a range of years for which we have data, 2018 to 2022:
years = range(2018, 2023)

# loop through the years using enumerate:
for index, year in enumerate(years):
    # plot the difference between the historical data and the annual data
    # for this year:
    plot_temp_diff(ann_temp.data[index], hist_temp_data, year)

Setting colour bounds

It is difficult to compare the plots, as the colour bounds have automatically been set based on the data values, and are different for each plot.

The colour bounds for the pcolormesh plot can be set with the vmin and vmax arguments. Update the plot_temp_diff function, setting suitable values, and recreate the plots.

Solution

Suitable bounds for the colour values may be -4 to 4, and the pcolormesh line in the function could be updated to:

temp_plot = map_axes.pcolormesh(lons, lats, temp_diff, cmap='coolwarm', vmin=-4, vmax=4)

This should produce plots which all have the same colour bounds.

Temperature Plot 4
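Rather than choosing the bounds by eye, they could be calculated from the data. The sketch below is one possible approach, not part of the lesson solution: the helper symmetric_bounds is a hypothetical name, and the values are made up. It returns bounds centred on zero, so that a difference of zero sits in the middle of the coolwarm colour map.

```python
import numpy as np


def symmetric_bounds(values):
    """Return (vmin, vmax) centred on zero that cover all of the values."""
    limit = float(np.max(np.abs(values)))
    return -limit, limit


# Made-up temperature differences (K):
example_diff = np.array([[-1.5, 0.2], [3.0, -0.7]])
vmin, vmax = symmetric_bounds(example_diff)
print(vmin, vmax)
```

To keep the plots comparable, the bounds would need to be calculated once from the differences for all years and then passed to every plot.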

Different plot types

Different plot types may be suitable for different types of data. For our temperature data, we could try creating a filled contour plot, using the contourf function.

Try updating the plot_temp_diff function to use the contourf function, rather than pcolormesh.

Rather than vmin and vmax, the colour bounds for a contour plot are set with the levels argument, for example levels=np.arange(-4, 5, 1). This requires NumPy to have been imported with import numpy as np.

Solution

To create filled contour plots, the pcolormesh line in the function could be replaced with:

temp_plot = map_axes.contourf(lons, lats, temp_diff, cmap='coolwarm', levels=np.arange(-4, 5, 1))

Temperature Plot 5
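It can help to look at the values passed to the levels argument. The NumPy arange function includes the start value but stops before the stop value, which is why 5 is used to get levels up to 4:

```python
import numpy as np

# Contour levels from -4 to 4 in steps of 1 (the stop value 5 is excluded):
levels = np.arange(-4, 5, 1)
print(levels)
print('number of levels:', len(levels))
```

This gives nine levels, from -4 to 4.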

Using a different map projection

How would you update the plot_temp_diff function to use the Orthographic projection?

You will need to add the following argument to the pcolormesh() or contourf() command, so that the data points are correctly projected:

transform=cartopy.crs.PlateCarree()

Solution

The setting of the map_projection variable can be updated:

map_projection = cartopy.crs.Orthographic()

Then the transform argument can be added to the plotting function. This example uses pcolormesh, but the same argument can be added to contourf:

temp_plot = map_axes.pcolormesh(lons, lats, temp_diff, cmap='coolwarm', vmin=-4, vmax=4, transform=cartopy.crs.PlateCarree())

Temperature Plot 6

Key Points

  • Iris reads data into an Iris Cube object

  • Cartopy can be used to plot data on a map


Programming Style

Overview

Teaching: 10 min
Exercises: 0 min
Questions
  • How can I make my programs more readable?

  • How do most programmers format their code?

  • How can programs check their own operation?

Objectives
  • Provide sound justifications for basic rules of coding style.

  • Refactor one-page programs to make them more readable and justify the changes.

  • Use Python community coding standards (PEP-8).

Coding style

A consistent coding style helps others (including our future selves) read and understand code more easily. Code is read much more often than it is written, and as the Zen of Python states, “Readability counts”. The Python community proposed a standard style in one of its first Python Enhancement Proposals (PEPs), PEP 8.

Some points worth highlighting:

Follow standard Python style in your code.
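As a small illustration, the two functions below do the same calculation. The second follows some common PEP 8 conventions: descriptive lowercase names with underscores, four spaces per indentation level, one statement per line, and spaces around operators and after commas. Both functions are made-up examples.

```python
# Harder to read: cryptic names, no spaces, several statements on one line.
def f(x,y):z=x*y;return z/2


# Easier to read, following common PEP 8 conventions:
def triangle_area(base, height):
    """Return the area of a triangle from its base and height."""
    rectangle_area = base * height
    return rectangle_area / 2


print(f(3, 4), triangle_area(3, 4))
```

Both calls return the same value, but the purpose of the second function is clear without working through the arithmetic.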

Use assertions to check for internal errors.

Assertions are a simple but powerful method for making sure that the context in which your code is executing is as you expect.

def calc_bulk_density(mass, volume):
    '''Return dry bulk density = powder mass / powder volume.'''
    assert volume > 0
    return mass / volume

If the assertion is False, the Python interpreter raises an AssertionError runtime exception. The source code for the expression that failed will be displayed as part of the error message. To ignore assertions in your code run the interpreter with the ‘-O’ (optimize) switch. Assertions should contain only simple checks and never change the state of the program. For example, an assertion should never contain an assignment.
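An assert statement can also be given a message, which is included in the AssertionError. The sketch below adds a message to the function above and catches the error so that the message can be shown. The wording of the message is our own choice.

```python
def calc_bulk_density(mass, volume):
    '''Return dry bulk density = powder mass / powder volume.'''
    assert volume > 0, 'volume must be positive'
    return mass / volume


print(calc_bulk_density(10.0, 4.0))

# A failing assertion raises AssertionError, which carries the message:
try:
    calc_bulk_density(10.0, 0.0)
except AssertionError as error:
    message = str(error)
    print('AssertionError:', message)
```

In normal use the error would not be caught like this. Letting the program stop with the AssertionError is usually the point of the check.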

Use docstrings to provide built-in help.

If the first thing in a function is a character string that is not assigned directly to a variable, Python attaches it to the function, accessible via the built-in help function. This string that provides documentation is also known as a docstring.

def average(values):
    "Return average of values, or None if no values are supplied."

    if len(values) == 0:
        return None
    return sum(values) / len(values)

help(average)
Help on function average in module __main__:

average(values)
    Return average of values, or None if no values are supplied.
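The docstring is stored on the function itself, in an attribute called __doc__, which is where the help function finds it. A short sketch using the same function:

```python
def average(values):
    "Return average of values, or None if no values are supplied."

    if len(values) == 0:
        return None
    return sum(values) / len(values)


# The docstring is stored on the function object itself:
print(average.__doc__)
print(average([1, 2, 3]))
print(average([]))
```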

Multiline Strings

Multiline strings are often used for documentation. These start with three quote characters (either single or double) and end with three matching characters.

"""This string spans
multiple lines.

Blank lines are allowed."""

Document This

Turn the comment in the following function into a docstring and check that help displays it properly.

def middle(a, b, c):
    # Return the middle value of three.
    # Assumes the values can actually be compared.
    values = [a, b, c]
    values.sort()
    return values[1]

Solution

def middle(a, b, c):
    """Return the middle value of three.
    Assumes the values can actually be compared."""
    values = [a, b, c]
    values.sort()
    return values[1]

Key Points

  • Follow standard Python style in your code.

  • Use docstrings to provide builtin help.


Wrap-Up

Overview

Teaching: 10 min
Exercises: 0 min
Questions
  • What have we learned?

  • What else is out there and where do I find it?

Objectives
  • Name and locate scientific Python community sites for software, workshops, and help.

Leslie Lamport once said, “Writing is nature’s way of showing you how sloppy your thinking is.” The same is true of programming: many things that seem obvious when we’re thinking about them turn out to be anything but when we have to explain them precisely.

Different ways to interact with Python

We have been working in notebooks, using JupyterLab. There are various other ways to interact with Python.

Notebooks

Notebooks can be accessed via the JupyterLab interface, or the ‘classic’ notebook interface can be launched with:

jupyter notebook

From the JupyterLab notebook interface, a notebook can be saved to a .py text file by selecting File > Save and Export Notebook As > Executable Script.

From the classic notebook interface, select File > Download as > Python (.py).

Command Line

From a terminal / Anaconda prompt window, the Python interpreter can be accessed by running python. Commands can be entered at the prompt and results can be printed to the terminal output.

From a terminal / Anaconda prompt window, .py files can be run with:

python name_of_file.py
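A script run in this way is often organised so that its main steps live in a function, which is only called when the file is run directly. The sketch below shows this common pattern; the contents of the file and the function name main are our own example.

```python
# Contents of a hypothetical file called name_of_file.py


def main():
    """Run the steps of the program and return the message that is printed."""
    message = 'Hello from a Python script'
    print(message)
    return message


# This block only runs when the file is executed directly,
# not when it is imported from another file or notebook:
if __name__ == '__main__':
    main()
```

With this layout, the functions in the file can also be imported into a notebook without the whole script running.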

Spyder

Spyder is a popular graphical development environment for working with Python, and has similar features to RStudio and MATLAB.

When installing the full version of Anaconda, Spyder is included in the installation, and a shortcut to launch the software is added to the Windows Start Menu. The program can also be launched from a terminal / Anaconda prompt, by running:

spyder

To install Spyder using the conda command:

conda install -c conda-forge spyder

Python supports a large and diverse community across academia and industry.

Key Points

  • Python supports a large and diverse community across academia and industry.