Working with NumPy

Overview

Teaching: 10 min
Exercises: 10 min
Questions
  • What is NumPy and how do I use it?

Objectives
  • Import the NumPy library.

  • Create a NumPy array.

  • Apply functions to NumPy arrays.

Working with the NumPy library

import numpy as np

primes = np.array([2, 3, 5, 7, 11])
print(primes)
[ 2  3  5  7 11]

The NumPy array looks similar to a list, but let’s take a closer look:

print(type(primes))
print(len(primes))
print(primes.shape)
print(primes.dtype)
<class 'numpy.ndarray'>
5
(5,)
int64

Array functions

NumPy provides many functions, including its own versions of min and max, as well as mean:

print(np.min(primes))
print(np.max(primes))
print(np.mean(primes))
2
11
5.6

NumPy arrays also have many methods available, including min, max and mean:

print(primes.min())
print(primes.max())
print(primes.mean())
2
11
5.6

NumPy functions can operate on all elements of an array at once. To see why this is useful, consider what happens if we try to run the math.sin function on multiple items:

import math

sequence = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
math.sin(sequence)
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_76280/1284448365.py in <module>
      2 
      3 sequence = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
----> 4 math.sin(sequence)

TypeError: must be real number, not list

The math.sin function can only process a single value.

The NumPy sin function can process multiple values:

sequence = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
np.sin(sequence)
array([ 0.        ,  0.84147098,  0.90929743,  0.14112001, -0.7568025 ,
       -0.95892427, -0.2794155 ,  0.6569866 ,  0.98935825,  0.41211849])

The sequence list is converted to a NumPy ndarray during this process.
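As a rough sketch of what this conversion amounts to, the example below converts the list explicitly with np.asarray and compares the result with calling math.sin on each item in a loop. The variable names as_array, vectorised and looped are illustrative only and are not part of the lesson.

```python
import math

import numpy as np

sequence = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]

# Convert the list to an ndarray explicitly. np.sin performs an
# equivalent conversion itself when it is given a list.
as_array = np.asarray(sequence)
vectorised = np.sin(as_array)

# Pure Python alternative: call math.sin once per item.
looped = [math.sin(x) for x in sequence]

print(type(vectorised))
print(np.allclose(vectorised, looped))
```

Both approaches give the same numbers; the NumPy version simply avoids writing the loop by hand.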

Multidimensional arrays

NumPy arrays can have multiple dimensions:

values = np.array([[0, 7, 2], [4, 4, 5]])
print(values)
print(values.shape)
[[0 7 2]
 [4 4 5]]
(2, 3)

The values array is two-dimensional, with 2 rows and 3 columns.

Values in NumPy arrays with multiple dimensions have multiple indexes. The index of the value 5 in the array is [1, 2]. The row (y) index comes first, followed by the column (x) index:

print(values[1, 2]) 
5
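The same indexing style can select a whole row or column. The short sketch below assumes the values array defined above; the names row, column and item are illustrative.

```python
import numpy as np

values = np.array([[0, 7, 2], [4, 4, 5]])

row = values[1]        # second row: the values 4, 4, 5
column = values[:, 2]  # third column: the values 2, 5
item = values[1][2]    # chained indexing, same element as values[1, 2]

print(row)
print(column)
print(item)
```

A colon in place of an index means "every position along this dimension".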

Finding the median value

We can find the mean value of the values array with:

print(values.mean())

Can we find the median value in a similar way? If not, is there another way to find the median value?

Solution

print(np.median(values)) 
4.0

The ndarray type does not have a median method, so values.median() does not work. However, the NumPy library does include the median function, which can be applied to an array.
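One way to check both statements, and to see where 4.0 comes from, is sketched below. The names has_method, flat and manual are illustrative, and the manual calculation assumes an even number of values, as in this array.

```python
import numpy as np

values = np.array([[0, 7, 2], [4, 4, 5]])

# ndarray has mean, min and max methods, but no median method.
has_method = hasattr(values, "median")
print(has_method)

# Flatten to one dimension and sort: 0, 2, 4, 4, 5, 7.
flat = np.sort(values.flatten())

# With six values, the median is the average of the two middle ones.
manual = (flat[2] + flat[3]) / 2

print(manual)
print(np.median(values))
```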

Applying functions along an axis

What is the difference between these commands and the results they return?

print(values.max())
print(values.max(axis=0))
print(values.max(axis=1))

Solution

7
[4 7 5]
[7 5]

The first command returns the maximum value from the whole array. The second command returns the maximum value from each column (axis=0). The third command returns the maximum value from each row (axis=1).
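The axis argument works the same way for other reductions. The sketch below uses sum on the same array and prints the shapes to show which dimension is collapsed; the names column_sums and row_sums are illustrative.

```python
import numpy as np

values = np.array([[0, 7, 2], [4, 4, 5]])

# axis=0 collapses the rows, leaving one result per column.
column_sums = values.sum(axis=0)

# axis=1 collapses the columns, leaving one result per row.
row_sums = values.sum(axis=1)

print(values.shape)       # (2, 3)
print(column_sums.shape)  # (3,)
print(row_sums.shape)     # (2,)
print(column_sums)
print(row_sums)
```

The axis that is named is the one that disappears from the shape of the result.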

Data types

What is the data type of the values array, and how could the array be created with a different data type, e.g. np.float32?

Solution

values = np.array([[0, 7, 2], [4, 4, 5]])
print(values.dtype)
print(values)

values = np.array([[0, 7, 2], [4, 4, 5]], dtype=np.float32)
print(values.dtype)
print(values)
int64
[[0 7 2]
 [4 4 5]]
float32
[[0. 7. 2.]
 [4. 4. 5.]]

The dtype argument can be used to specify the data type when creating a NumPy array.
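An existing array can also be converted to another data type. The sketch below uses the ndarray astype method, which returns a new array and leaves the original unchanged; the name as_float is illustrative. The default integer type may differ between platforms, so int64 is not guaranteed.

```python
import numpy as np

values = np.array([[0, 7, 2], [4, 4, 5]])

# astype returns a converted copy; values itself keeps its integer dtype.
as_float = values.astype(np.float32)

print(values.dtype)    # an integer type, commonly int64
print(as_float.dtype)  # float32
print(as_float)
```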

NaN values

If we create an array containing a NaN (not a number) value, how do we find the maximum value?

results = np.array([0.3, 7.2, np.nan, 4.5, 9.7])

Solution

print(results.max())
print(np.nanmax(results))
nan
9.7

NumPy includes functions, such as nanmax, which will ignore any NaN values in the input.
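Other functions in the same family include nanmin and nanmean. As an alternative sketch, the NaN values can be located with np.isnan and filtered out before calling the usual methods. The names mask and cleaned are illustrative.

```python
import numpy as np

results = np.array([0.3, 7.2, np.nan, 4.5, 9.7])

# Boolean array that is True wherever a value is NaN.
mask = np.isnan(results)
print(mask.sum())  # number of NaN values: 1

# NaN-aware version of min.
print(np.nanmin(results))  # 0.3

# Keep only the values that are not NaN, then use the ordinary method.
cleaned = results[~mask]
print(cleaned.max())  # 9.7
```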

Key Points

  • NumPy provides many functions for working with numerical data.

  • The NumPy ndarray can be used to store numerical data with multiple dimensions.

  • NumPy functions enable efficient processing of the values in an ndarray.