Numpy data structures
Contents
Numpy data structures¶
When we looked at python data structures, it was obvious that the only way to deal with arrays of values (matrices / vectors etc) would be via lists and lists of lists.
This is slow and inefficient in both execution and writing code.
numpy
attempts to fix this.
import numpy as np
## This is a list of everything in the module
np.__all__
an_array = np.array([0,1,2,3,4,5,6], dtype=np.float)
print (an_array)
print( type(an_array))
help(an_array)
print(an_array.dtype)
print(an_array.shape)
A = np.zeros((4,4))
print (A)
print (A.shape)
print (A.diagonal())
A[0,0] = 2.0
print (A)
np.fill_diagonal(A, 1.0)
print (A)
B = A.diagonal()
B[0] = 2.0
for i in range(0,A.shape[0]):
A[i,i] = 1.0
print (A)
print()
A[:,2] = 2.0
print (A)
print()
A[2,:] = 4.0
print (A)
print()
print (A.T)
print()
A[...] = 0.0
print (A)
print()
for i in range(0,A.shape[0]):
A[i,:] = float(i)
print (A)
print()
for i in range(0,A.shape[0]):
A[i,:] = i
print (A)
print()
print (A[::2,::2] )
print (A[::-1,::-1])
Speed¶
%%timeit
B = np.zeros((1000,1000))
for i in range(0,1000):
for j in range(0,1000):
B[i,j] = 2.0
%%timeit
B = np.zeros((1000,1000))
B[:,:] = 2.0
%%timeit
B = np.zeros((1000,1000))
B[...] = 2.0
Views of the data (are free)¶
It costs very little to look at data in a different way (e.g. to view a 2D array as a 1D vector).
Making a copy is a different story
print( A.reshape((2,8)))
print()
print( A.reshape((-1)))
print( A.ravel())
print()
print( A.reshape((1,-1)))
print()
%%timeit
A.reshape((1,-1))
%%timeit
elements = A.shape[0]*A.shape[1]
B = np.empty(elements)
B[...] = A[:,:].reshape(elements)
%%timeit
elements = A.shape[0]*A.shape[1]
B = np.empty(elements)
for i in range(0,A.shape[0]):
for j in range(0,A.shape[1]):
B[i+j*A.shape[1]] = A[i,j]
Exercise: Try this again for a 10000x10000 array
Indexing / broadcasting¶
In numpy, we can index an array by explicitly specifying elements, by specifying slices, by supplying lists of indices (or arrays), we can also supply a boolean array of the same shape as the original array which will select / return an array of all those entries where True
applies.
Although some of these might seem difficult to use, they are often the result of other numpy operations. For example np.where
converts a truth array to a list of indices.
AA = np.zeros((100,100))
AA[10,11] = 1.0
AA[99,1] = 2.0
cond = np.where(AA >= 1.0)
print (cond)
print (AA[cond])
print (AA[ AA >= 1])
print(AA>=1.0)
Broadcasting is a way of looping on arrays which have “compatible” but unequal sizes.
For example, the element-wise multiplication of 2 arrays
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 2.0, 2.0])
print a * b
has an equivalent:
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0])
print a * b
or
a = np.array([1.0, 2.0, 3.0])
b = 2.0
print a * b
in which the “appropriate” interpretation of b
is made in each case to achieve the result.
a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 2.0, 2.0])
print( a * b)
b = np.array([2.0])
print (a * b)
print (a * 2.0)
Arrays are compatible as long as each of their dimensions (shape) is either equal to the other or 1.
Thus, above, the multplication works when a.shape
is (1,3)
and b.shape
is either (1,3)
or (1,1)
(Actually, these are (3,)
and (1,)
in the examples above …
print (a.shape)
print (b.shape)
print ((a*b).shape)
print ((a+b).shape)
aa = a.reshape(1,3)
bb = b.reshape(1,1)
print (aa.shape)
print (bb.shape)
print ((aa*bb).shape)
print ((aa+bb).shape)
In multiple dimensions, the rule applies but, perhaps, is less immediately intuitive:
a = np.array([[ 0.0, 0.0, 0.0],
[10.0,10.0,10.0],
[20.0,20.0,20.0],
[30.0,30.0,30.0]])
b = np.array([[1.0,2.0,3.0]])
print (a + b)
print ()
print (a.shape)
print (b.shape)
print ((a+b).shape)
Note that this also works for
a = np.array([[ 0.0, 0.0, 0.0],
[10.0,10.0,10.0],
[20.0,20.0,20.0],
[30.0,30.0,30.0]])
b = np.array([1.0,2.0,3.0])
print (a + b)
print ()
print (a.shape)
print (b.shape)
print ((a+b).shape)
But not for
a = np.array([[ 0.0, 0.0, 0.0],
[10.0,10.0,10.0],
[20.0,20.0,20.0],
[30.0,30.0,30.0]])
b = np.array([[1.0],[2.0],[3.0]])
print (a.shape)
print (b.shape)
print ((a+b).shape)
Vector Operations¶
X = np.arange(0.0, 2.0*np.pi, 0.0001)
print (X)
import math
math.sin(X)
np.sin(X)
S = np.sin(X)
C = np.cos(X)
S2 = S**2 + C**2
print (S2)
print (S2 - 1.0)
test = np.isclose(S2,1.0)
print (test)
print (np.where(test == False))
print (np.where(S2 == 0.0))
Exercise: find out how long it takes to compute the sin, sqrt, power of a 1000000 length vector (array). How does this speed compare to using the normal math
functions element by element in the array ? What happens if X is actually a complex array ?
Hints: you might find it useful to know about:
np.linspace
vnp.arange
np.empty_like
ornp.zeros_like
the python
enumerate
functionhow to write a table in markdown
description |
time |
notes |
---|---|---|
|
? |
|
|
? |
|
? |
- |
X = np.linspace(0.0, 2.0*np.pi, 10000000)
print (X.shape)
# ...
%%timeit
S = np.sin(X)
%%timeit
S = np.empty_like(X)
for i, x in enumerate(X):
S[i] = math.sin(x)
XX = np.ones((100,100,100))
XX.shape
%%timeit
np.sin(XX)
%%timeit
for i in range(0,100):
for j in range(0,100):
for k in range(0,100):
math.sin(XX[i,j,k])
X = np.linspace(0.0, 2.0*np.pi, 10000000)
Xj = X + 1.0j
print (Xj.shape, Xj.dtype)
%%timeit
Sj = np.sin(Xj)
import cmath
%%timeit
Sj = np.empty_like(Xj)
for i, x in enumerate(Xj):
Sj[i] = cmath.sin(x)
Exercise: look through the functions below from numpy, choose 3 of them and work out how to use them on arrays of data. Write a few lines to explain what you find.
np.max
v.np.argmax
np.where
np.logical_and
np.fill_diagonal
np.count_nonzero
np.isinf
andnp.isnan
Here is an example:
np.concatenate
takes a number of arrays and glues them together. For 1D arrays this is simple:
A = np.array([1.0,1.0,1.0,1.0])
B = np.array([2.0,2.0,2.0,2.0])
C = np.array([3.0,3.0,3.0,3.0])
R = np.concatenate((A,B,C))
# array([ 1., 1., 1., 1., 2., 2., 2., 2., 3., 3., 3., 3.])
an equivalent statement is np.hstack((A,B,C))
but note the difference with np.vstack((A,B,C))
With higher dimensional arrays, the gluing takes place along one axis
:
A = np.array(([1.0,1.0,1.0,1.0],[2.0,2.0,2.0,2.0]))
B = np.array(([3.0,3.0,3.0,3.0],[4.0,4.0,4.0,4.0]))
C = np.array(([5.0,5.0,5.0,5.0],[6.0,6.0,6.0,6.0]))
R = np.concatenate((A,B,C))
print R
print
R = np.concatenate((A,B,C), axis=1)
print R
# Test the results here
A = np.array(([1.0,1.0,1.0,1.0],[2.0,2.0,2.0,2.0]))
B = np.array(([3.0,3.0,3.0,3.0],[4.0,4.0,4.0,4.0]))
C = np.array(([5.0,5.0,5.0,5.0],[6.0,6.0,6.0,6.0]))
R = np.concatenate((A,B,C))
print R
print
R = np.concatenate((A,B,C), axis=1)
print R