-
Notifications
You must be signed in to change notification settings - Fork 0
Expand file tree
/
Copy pathnumpy_features.qmd
More file actions
223 lines (135 loc) · 5.78 KB
/
numpy_features.qmd
File metadata and controls
223 lines (135 loc) · 5.78 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
---
filters:
- pyodide
---
# Handy Numpy Features
:::{.callout-tip}
Make sure you run the code cell below before any of the others. This will import the numpy package, and it will be remembered until you move to a different page or refresh this page.
:::
```{pyodide-python}
import numpy as np
```
## Empty arrays
{{< video https://youtu.be/OELia4VYA4M >}}
To create a new empty NumPy array, we can use the empty() function and specify the shape (length of each dimension) of the array we want.
```{pyodide-python}
my_np_array = np.empty([5,4,3])
print(my_np_array)
```
:::{.callout-note}
“But wait!” you exclaim, “That doesn’t look empty to me!”. The way it actually works is it creates a new array and dumps in some garbage values that are very close to 0 (note the e to the minus at the end of the numbers) based on stuff that’s in memory at the time.
Just think of it as empty. If you actually want 0s, we use something else (which you’ll see in a moment). But empty() is a little more computationally efficient.
:::
## Creating an Array of Evenly-spaced Values
{{< video https://youtu.be/9SCmVX-qKYM >}}
NumPy has a neat function called arange() which will create a NumPy array with evenly spaced values at an interval of our choosing.
Example : let’s say we want to create a NumPy array with first value 0 and counting up in intervals of 5 up to 100.

```{pyodide-python}
fives_array = np.arange(0, 101, 5)
print(fives_array)
```
## Creating an Array of Zeros
{{< video https://youtu.be/_Zs6UBUa7Dg >}}
numpy.zeros() is a function that creates a NumPy array of given dimensions, filled with zeros. This can be useful if we want to create a placeholder array that we will fill in / update with data as we go.
```{pyodide-python}
zeros_array = np.zeros([5,4,3])
print(zeros_array)
```
eg an array to hold data for five departments, each with 4 groups, each with 3 people.
## Shape and ndim
{{< video https://youtu.be/9YY18yYJhaQ >}}
The shape attribute stores the length (number of elements) in each dimension.
```{pyodide-python}
zeros_array = np.zeros([5,4,3])
print(zeros_array.shape)
```
The ndim attribute stores the number of dimensions in the array.
```{pyodide-python}
zeros_array = np.zeros([5,4,3])
print(zeros_array.ndim)
```
## Mathematical Operations
{{< video https://youtu.be/oE2YxLyv7_Y >}}
One of the key advantages of NumPy arrays is that you can apply mathematical operations at scale to large numbers of values much more efficiently than you would be able to otherwise.
It’s also really easy to do! Let’s imagine we have an array, and we want to double every single value in it.
```{pyodide-python}
three_dim_array = np.array([[[1,2,3,4,5],[6,7,8,9,10]],
[[2,4,6,8,10],[12,14,16,18,20]]])
print("Original Array")
print(three_dim_array)
doubled = three_dim_array * 2
print("Updated Array")
print(doubled)
```
We can also combine mathematical operations with slicing to only apply our operation to certain bits of the array.
```{pyodide-python}
three_dim_array = np.array([[[1,2,3,4,5],[6,7,8,9,10]],
[[2,4,6,8,10],[12,14,16,18,20]]])
print("Original Array")
print(three_dim_array)
three_dim_array[:,1] = three_dim_array[:,1] * 2
print("Updated Array")
print(three_dim_array)
```
This says “for each of the second lists in both groups, replace them with the values doubled”.
We could grab these out as a separate array if we wanted.
## Statistical Operations
{{< video https://youtu.be/NGckVgz8fek >}}
We can also perform statistical operations on NumPy arrays easily and efficiently. We can find a single mean value across the whole array (the same principle works for slices) :
```{pyodide-python}
a = np.array([1,2,3,4,5])
print(a.mean())
```
```{pyodide-python}
b = np.array([[1,2,3,4,5],[6,7,8,9,10]])
print(b.mean())
```
Or find means across dimensions of the array :
```{pyodide-python}
b = np.array([[1,2,3,4,5],[6,7,8,9,10]])
print(f"Row Means: {b.mean(axis=0)}")
print(f"Column Means: {b.mean(axis=1)}")
```
axis = 0 means give the mean across the rows - ie the two lists. So, the mean of element 0 in both, mean of element 1 in both etc
axis = 1 means give the mean across the columns - ie the mean of each row (list). So, the mean of list 0, the mean of list 1 etc
Axis values of 2+ are used for third dimensional columns+ (don’t worry if your head’s starting to hurt, that’s normal!)
## Dot Product
{{< video https://youtu.be/7x3yg7xTQC8 >}}
The dot product of two arrays of identical length multiplies the nth element of array a with the nth element of array b, and adds all of these multiplications together to give a single answer.
Example :
```
a = [1, 2, 3]
b = [2, 4, 6]
a · b = (1 x 2) + (2 x 4) + (3 x 6) = 2 + 8 + 18 = 28
```
To do this in NumPy, we use the dot() function.
```{python-pyodide}
a = [1, 2, 3]
b = [2, 4, 6]
dp = np.dot(a,b)
print(dp)
```
:::{.callout-tip}
Dot Product calculations are useful when we want to weight one set of values by another set of values.
For example, in a geographic model, we may want to weight travel times by the number of people coming from a location (so that we don’t treat 1 person having to travel a longer distance the same way as 100 people having to do this).
:::
## Removing Duplicate Data
{{< video https://youtu.be/fHQROx3LUDU >}}
The unique() function allows us to easily remove duplicate values from a NumPy array.
```{python-pyodide}
c = np.array([[1,2,3,4,5], [5,6,7,8,9]])
c_unique = np.unique(c)
print(c_unique)
```
We can also use it to remove duplicate rows or columns.
```{python-pyodide}
d = np.array([[1,2,3], [4,5,6],[1,2,3]])
d_unique = np.unique(d, axis=0)
print(d_unique)
```
```{python-pyodide}
e = np.array([[1,2,1], [2,4,2],[3,5,3]])
e_unique = np.unique(e, axis=0)
print(e_unique)
```