lectures/numba.md: 40 additions & 71 deletions
@@ -48,11 +48,13 @@ import matplotlib.pyplot as plt
48
48
In an {doc}`earlier lecture <need_for_speed>` we discussed vectorization,
49
49
which can improve execution speed by sending array processing operations in batch to efficient low-level code.
50
50
51
-
However, as {ref}`discussed previously <numba-p_c_vectorization>`, traditional vectorization schemes, such as those found in MATLAB, Julia, and NumPy, have several weaknesses.
51
+
However, as {ref}`discussed in that lecture <numba-p_c_vectorization>`,
52
+
traditional vectorization schemes, such as those found in MATLAB, Julia, and NumPy, have weaknesses.
52
53
53
-
For example, they can be highly memory-intensive and, for some algorithms, vectorization is ineffective or impossible.
54
+
* Highly memory-intensive for compound array operations
55
+
* Ineffective or impossible for some algorithms
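To make the first weakness concrete, here is a minimal sketch of a compound array operation. The function name `loop_sum` and the particular expression are illustrative assumptions and do not come from the lecture.

```python
import math
import numpy as np

n = 10_000
x = np.linspace(0.0, 1.0, n)
y = np.linspace(0.0, 1.0, n)

# Vectorized form: x**2, y**2, their sum, and the cosine are each
# materialized as a length-n temporary array before the final reduction.
vectorized = np.cos(x**2 + y**2).sum()

# Loop form: the same quantity, accumulated one scalar at a time,
# so no intermediate arrays are allocated.
def loop_sum(x, y):
    total = 0.0
    for i in range(len(x)):
        total += math.cos(x[i]**2 + y[i]**2)
    return total

looped = loop_sum(x, y)
print(abs(vectorized - looped) < 1e-8)
```

In pure Python the loop version is slow, even though it needs far less memory. That trade-off is what makes compiling the loop attractive.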
54
56
55
-
One way around these problems is through [Numba](https://numba.pydata.org/), a
57
+
One way to circumvent these problems is by using [Numba](https://numba.pydata.org/), a
56
58
**just-in-time (JIT) compiler** for Python that is oriented towards numerical work.
57
59
58
60
Numba compiles functions to native machine code instructions during runtime.
@@ -62,6 +64,14 @@ When it succeeds, Numba will be on par with machine code from low-level language
62
64
In addition, Numba can do other useful tricks, such as {ref}`multithreading` or
63
65
interfacing with GPUs (through `numba.cuda`).
64
66
67
+
Numba's JIT compiler is in many ways similar to the JIT compiler in Julia.
68
+
69
+
The main difference is that it is less ambitious, attempting to compile a smaller subset of the language.
70
+
71
+
Although this might sound like a deficiency, it is in some ways an advantage.
72
+
73
+
Numba is lean, easy to use, and very good at what it does.
74
+
65
75
This lecture introduces the core ideas.
66
76
67
77
(numba_link)=
@@ -74,10 +84,10 @@ This lecture introduces the core ideas.
74
84
(quad_map_eg)=
75
85
### An Example
76
86
77
-
Let's consider a problem that's difficult to vectorize: generating the
78
-
trajectory of a difference equation given an initial condition.
87
+
Let's consider a problem that's difficult to vectorize (i.e., hand off to array
88
+
processing operations).
79
89
80
-
We will take the difference equation to be the quadratic map
90
+
The problem involves generating the trajectory of $x_t$ from an initial condition $x_0$ via the quadratic map
81
91
82
92
$$
83
93
x_{t+1} = \alpha x_t (1 - x_t)
@@ -168,22 +178,23 @@ The basic idea is this:
168
178
* Python is very flexible and hence we could call the function `qm` with many
169
179
types.
170
180
* e.g., `x0` could be a NumPy array or a list, `n` could be an integer or a float, etc.
171
-
* This makes it hard to *pre*-compile the function (i.e., compile before runtime).
172
-
* However, when we do actually call the function, say by running `qm(0.5, 10)`,
173
-
the types of `x0` and `n` become clear.
181
+
* This makes it very difficult to generate efficient machine code *ahead of time* (i.e., before runtime).
182
+
* However, when we do actually *call* the function, say by running `qm(0.5, 10)`,
183
+
the types of `x0` and `n` become clear.
174
184
* Moreover, the types of *other variables* in `qm` *can be inferred once the input types are known*.
175
-
* So the strategy of Numba and other JIT compilers is to wait until this
176
-
moment, and then compile the function.
185
+
* So the strategy of Numba and other JIT compilers is to *wait until the function is called*, and then compile.
177
186
178
-
That's why it is called "just-in-time" compilation.
187
+
That's why it is called "just-in-time" compilation.
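The following sketch illustrates how compilation is tied to the argument types seen at call time. The function `double` is a hypothetical stand-in for `qm`, and `njit` is Numba's shorthand for `jit(nopython=True)`. The `try`/`except` guard is only there so that the snippet still runs where Numba is not installed, in which case the Numba part is skipped.

```python
try:
    from numba import njit
    HAVE_NUMBA = True
except ImportError:
    HAVE_NUMBA = False

def double(x):
    # Plain Python: the meaning of `+` depends on the runtime type of x
    return x + x

print(double(2), double(1.5), double("ab"))   # 4 3.0 abab

if HAVE_NUMBA:
    fast_double = njit(double)
    fast_double(2)      # compiles a version specialized to integers
    fast_double(1.5)    # compiles a second version specialized to floats
    fast_double(7)      # integer again: no new compilation
    # One entry per compiled specialization
    print(fast_double.signatures)
```

Each entry in `signatures` corresponds to one set of argument types for which machine code has been generated.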
179
188
180
-
Note that, if you make the call `qm(0.5, 10)` and then follow it with `qm(0.9, 20)`, compilation only takes place on the first call.
189
+
Note that, if you make the call `qm(0.5, 10)` and then follow it with `qm(0.9,
190
+
20)`, compilation only takes place on the first call.
181
191
182
192
This is because compiled code is cached and reused as required.
183
193
184
194
This is why, in the code above, `time3` is smaller than `time2`.
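As a rough way to observe this, one can time two successive calls with the same argument types. The sketch below is self-contained rather than a copy of the lecture's code. The value `ALPHA = 4.0` and the helper `timed` are assumptions made for illustration. Without Numba the fallback decorator does nothing and the timing gap disappears.

```python
import time
import numpy as np

try:
    from numba import njit
except ImportError:
    def njit(func):          # fallback: leave the function uncompiled
        return func

ALPHA = 4.0                  # assumed parameter value for the quadratic map

@njit
def qm(x0, n):
    x = np.empty(n + 1)
    x[0] = x0
    for t in range(n):
        x[t + 1] = ALPHA * x[t] * (1 - x[t])
    return x

def timed(f, *args):
    # Return the result of f(*args) together with elapsed wall-clock time
    start = time.perf_counter()
    out = f(*args)
    return out, time.perf_counter() - start

_, first = timed(qm, 0.5, 10)    # with Numba, includes compile time
_, second = timed(qm, 0.9, 20)   # same argument types: compiled code is reused
print(f"first call: {first:.6f}s, second call: {second:.6f}s")
```

With Numba installed, the first call is typically much slower than the second, because only the first one pays for compilation.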
185
195
186
196
197
+
187
198
## Decorator Notation
188
199
189
200
In the code above we created a JIT compiled version of `qm` via the call
@@ -226,30 +237,22 @@ with qe.Timer(precision=4):
226
237
227
238
Numba also provides several arguments for decorators to accelerate computation and cache functions -- see [here](https://numba.readthedocs.io/en/stable/user/performance-tips.html).
228
239
240
+
229
241
## Type Inference
230
242
231
243
Successful type inference is a key part of JIT compilation.
232
244
233
-
As you can imagine, inferring types is easier for simple Python objects (e.g., simple scalar data types such as floats and integers).
245
+
As you can imagine, inferring types is easier for simple Python objects (e.g.,
246
+
simple scalar data types such as floats and integers).
234
247
235
248
Numba also plays well with NumPy arrays, which have well-defined types.
236
249
237
250
In an ideal setting, Numba can infer all necessary type information.
238
251
239
-
This allows it to generate native machine code, without having to call the Python runtime environment.
252
+
This allows it to generate efficient native machine code, without having to call the Python runtime environment.
240
253
241
254
When Numba cannot infer all type information, it will raise an error.
242
255
243
-
```{note}
244
-
In older versions of Numba, the `@jit` decorator would silently fall back
245
-
to "object mode" when it could not infer all types, which provided little or
246
-
no speed gain. Current versions of Numba use `nopython` mode by default,
247
-
meaning the compiler insists on full type inference and raises an error if
248
-
it fails. You will often see `@njit` used in other code, which is simply
249
-
an alias for `@jit(nopython=True)`. Since nopython mode is now the default,
250
-
`@jit` and `@njit` are equivalent.
251
-
```
252
-
253
256
For example, in the (artificial) setting below, Numba is unable to determine the type of the function `mean` when compiling the function `bootstrap`:
254
257
255
258
```{code-cell} ipython3
@@ -297,9 +300,7 @@ Let's add some cautionary notes.
297
300
As we've seen, Numba needs to infer type information on
298
301
all variables to generate fast machine-level instructions.
299
302
300
-
For simple routines, Numba infers types very well.
301
-
302
-
For larger ones, or for routines using external libraries, it can easily fail.
303
+
For large routines or those using external libraries, this process can easily fail.
303
304
304
305
Hence, it's best to focus on speeding up small, time-critical snippets of code.
305
306
@@ -333,32 +334,14 @@ function.
333
334
334
335
When Numba compiles machine code for functions, it treats global variables as constants to ensure type stability.
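If a value needs to change between calls, one option is to pass it as an argument instead of reading it from a global. The sketch below assumes a hypothetical function `add_offset`. The fallback decorator only keeps the snippet runnable without Numba.

```python
try:
    from numba import njit
except ImportError:
    def njit(func):          # fallback: leave the function uncompiled
        return func

@njit
def add_offset(x, a):
    # `a` is an ordinary argument, so each call sees the value passed in
    return a + x

print(add_offset(10, 1))   # 11
print(add_offset(10, 2))   # 12
```

Because `a` is part of the function's signature, its type is still known at compile time, so type stability is preserved.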
335
336
336
-
### Caching Compiled Code
337
-
338
-
By default, Numba recompiles functions each time a new Python session starts.
339
-
340
-
To avoid this overhead, you can pass `cache=True` to the decorator:
341
-
342
-
```{code-cell} ipython3
343
-
@jit(cache=True)
344
-
def qm(x0, n):
345
-
x = np.empty(n+1)
346
-
x[0] = x0
347
-
for t in range(n):
348
-
x[t+1] = α * x[t] * (1 - x[t])
349
-
return x
350
-
```
351
-
352
-
This stores the compiled code on disk so that subsequent sessions can skip
353
-
the compilation step.
354
337
355
338
(multithreading)=
356
339
## Multithreaded Loops in Numba
357
340
358
-
In addition to JIT compilation, Numba provides support for parallel computing on CPUs.
341
+
In addition to JIT compilation, Numba provides support for parallel computing on CPUs and GPUs.
359
342
360
-
The key tool for parallelization in Numba is the `prange` function, which tells
361
-
Numba to execute loop iterations in parallel across available CPU cores.
343
+
The key tool for parallelization on CPUs in Numba is the `prange` function, which tells
344
+
Numba to execute loop iterations in parallel across available cores.
362
345
363
346
To illustrate, let's look first at a simple, single-threaded (i.e., non-parallelized) piece of code.
364
347
@@ -418,27 +401,10 @@ Now let's suppose that we have a large population of households and we want to
418
401
know what median wealth will be.
419
402
420
403
This is not easy to solve with pencil and paper, so we will use simulation
421
-
instead.
422
-
423
-
In particular, we will simulate a large number of households and then
424
-
calculate median wealth for this group.
425
-
426
-
Suppose we are interested in the long-run average of this median over time.
427
-
428
-
For the specification that we've chosen above, we can
429
-
calculate this by taking a one-period cross-sectional snapshot of median
430
-
wealth of the group at the end of a long simulation.
431
-
432
-
Moreover, provided the simulation period is long enough, initial conditions don't matter.
433
-
434
-
(This is due to [ergodicity](https://python.quantecon.org/finite_markov.html#id15).)
435
-
436
-
So, in summary, we are going to simulate 50,000 households by
437
-
438
-
1. arbitrarily setting initial wealth to 1 and
439
-
1. simulating forward in time for 1,000 periods.
404
+
instead:
440
405
441
-
Then we'll calculate median wealth at the end period.
406
+
1. Simulate a large number of households forward in time
407
+
2. Calculate median wealth
442
408
443
409
Here's the code:
444
410
@@ -492,6 +458,8 @@ with qe.Timer():
492
458
493
459
The speed-up is significant.
494
460
461
+
Notice that we parallelize across households rather than over time -- updates of
462
+
an individual household across time periods are inherently sequential.
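The same pattern can be sketched in a few lines. The example uses the quadratic map as a stand-in for the household update rule, so the function `final_states` and its arguments are illustrative assumptions and not the lecture's wealth dynamics. Without Numba, `prange` falls back to `range` and the loop runs serially.

```python
import numpy as np

try:
    from numba import njit, prange
except ImportError:
    prange = range
    def njit(*args, **kwargs):   # fallback decorator factory: no compilation
        return lambda f: f

@njit(parallel=True)
def final_states(x0, T, alpha):
    out = np.empty(len(x0))
    for i in prange(len(x0)):    # units are independent: safe to parallelize
        x = x0[i]
        for t in range(T):       # each step needs the previous one: sequential
            x = alpha * x * (1 - x)
        out[i] = x
    return out

print(final_states(np.array([0.5, 0.25]), 1, 4.0))
```

The outer loop can be split across cores because no iteration reads another iteration's data. The inner loop cannot, because step `t + 1` depends on the result of step `t`.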
495
463
496
464
## Exercises
497
465
@@ -550,8 +518,9 @@ So we get a speed gain of 2 orders of magnitude by adding four characters.
550
518
:label: speed_ex2
551
519
```
552
520
553
-
In the [Introduction to Quantitative Economics with Python](https://intro.quantecon.org/intro.html) lecture series you can
554
-
learn all about finite-state Markov chains.
521
+
In the [Introduction to Quantitative Economics with
522
+
Python](https://intro.quantecon.org/intro.html) lecture series you can learn all
523
+
about finite-state Markov chains.
555
524
556
525
For now, let's just concentrate on simulating a very simple example of such a chain.