-
Notifications
You must be signed in to change notification settings - Fork 28
Expand file tree
/
Copy pathpython.qmd
More file actions
386 lines (304 loc) · 11 KB
/
python.qmd
File metadata and controls
386 lines (304 loc) · 11 KB
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
# Python Refreshment {#sec-python}
You have programmed in Python. Regardless of your skill level, let us
do some refreshing.
## The Python World
+ Function: a block of organized, reusable code to complete certain
task.
+ Module: a file containing a collection of functions, variables, and
statements.
+ Package: a structured directory containing collections of modules
and an `__init.py__` file by which the directory is interpreted as a
package.
+ Library: a collection of related functionality of codes. It is a
reusable chunk of code that we can use by importing it in our
program, we can just use it by importing that library and calling
the method of that library with period(.).
See, for example, [how to build a Python
libratry](https://medium.com/analytics-vidhya/how-to-create-a-python-library-7d5aea80cc3f).
## Standard Library
Python’s has an extensive standard library that offers a wide range of
facilities as indicated by the long table of contents listed below.
See documentation [online](https://docs.python.org/3/library/).
> The library contains built-in modules (written in C) that provide access to
> system functionality such as file I/O that would otherwise be inaccessible to
> Python programmers, as well as modules written in Python that provide
> standardized solutions for many problems that occur in everyday
> programming. Some of these modules are explicitly designed to encourage and
> enhance the portability of Python programs by abstracting away
> platform-specifics into platform-neutral APIs.
Question: How to get the constant $e$ to an arbitary precision?
The constant is only represented by a given double precision.
```{python}
import math
print("%0.20f" % math.e)
print("%0.80f" % math.e)
```
Now use package `decimal` to export with an arbitary precision.
```{python}
import decimal # for what?
## set the required number digits to 150
decimal.getcontext().prec = 150
decimal.Decimal(1).exp().to_eng_string()
decimal.Decimal(1).exp().to_eng_string()[2:]
```
## Important Libraries
+ NumPy
+ pandas
+ matplotlib
+ IPython/Jupyter
+ SciPy
+ scikit-learn
+ statsmodels
Question: how to draw a random sample from a normal distribution and
evaluate the density and distributions at these points?
```{python}
from scipy.stats import norm
mu, sigma = 2, 4
mean, var, skew, kurt = norm.stats(mu, sigma, moments='mvsk')
print(mean, var, skew, kurt)
x = norm.rvs(loc = mu, scale = sigma, size = 10)
x
```
The pdf and cdf can be evaluated:
```{python}
norm.pdf(x, loc = mu, scale = sigma)
```
## Writing a Function
Consider the Fibonacci Sequence
$1, 1, 2, 3, 5, 8, 13, 21, 34, ...$.
The next number is found by adding up the two numbers before it.
We are going to use 3 ways to solve the problems.
The first is a recursive solution.
```{python}
def fib_rs(n):
if (n==1 or n==2):
return 1
else:
return fib_rs(n - 1) + fib_rs(n - 2)
%timeit fib_rs(10)
```
The second uses dynamic programming memoization.
```{python}
def fib_dm_helper(n, mem):
if mem[n] is not None:
return mem[n]
elif (n == 1 or n == 2):
result = 1
else:
result = fib_dm_helper(n - 1, mem) + fib_dm_helper(n - 2, mem)
mem[n] = result
return result
def fib_dm(n):
mem = [None] * (n + 1)
return fib_dm_helper(n, mem)
%timeit fib_dm(10)
```
The third is still dynamic programming but bottom-up.
```{python}
def fib_dbu(n):
mem = [None] * (n + 1)
mem[1] = 1;
mem[2] = 1;
for i in range(3, n + 1):
mem[i] = mem[i - 1] + mem[i - 2]
return mem[n]
%timeit fib_dbu(500)
```
Apparently, the three solutions have very different performance for
larger `n`.
### Monty Hall
Here is a function that performs the Monty Hall experiments.
In this version, the host opens only one empty door.
```{python}
import numpy as np
def montyhall(ndoors, ntrials):
doors = np.arange(1, ndoors + 1) / 10
prize = np.random.choice(doors, size=ntrials)
player = np.random.choice(doors, size=ntrials)
host = np.array([np.random.choice([d for d in doors
if d not in [player[x], prize[x]]])
for x in range(ntrials)])
player2 = np.array([np.random.choice([d for d in doors
if d not in [player[x], host[x]]])
for x in range(ntrials)])
return {'noswitch': np.sum(prize == player), 'switch': np.sum(prize == player2)}
```
Test it out:
```{python}
montyhall(3, 1000)
montyhall(4, 1000)
```
The true value for the two strategies with $n$ doors are, respectively,
$1 / n$ and $\frac{n - 1}{n (n - 2)}$.
In the homework exercise, the host opens $m$ doors that are empty. An
argument `nempty` could be added to the function.
## Variables versus Objects
In Python, variables and the objects they point to actually live in
two different places in the computer memory. Think of variables as
pointers to the objects they’re associated with, rather than being
those objects. This matters when multiple variables point to the same
object.
```{python}
x = [1, 2, 3] # create a list; x points to the list
y = x # y also points to the same list in the memory
y.append(4) # append to y
x # x changed!
```
Now check their addresses
```{python}
print(id(x)) # address of x
print(id(y)) # address of y
```
Nonetheless, some data types in Python are "immutable", meaning that
their values cannot be changed in place. One such example is strings.
```{python}
x = "abc"
y = x
y = "xyz"
x
```
Now check their addresses
```{python}
print(id(x)) # address of x
print(id(y)) # address of y
```
Question: What's mutable and what's immutable?
Anything that is a collection of other objects is mutable, except
``tuples``.
Not all manipulations of mutable objects change the object rather than
create a new object. Sometimes when you do something to a mutable
object, you get back a new object. Manipulations that change an
existing object, rather than create a new one, are referred to as
“in-place mutations” or just “mutations.” So:
+ __All__ manipulations of immutable types create new objects.
+ __Some__ manipulations of mutable types create new objects.
Different variables may all be pointing at the same object is
preserved through function calls (a behavior known as “pass by
object-reference”). So if you pass a list to a function, and that
function manipulates that list using an in-place mutation, that change
will affect any variable that was pointing to that same object outside
the function.
```{python}
x = [1, 2, 3]
y = x
def append_42(input_list):
input_list.append(42)
return input_list
append_42(x)
```
Note that both `x` and `y` have been appended by $42$.
## Number Representation
Numers in a computer's memory are represented by binary styles (on and
off of bits).
### Integers
If not careful, It is easy to be bitten by overflow with integers when
using Numpy and Pandas in Python.
```{python}
import numpy as np
x = np.array(2 ** 63 - 1 , dtype = 'int')
x
# This should be the largest number numpy can display, with
# the default int8 type (64 bits)
```
*Note: on Windows and other platforms, `dtype = 'int'` may have to be changed to `dtype = np.int64` for the code to execute. Source: [Stackoverflow](https://stackoverflow.com/questions/38314118/overflowerror-python-int-too-large-to-convert-to-c-long-on-windows-but-not-ma)*
What if we increment it by 1?
```{python}
y = np.array(x + 1, dtype = 'int')
y
# Because of the overflow, it becomes negative!
```
For vanilla Python, the overflow errors are checked and more digits
are allocated when needed, at the cost of being slow.
```{python}
2 ** 63 * 1000
```
This number is 1000 times larger than the prior number,
but still displayed perfectly without any overflows
### Floating Number
Standard double-precision floating point number uses 64 bits. Among
them, 1 is for sign, 11 is for exponent, and 52 are fraction significand,
See <https://en.wikipedia.org/wiki/Double-precision_floating-point_format>.
The bottom line is that, of course, not every real number is exactly
representable.
If you have played the Game 24, here is a tricky one:
```{python}
8 / (3 - 8 / 3) == 24
```
Surprise?
There are more.
```{python}
0.1 + 0.1 + 0.1 == 0.3
```
```{python}
0.3 - 0.2 == 0.1
```
What is really going on?
```{python}
import decimal
decimal.Decimal(0.1)
```
```{python}
decimal.Decimal(8 / (3 - 8 / 3))
```
Because the mantissa bits are limited, it can not represent a floating point
that's both very big and very precise. Most computers can represent all integers
up to $2^{53}$, after that it starts skipping numbers.
```{python}
2.1 ** 53 + 1 == 2.1 ** 53
# Find a number larger than 2 to the 53rd
```
```{python}
x = 2.1 ** 53
for i in range(1000000):
x = x + 1
x == 2.1 ** 53
```
We add 1 to `x` by 1000000 times, but it still equal to its initial
value, `2.1 ** 53`. This is because this number is too big that computer
can't handle it with precision like add 1.
Machine epsilon is the smallest positive floating-point number `x` such that
`1 + x != 1`.
```{python}
print(np.finfo(float).eps)
print(np.finfo(np.float32).eps)
```
## Virtual Environment {#sec-python-venv}
Virtual environments in Python are essential tools for managing
dependencies and ensuring consistency across projects. They allow
you to create isolated environments for each project, with its
own set of installed packages, separate from the global Python
installation. This isolation prevents conflicts between project
dependencies and versions, making your projects more reliable
and easier to manage. It's particularly useful when working on
multiple projects with differing requirements, or when
collaborating with others who may have different setups.
To set up a virtual environment, you first need to ensure that
Python is installed on your system. Most modern Python
installations come with the venv module, which is used to create
virtual environments. Here's how to set one up:
+ Open your command line interface.
+ Navigate to your project directory.
+ Run `python3 -m venv myenv`, where `myenv` is the name of the
virtual environment to be created. Choose an informative name.
This command creates a new directory named `myenv` (or your chosen
name) in your project directory, containing the virtual
environment.
To start using this environment, you need to activate it. The
activation command varies depending on your operating system:
+ On Windows, run `myenv\Scripts\activate`.
+ On Linux or MacOS, use `source myenv/bin/activate` or
`. myenv/bin/activate`.
Once activated, your command line will typically show the name of
the virtual environment, and you can then install and use packages
within this isolated environment without affecting your global
Python setup.
To exit the virtual environment, simply type `deactivate` in your
command line. This will return you to your system’s global Python
environment.
As an example, let's install a package, like `numpy`, in this
newly created virtual environment:
+ Ensure your virtual environment is activated.
+ Run `pip install numpy`.
This command installs the requests library in your virtual
environment. You can verify the installation by running
`pip list`, which should show requests along with its version.