---
title: "A journey through 71 years of Formula 1"
author: "Andrea Mansi - 137857 - Università degli Studi di Udine"
date: "February 15, 2021"
output: html_document
---
<style>
body {text-align: justify}
</style>
```{r setup, include = TRUE, echo = FALSE, message=FALSE, warning=FALSE}
# CODE CHUNKS MUST BE EXECUTED SEQUENTIALLY
# Portions of the comments are in Italian; the report is in English.
knitr::opts_chunk$set(echo = TRUE)
# TRUE -> knit the report only; code chunks and tests are not included
# FALSE -> knit all the code and tests, with the report at the end of the document
knit_report = TRUE
# if FALSE -> rendering of the GIF animations is skipped (it is very time-consuming)
render_anim = TRUE
```
```{r, include = !knit_report, warning=FALSE,message=FALSE}
# Loading libraries
library(ggplot2) # for graphs
library(readr) # for csvs
library(tibble) # for better csvs
library(dplyr) # for queries
library(fmsb) # spidercharts
library(viridis) # for colors
library(wordcloud) # word cloud plots
library(wesanderson) # for colors
library(gridExtra) # for nice layouts
library("rnaturalearth") # world data for ggplot map
library("rnaturalearthdata") # world data for ggplot map
library(reactable) # nice tables
library(gganimate) # for gif renders (ggplot2)
library(visNetwork) # cool interactive networks :)
library(igraph) # network analysis
```
```{r, include = !knit_report,message=FALSE,warning=FALSE}
# Loading the CSV data
# NB: error in row 4376 of the driverStand dataset: fixed manually
# NB: error in row 23765 of the results dataset: fixed manually
# NB: error in the driverStandings dataset: converted to .txt and it works...
# Forward slashes keep the paths portable across operating systems
circuits <- as_tibble(read_csv("data/circuits.csv"))
constrRes <- as_tibble(read_csv("data/constructorResults.csv"))
constr <- as_tibble(read_csv("data/constructors.csv"))
constrStand <- as_tibble(read_csv("data/constructorStandings.csv"))
drivers <- as_tibble(read_csv("data/drivers.csv"))
driverStand <- as_tibble(read_csv("data/driverStandings.txt"))
lapTimes <- as_tibble(read_csv("data/lapTimes.csv"))
pits <- as_tibble(read_csv("data/pitStops.csv"))
quali <- as_tibble(read_csv("data/qualifying.csv"))
races <- as_tibble(read_csv("data/races.csv"))
results <- as_tibble(read_csv("data/results.csv"))
seasons <- as_tibble(read_csv("data/seasons.csv"))
status <- as_tibble(read_csv("data/status.csv"))
```
```{r,include = !knit_report, warning=FALSE,message=FALSE}
# Query the data to extract some country-level information
# --- Drop the unused variables from the datasets
circuits = select(circuits,-c(lat,lng,alt,url))
races = select(races,-c(time,url))
# --- Total number of circuits in F1
n_of_circuits = nrow(circuits)
# --- Count the races held at each circuit
counts <- races %>% count(circuitId) # pairs of circuitId and number of races held
circuits <- inner_join(circuits,counts,by="circuitId")
circuits <- rename(circuits, held_races=n)
rm(counts)
# --- Count the races held and the circuits used in each country
country_stats <- circuits %>% count(country) %>% rename(circuits_used = n)
counts <- inner_join(circuits,races,by="circuitId") %>% count(country)
country_stats <- inner_join(country_stats,counts,by="country")
country_stats <- rename(country_stats,held_races=n)
rm(counts)
country_stats
```
```{r,include = !knit_report, warning=FALSE,message=FALSE}
# Query the data to extract some driver information (by country)
# --- Clean the drivers dataset
drivers = select(drivers,-c(url))
# --- Count how many drivers each country has had (the very few drivers with a dual nationality are excluded)
# PROBLEM -> a driver's nationality is NOT given as a country name: the value is "Italian", not "Italy", which breaks the join
# A custom lookup dataset, built ad hoc, maps each nationality to its country
nat_to_country <- as_tibble(read_csv("data/nat_to_country_custom_v1.csv"))
# --- count the drivers per nationality
nationalities_stats <- drivers %>% count(nationality) %>% rename(drivers=n)
# attach the driver counts to the countries by joining nationality to country
temp <- inner_join(nat_to_country,nationalities_stats,by="nationality")
# drop the nationality column, which is no longer needed
country_stats <- full_join(temp,country_stats,by="country") %>% select(-c(nationality))
# cleaning data
rm(nationalities_stats,temp)
# NA values become 0
country_stats[is.na(country_stats)] <- 0
country_stats
```
```{r,include = !knit_report, warning=FALSE,message=FALSE}
# Query the data to compute per-driver statistics
# attach the country to each driver and drop the nationality
driver_stats <- drivers %>% select(c(driverId,forename,surname,nationality,driverRef))
driver_stats <- inner_join(driver_stats,nat_to_country,by="nationality") %>% select(-c(nationality))
# races entered per driver
held_races <- results %>% count(driverId) %>% rename(races=n)
# wins per driver
wins <- results %>% filter(position==1) %>% count(driverId) %>% rename(wins=n)
# second places per driver
second <- results %>% filter(position==2) %>% count(driverId) %>% rename(second=n)
# third places per driver
third <- results %>% filter(position==3) %>% count(driverId) %>% rename(third=n)
# add the podium counts to the initial table, replacing NA with 0
stats <- full_join(held_races,wins,by="driverId")
stats <- full_join(stats,second,by="driverId")
stats <- full_join(stats,third,by="driverId")
stats[is.na(stats)] <- 0 # replace NA with 0
driver_stats <- inner_join(driver_stats,stats,by="driverId")
rm(stats,wins,second,third) # remove the temporary datasets
# Total podiums
driver_stats = driver_stats %>% mutate(podiums=wins+second+third)
# Podium rate
driver_stats = driver_stats %>% mutate(podium_rate=podiums/races)
# Win rate
driver_stats = driver_stats %>% mutate(win_rate=wins/races)
# build full_name and drop surname and forename
driver_stats = driver_stats %>% mutate(full_name = paste(forename,surname)) %>% select(-surname,-forename)
driver_stats
```
```{r,include = !knit_report, warning=FALSE,message=FALSE, echo=!knit_report,fig.width=10, fig.height=6}
# Plot the statistics computed for the drivers (top 30)
# races held
driver_stats = driver_stats %>% arrange(desc(races))
driver_stats
ggplot(data=driver_stats[1:30,],aes(x= reorder(full_name,-races),y=races)) +
geom_bar(aes(fill=races),stat = 'identity', show.legend = FALSE) +
theme(axis.text.x = element_text(angle=52.5, vjust=1, hjust=1,size=10.5), plot.margin = margin(0.2, 0.5, 0, 0.6, "cm")) +
labs(x="",y="",title="Top 30 drivers by number of races entered")
# wins plot
driver_stats = driver_stats %>% arrange(desc(wins))
driver_stats
ggplot(data=driver_stats[1:30,],aes(x= reorder(full_name,-wins),y=wins)) +
geom_bar(aes(fill=wins),stat = 'identity', show.legend = FALSE) +
theme(axis.text.x = element_text(angle=52.5, vjust=1, hjust=1,size=10.5), plot.margin = margin(0.2, 0.5, 0, 0.6, "cm")) +
labs(x="",y="",title="Top 30 drivers by number of wins")
# 2nd place plot
driver_stats = driver_stats %>% arrange(desc(second))
driver_stats
ggplot(data=driver_stats[1:30,],aes(x= reorder(full_name,-second),y=second)) +
geom_bar(aes(fill=second),stat = 'identity', show.legend = FALSE) +
theme(axis.text.x = element_text(angle=52.5, vjust=1, hjust=1,size=10.5), plot.margin = margin(0.2, 0.5, 0, 0.6, "cm")) +
labs(x="",y="",title="Top 30 drivers by second-place finishes")
# 3rd place plot
driver_stats = driver_stats %>% arrange(desc(third))
driver_stats
ggplot(data=driver_stats[1:30,],aes(x= reorder(full_name,-third),y=third)) +
geom_bar(aes(fill=third),stat = 'identity', show.legend = FALSE) +
theme(axis.text.x = element_text(angle=52.5, vjust=1, hjust=1,size=10.5), plot.margin = margin(0.2, 0.5, 0, 0.6, "cm")) +
labs(x="",y="",title="Top 30 drivers by third-place finishes")
# total podiums plot
driver_stats = driver_stats %>% arrange(desc(podiums))
driver_stats
ggplot(data=driver_stats[1:30,],aes(x= reorder(full_name,-podiums),y=podiums)) +
geom_bar(aes(fill=podiums),stat = 'identity', show.legend = FALSE) +
theme(axis.text.x = element_text(angle=52.5, vjust=1, hjust=1,size=10.5), plot.margin = margin(0.2, 0.5, 0, 0.6, "cm")) +
labs(x="",y="",title="Top 30 drivers by total podiums")
# for the podium_rate plot only drivers with at least 100 races are considered; otherwise the ranking would be topped by drivers with, for example, 1 win in 1 race... too easy :)
driver_stats_with_100_races = driver_stats %>% filter(races>99)
# podium rate for drivers with at least 100 races held
driver_stats_with_100_races = driver_stats_with_100_races %>% arrange(desc(podium_rate))
driver_stats_with_100_races
ggplot(data=driver_stats_with_100_races[1:30,],aes(x= reorder(full_name,-podium_rate),y=podium_rate)) +
geom_bar(aes(fill=podium_rate),stat = 'identity', show.legend = FALSE) +
theme(axis.text.x = element_text(angle=52.5, vjust=1, hjust=1,size=10.5), plot.margin = margin(0.2, 0.5, 0, 0.6, "cm")) +
labs(x="",y="",title="Top 30 drivers by podiums per race entered (at least 100 races entered)")
# win rate for drivers with at least 100 races held
driver_stats_with_100_races = driver_stats_with_100_races %>% arrange(desc(win_rate))
driver_stats_with_100_races
ggplot(data=driver_stats_with_100_races[1:30,],aes(x= reorder(full_name,-win_rate),y=win_rate)) +
geom_bar(aes(fill=win_rate),stat = 'identity', show.legend = FALSE) +
theme(axis.text.x = element_text(angle=52.5, vjust=1, hjust=1,size=10.5), plot.margin = margin(0.2, 0.5, 0, 0.6, "cm")) +
labs(x="",y="",title="Top 30 drivers by wins per race entered (at least 100 races entered)")
rm(driver_stats_with_100_races)
```
```{r,include = !knit_report, warning=FALSE,message=FALSE}
# Analysis of the points scored by the drivers
# keep only the useful information: every result with position <= 10 (the positions that score points)
results$position = as.numeric(results$position)
# element-wise & is required here: && is not vectorized and raises an error on vectors in R >= 4.3
points = results %>% filter(position < 11 & position > 0) %>% select(driverId,points,raceId,position)
points_stats = aggregate(points$points,by=list(driverId=points$driverId),FUN=sum) %>% rename(tot_points=x)
# PROBLEM: DIFFERENT CHAMPIONSHIPS -> DIFFERENT POINTS SYSTEMS
# THE DATA MUST BE NORMALIZED TO BE FAIR TO DRIVERS FROM DIFFERENT ERAS
# First step: points per win for each year (season)
races_year = races %>% select(raceId,year)
points = inner_join(points,races_year,by="raceId")
season_rank_system_data = points %>% filter(position==1) %>% select(year,points)
season_rank_system_data = unique(season_rank_system_data) %>% arrange(desc(year))
# NB: the first 3 rows refer to 3 special cases, WHICH MUST BE REMOVED
# 2014, 50 -> double points were introduced and dropped right away
# 2019, 2020 -> 26 points (1 bonus point for the fastest lap, analyzed later...): removed
season_rank_system_data = season_rank_system_data %>% filter(points!= 50) %>% filter(points!= 26)
# rename the points column
season_rank_system_data = season_rank_system_data %>% rename(points_per_win=points) %>% group_by(year) %>% top_n(1,points_per_win) # top_n -> keeps the maximum for each year
```
```{r,include = !knit_report, warning=FALSE,message=FALSE, echo=!knit_report,fig.width=10, fig.height=6}
ggplot(data=season_rank_system_data,aes(x= factor(year),y=points_per_win)) +
geom_bar(position = "dodge", aes(fill=points_per_win),stat = 'identity', show.legend = FALSE) +
theme(axis.text.x = element_text(angle=90, vjust=1, hjust=1,size=9)) +
labs(x="",y="",title="Points per win - different points systems")
```
```{r,include = !knit_report, warning=FALSE,message=FALSE, echo=!knit_report}
# NORMALIZATION of the points data. Idea: award every result from 1950 to 2020 the points of the 2020 system,
# so that each position is rewarded with the same number of points
# function mapping position -> points
calculate_points <- function(position){
if(is.na(position)){return(0)}
position = as.numeric(position)
if(position==1){return(25)}
else if(position==2){return(18)}
else if(position==3){return(15)}
else if(position==4){return(12)}
else if(position==5){return(10)}
else if(position==6){return(8)}
else if(position==7){return(6)}
else if(position==8){return(4)}
else if(position==9){return(2)}
else if(position==10){return(1)}
return(0)
}
# apply the points conversion function (vapply returns a numeric vector directly)
points$updated_points = vapply(points$position,calculate_points,numeric(1))
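# A minimal sketch, not part of the original analysis: the same position -> points
# mapping written as a vectorized table lookup. `points_2020` and `lookup_points`
# are hypothetical names introduced here, and positions are assumed to be whole numbers.
points_2020 <- c(25, 18, 15, 12, 10, 8, 6, 4, 2, 1)
lookup_points <- function(position) {
  position <- suppressWarnings(as.integer(position))
  # TRUE only for classified finishes inside the top 10
  scored <- !is.na(position) & position >= 1 & position <= 10
  out <- numeric(length(position))
  out[scored] <- points_2020[position[scored]]
  out
}
# usage: missing or non-scoring positions map to 0
lookup_points(c(1, 2, 10, 11, NA)) # 25 18 1 0 0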
# total points (new system) for each driver
points_stats_new = aggregate(points$updated_points,by=list(driverId=points$driverId),FUN=sum) %>% rename(tot_points_new=x)
# add the column to points_stats
points_stats = inner_join(points_stats,points_stats_new,by="driverId")
# add the points to driver_stats; a left join keeps the drivers that never scored, with 0 points
driver_stats = left_join(driver_stats,points_stats,by="driverId") %>% mutate(tot_points=coalesce(tot_points,0),tot_points_new=coalesce(tot_points_new,0))
# remove the points statistics, now merged into driver_stats
rm(points_stats_new)
rm(points_stats)
```
```{r,include = !knit_report, warning=FALSE,message=FALSE, echo=!knit_report,fig.width=10, fig.height=6}
driver_stats = driver_stats %>% arrange(desc(tot_points))
driver_stats
ggplot(data=driver_stats[1:30,],aes(x= reorder(full_name,-tot_points),y=tot_points)) +
geom_bar(aes(fill=tot_points),stat = 'identity', show.legend = FALSE) +
theme(axis.text.x = element_text(angle=52.5, vjust=1, hjust=1,size=10.5), plot.margin = margin(0.2, 0.5, 0, 0.6, "cm")) +
labs(x="",y="",title="Top 30 drivers by total points")
driver_stats = driver_stats %>% arrange(desc(tot_points_new))
driver_stats
ggplot(data=driver_stats[1:30,],aes(x= reorder(full_name,-tot_points_new),y=tot_points_new)) +
geom_bar(aes(fill=tot_points_new),stat = 'identity', show.legend = FALSE) +
theme(axis.text.x = element_text(angle=52.5, vjust=1, hjust=1,size=10.5), plot.margin = margin(0.2, 0.5, 0, 0.6, "cm")) +
labs(x="",y="",title="Top 30 drivers by total points - same points system for every season")
```
```{r,include = !knit_report, warning=FALSE,message=FALSE,fig.width=10, fig.height=6}
# Maximum points attainable over a driver's career
driver_stats = driver_stats %>% mutate(tot_points_potential = races*25)
# Ratio of total points to attainable points
driver_stats = driver_stats %>% mutate(points_ratio = tot_points_new/tot_points_potential)
# For plotting, keep only the drivers with more than 25 races entered
data = driver_stats %>% filter(races > 25)
data = data %>% arrange(desc(points_ratio))
data
ggplot(data=data[1:30,],aes(x= reorder(full_name,-points_ratio),y=points_ratio)) +
geom_bar(aes(fill=points_ratio),stat = 'identity', show.legend = FALSE) +
theme(axis.text.x = element_text(angle=52.5, vjust=1, hjust=1,size=10.5), plot.margin = margin(0.2, 0.5, 0, 0.6, "cm")) +
labs(x="",y="",title="Share of the attainable points actually scored")
```
```{r,include = !knit_report, warning=FALSE,message=FALSE}
# Constructor analysis
# dataset cleanup + conversion from nationality to country
constr = constr %>% select(-url)
constr = left_join(constr,nat_to_country,by="nationality") %>% select(-nationality)
# --- Count the constructors per country
temp <- constr %>% count(country) %>% rename(constructors=n)
# Add this information to the country statistics
country_stats <- left_join(country_stats,temp,by="country")
# cleaning data
rm(temp)
# NA values become 0
country_stats[is.na(country_stats)] <- 0
# --- Count the championships won by each constructor (the championship was introduced in 1958)
seasons_constr_winners = inner_join(constrStand,races,by='raceId') %>%
  select(year,position,points,constructorId,raceId) %>%
  filter(position==1) %>%
  group_by(year) %>% top_n(1,points) %>%
  arrange(desc(year)) %>%
  group_by(year) %>% top_n(1,raceId) %>%
  select(year,constructorId)
# count per constructor
seasons_constr_winners = seasons_constr_winners %>% group_by(constructorId) %>% count(constructorId) %>% rename(championships=n)
# --- Count the seasons entered by each constructor
seasons_constr_count = unique(inner_join(results,races,by='raceId') %>% select(year,constructorId)) %>% count(constructorId) %>% rename(held_seasons=n) %>% arrange(desc(held_seasons))
# Gather the results in a single data frame
constr_stats = full_join(constr,seasons_constr_count,by="constructorId")
constr_stats = full_join(constr_stats,seasons_constr_winners,by="constructorId")
# NA = 0
constr_stats[is.na(constr_stats)] <- 0
constr_stats
# --- Season win ratio
constr_stats = constr_stats %>% mutate(seasons_win_ratio=championships/held_seasons)
# Clean up the environment
rm(constr,seasons_constr_winners,seasons_constr_count)
```
```{r,include = !knit_report, warning=FALSE,message=FALSE,fig.width=10, fig.height=6}
# Plotting results
constr_stats = constr_stats %>% arrange(desc(held_seasons))
constr_stats
ggplot(data=constr_stats[1:30,],aes(x= reorder(name,-held_seasons),y=held_seasons)) +
geom_bar(aes(fill=held_seasons),stat = 'identity', show.legend = FALSE) +
theme(axis.text.x = element_text(angle=52.5, vjust=1, hjust=1,size=10.5), plot.margin = margin(0.2, 0.5, 0, 0.6, "cm")) +
labs(x="",y="",title="Top 30 constructors by number of seasons entered")
data = constr_stats %>% arrange(desc(championships)) %>% filter(championships>0)
data
ggplot(data=data,aes(x= reorder(name,-championships),y=championships)) +
geom_bar(aes(fill=championships),stat = 'identity', show.legend = FALSE) +
theme(axis.text.x = element_text(angle=52.5, vjust=1, hjust=1,size=10.5), plot.margin = margin(0.2, 0.5, 0, 0.6, "cm")) +
labs(x="",y="",title="Constructors ranked by championships won")
constr_stats = constr_stats %>% arrange(desc(seasons_win_ratio))
constr_stats
ggplot(data=data,aes(x= reorder(name,-seasons_win_ratio),y=seasons_win_ratio)) +
geom_bar(aes(fill=seasons_win_ratio),stat = 'identity', show.legend = FALSE) +
theme(axis.text.x = element_text(angle=52.5, vjust=1, hjust=1,size=10.5), plot.margin = margin(0.2, 0.5, 0, 0.6, "cm")) +
labs(x="",y="",title="Championships won per season entered")
```
```{r,include = !knit_report, warning=FALSE,message=FALSE}
# Circular barplot test with the constructor statistics
constr_stats = constr_stats %>% arrange(desc(held_seasons))
constr_stats
ggplot(data=constr_stats[1:30,],aes(x= reorder(name,-held_seasons),y=held_seasons)) +
geom_bar(aes(fill=held_seasons),stat = 'identity', show.legend = FALSE) +
theme(axis.text.x = element_text(angle=0, vjust=1, hjust=1)) +
labs(x="",y="",title="Top 30 constructors by number of seasons entered") +
coord_polar() + ylim(-50,75)
```
```{r,include = !knit_report, warning=FALSE,message=FALSE}
# --- Pole positions per driver and the related ratio, plus plots of both statistics
# NB: a driver is credited with pole position when they start the race from grid position 1
pole <- results %>% filter(grid==1) %>% select(driverId,grid) %>% count(driverId) %>% rename(pole_positions=n)
# Add the pole position statistics to the drivers
driver_stats = full_join(driver_stats,pole,by="driverId")
driver_stats[is.na(driver_stats)] <- 0
# compute the pole ratio
driver_stats = driver_stats %>% mutate(pole_ratio=pole_positions/races)
# likewise, count the starts from second on the grid for each driver
pole2 <- results %>% filter(grid==2) %>% select(driverId,grid) %>% count(driverId) %>% rename("second_in_grid"=n)
driver_stats = full_join(driver_stats,pole2,by="driverId")
driver_stats[is.na(driver_stats)] <- 0
# data cleaning
rm(pole,pole2)
```
```{r,include = !knit_report, warning=FALSE,message=FALSE}
# pole number plot
driver_stats = driver_stats %>% arrange(desc(pole_positions))
driver_stats
ggplot(data=driver_stats[1:30,],aes(x= reorder(full_name,-pole_positions),y=pole_positions)) +
geom_bar(aes(fill=pole_positions),stat = 'identity', show.legend = FALSE) +
theme(axis.text.x = element_text(angle=52.5, vjust=1, hjust=1,size=10.5), plot.margin = margin(0.2, 0.5, 0, 0.6, "cm")) +
labs(x="",y="",title="Top 30 drivers by number of pole positions")
# 2nd in grid number plot
driver_stats = driver_stats %>% arrange(desc(second_in_grid))
driver_stats
ggplot(data=driver_stats[1:30,],aes(x= reorder(full_name,-second_in_grid),y=second_in_grid)) +
geom_bar(aes(fill=second_in_grid),stat = 'identity', show.legend = FALSE) +
theme(axis.text.x = element_text(angle=52.5, vjust=1, hjust=1,size=10.5), plot.margin = margin(0.2, 0.5, 0, 0.6, "cm")) +
labs(x="",y="",title="Top 30 drivers by number of starts from second on the grid")
# pole ratio plot
data = driver_stats %>% arrange(desc(pole_ratio)) %>% filter(races>25)
data
ggplot(data=data[1:30,],aes(x= reorder(full_name,-pole_ratio),y=pole_ratio)) +
geom_bar(aes(fill=pole_ratio),stat = 'identity', show.legend = FALSE) +
theme(axis.text.x = element_text(angle=52.5, vjust=1, hjust=1,size=10.5), plot.margin = margin(0.2, 0.5, 0, 0.6, "cm")) +
labs(x="",y="",title="Top 30 drivers by pole positions per race entered (more than 25 races)")
```
```{r,include = !knit_report, warning=FALSE,message=FALSE}
# Spider chart test with the driver stats
max_races = max(driver_stats$races)
max_wins = max(driver_stats$wins)
max_podiums = max(driver_stats$podiums)
max_tot_points_new = max(driver_stats$tot_points_new)
max_poles= max(driver_stats$pole_positions)
max_podium_rate= max(driver_stats$podium_rate)
max_win_rate= max(driver_stats$win_rate)
max_points_ratio= max(driver_stats$points_ratio)
max_pole_ratio= max(driver_stats$pole_ratio)
maxs = list(max_races,max_wins,max_podiums,max_tot_points_new,max_poles,1,1,1,1)
mins = list(0,0,0,0,0,0,0,0,0)
n_drivers = 6 # number of drivers to plot
driver_stats = driver_stats %>% arrange(desc(tot_points_new)) # sort by the criterion I consider best (for the 1st, 2nd, 3rd... ranking)
```
```{r,include = !knit_report, warning=FALSE,echo=!knit_report,message=FALSE,fig.width=10, fig.height=6}
layout.matrix = matrix(c(1,4,2,5,3,6), nrow=2,ncol=3)
layout(mat = layout.matrix, heights = c(1,1), widths = c(2,2,2))
old_par <- par(mar=c(1,0,2,0),oma=c(0,0,0,0)) # saved under a name that does not shadow base::c
for(i in 1:n_drivers){
test_data = driver_stats[i,] %>% select(races,wins,podiums,tot_points_new,pole_positions,podium_rate,win_rate,points_ratio,pole_ratio)
name = driver_stats$full_name[i]
data = rbind(maxs,mins,test_data)
colnames(data) <- c("Races entered","Wins","Podiums \n","Total points","Total poles","Podium rate","Win rate","Points ratio \n","Pole ratio")
radarchart(data, axistype=0,
#polygon
pcol=rgb(0/255,0/255,0/255,0.9) , pfcol=rgb(255/255,0/255,0/255,0.4) , plwd=3,
#grid
cglcol="grey", cglty=3, axislabcol="grey", cglwd=1,
#labels
vlcex=0.9,
title=paste(name," (",toString(i),"°)",sep="")
)
}
# reset the graphical parameters
par(old_par)
```
```{r,include = !knit_report, warning=FALSE,message=FALSE,echo=!knit_report,fig.width=10, fig.height=6}
# Plot the top countries by number of drivers
country_stats = country_stats %>% arrange(desc(drivers))
ggplot(data=country_stats[1:20,],aes(x= reorder(country,-drivers),y=drivers)) +
geom_bar(aes(fill=drivers),stat = 'identity', show.legend = FALSE) +
theme(axis.text.x = element_text(angle=52.5, vjust=1, hjust=1,size=10.5)) +
labs(x="",y="",title="Top 20 countries by number of drivers")
# Plot the top countries by races held
country_stats = country_stats %>% arrange(desc(held_races))
ggplot(data=country_stats[1:20,],aes(x= reorder(country,-held_races),y=held_races)) +
geom_bar(aes(fill=held_races),stat = 'identity', show.legend = FALSE) +
theme(axis.text.x = element_text(angle=52.5, vjust=1, hjust=1,size=10.5)) +
labs(x="",y="",title="Top 20 countries by number of races held")
# Plot the top countries by number of circuits
country_stats = country_stats %>% arrange(desc(circuits_used))
ggplot(data=country_stats[1:20,],aes(x= reorder(country,-circuits_used),y=circuits_used)) +
geom_bar(aes(fill=circuits_used),stat = 'identity', show.legend = FALSE) +
theme(axis.text.x = element_text(angle=52.5, vjust=1, hjust=1,size=10.5)) +
labs(x="",y="",title="Top 20 countries by number of circuits used")
```
```{r,include = !knit_report, warning=FALSE,message=FALSE}
# --- Update the country statistics
# races entered by drivers, per nationality
held_races_per_country = driver_stats %>% select(country,races)
held_races_per_country = aggregate(held_races_per_country$races,by=list(country=held_races_per_country$country),FUN=sum) %>% rename(held_races_by_drivers=x)
# seasons entered by constructors, per nationality
held_seasons_per_constr = aggregate(constr_stats$held_seasons,by=list(country=constr_stats$country),FUN=sum) %>% rename(held_seasons_by_constr=x)
# merge the data
country_stats = full_join(country_stats,held_races_per_country,by="country")
country_stats = full_join(country_stats,held_seasons_per_constr,by="country")
country_stats[is.na(country_stats)] <- 0 # NA to zeros
rm(held_races_per_country,held_seasons_per_constr) # data clean
country_stats
```
```{r,include = !knit_report, warning=FALSE,message=FALSE}
# Compute a score for each country ("the most significant/present country in F1 over the last 70 years")
# Weights of the individual statistics (they must sum to 1.0)
weight_drivers = 0
weight_constr = 0
weight_held_races = 0.4
weight_circuits_used = 0.0
weight_held_races_by_drivers = 0.3
weight_held_seasons_by_constr = 0.3
tot_drivers = sum(country_stats$drivers,na.rm = TRUE)
tot_constr = sum(country_stats$constructors,na.rm = TRUE)
tot_held_races = sum(country_stats$held_races,na.rm = TRUE)
tot_circuits_used = sum(country_stats$circuits_used,na.rm = TRUE)
tot_held_races_by_drivers = sum(country_stats$held_races_by_drivers,na.rm = TRUE)
tot_held_seasons_by_constr = sum(country_stats$held_seasons_by_constr,na.rm = TRUE)
score_multiplier=1000
country_stats <- country_stats %>% mutate(score= score_multiplier*(
(drivers/tot_drivers)*weight_drivers +
(constructors/tot_constr)*weight_constr +
(held_races/tot_held_races)*weight_held_races +
(circuits_used/tot_circuits_used)*weight_circuits_used +
(held_races_by_drivers/tot_held_races_by_drivers)*weight_held_races_by_drivers +
(held_seasons_by_constr/tot_held_seasons_by_constr)*weight_held_seasons_by_constr))
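# sanity checks: the weights must sum to 1 and the scores to score_multiplier
# (all.equal() tolerates floating-point rounding, unlike ==)
isTRUE(all.equal(1, weight_drivers + weight_constr + weight_held_races + weight_circuits_used + weight_held_races_by_drivers + weight_held_seasons_by_constr))
isTRUE(all.equal(sum(country_stats$score,na.rm=TRUE), score_multiplier))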
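# A minimal sketch on toy data (hypothetical names and values, not taken from the dataset):
# each statistic is divided by its column total, so every column of shares sums to 1;
# with weights that also sum to 1, the scores add up to the multiplier.
toy <- data.frame(a = c(2, 3, 5), b = c(10, 0, 30))
toy_weights <- c(a = 0.4, b = 0.6)
toy_score <- 1000 * (toy$a / sum(toy$a) * toy_weights[["a"]] +
                     toy$b / sum(toy$b) * toy_weights[["b"]])
toy_score # 230 120 650
isTRUE(all.equal(sum(toy_score), 1000))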
```
```{r,include = !knit_report, warning=FALSE,message=FALSE,echo=!knit_report,fig.width=10, fig.height=6}
# Plot the top 25 countries by score
country_stats = country_stats %>% arrange(desc(score))
country_stats
ggplot(data=country_stats[1:25,],aes(x= reorder(country,-score),y=score)) +
geom_bar(aes(fill=score),stat = 'identity', show.legend = FALSE) +
theme(axis.text.x = element_text(angle=52.5, vjust=1, hjust=1,size=10.5)) +
labs(x="",y="",title="Top 25 countries by presence in the history of F1")
ggplot(data=country_stats[1:25,],aes(x= reorder(country,score),y=score)) +
geom_bar(aes(fill=score),stat = 'identity', show.legend = FALSE) +
theme(axis.text.x = element_text(angle=0, vjust=1, hjust=1),axis.text.y = element_text(size=10.5)) +
labs(x="",y="",title="test horizontal barplot") +
coord_flip()
```
```{r,include = !knit_report, warning=FALSE,message=FALSE}
# Spider chart test with the country stats
max_drivers = max(country_stats$drivers)
max_constr = max(country_stats$constructors)
max_held_races = max(country_stats$held_races)
max_circuits_used = max(country_stats$circuits_used)
max_held_races_by_drivers = max(country_stats$held_races_by_drivers)
max_held_seasons_by_constr = max(country_stats$held_seasons_by_constr)
maxs = list(max_drivers,max_constr,max_held_races,max_circuits_used,max_held_races_by_drivers,max_held_seasons_by_constr)
mins = list(0,0,0,0,0,0)
n_country = 6 # number of countries to plot
country_stats = country_stats %>% arrange(desc(score)) # sort by the criterion I consider best (for the 1st, 2nd, 3rd... ranking)
```
```{r,include = !knit_report, warning=FALSE,echo=!knit_report,message=FALSE,fig.width=10, fig.height=6}
layout.matrix = matrix(c(1,4,2,5,3,6), nrow=2,ncol=3)
layout(mat = layout.matrix, heights = c(1,1), widths = c(2,2,2))
c <- par(mar=c(1,0,2,0),oma=c(0,0,0,0))
for(i in 1:n_country){
test_data = country_stats[i,] %>% select(drivers,constructors,held_races,circuits_used,held_races_by_drivers,held_seasons_by_constr)
name = country_stats$country[i]
data = rbind(maxs,mins,test_data)
colnames(data) <- c("Drivers","Constructors \n","\n\n Races held \n on circuits","Circuits","\n\n Races entered \n by drivers","Seasons entered \n by constructors \n\n")
radarchart(data, axistype=0,
#polygon
pcol=rgb(0/255,0/255,0/255,0.9) , pfcol=rgb(0/255,200/255,0/255,0.4) , plwd=3,
#grid
cglcol="grey", cglty=3, axislabcol="grey", cglwd=1,
#labels
vlcex=0.9,
title=paste(name," (",toString(i),"°)",sep="")
)
}
# reset par and clean up the environment
par(c)
rm(c,layout.matrix,maxs,mins,test_data)
```
```{r,include = !knit_report, warning=FALSE,echo=!knit_report,message=FALSE,fig.width=10, fig.height=6}
# Word cloud test with countries
wordcloud(country_stats$country,country_stats$score,scale=c(10,0.5),max.words = length(country_stats$country),random.order=FALSE,rot.per=0.25,ordered.colors = FALSE,random.color = TRUE)
```
```{r,include = !knit_report, warning=FALSE,echo=!knit_report,message=FALSE,fig.width=10, fig.height=5}
library("rnaturalearth")
library("rnaturalearthdata")
world <- ne_countries(scale = "medium", returnclass = "sf")
class(world)
# rename UK and USA to "United Kingdom" and "United States of America" to match the map data
country_stats_copy = country_stats %>% mutate(country=replace(country,country=="UK","United Kingdom")) %>% mutate(country=replace(country,country=="USA","United States of America"))
world = world %>% rename(country=sovereignt)
world = full_join(world,country_stats_copy, by="country")
ggplot(data = world) +
geom_sf(aes(fill = score)) +
scale_fill_gradient2() +
coord_sf(xlim = c(-170, 170), ylim = c(-60, 80), expand = FALSE)
ggplot(data = world) +
geom_sf(aes(fill = score)) +
scale_fill_gradient2() +
coord_sf(xlim = c(-20, 50), ylim = c(30, 70), expand = FALSE)
# remove the map data
rm(world)
```
```{r,include = !knit_report, warning=FALSE,echo=!knit_report,message=FALSE,fig.width=10, fig.height=5}
# Compute the total km covered in F1
# Based on the number of km to be covered in each Grand Prix (according to the regulations of each year) (data found online)
# The number is an estimate of the km covered by each driver (who completed the race) in each Grand Prix of that season; where the regulations specified a range (e.g. 300-500), the midpoint (400) was used
grand_prix_km_v1 <- as_tibble(read_csv("data\\grand_prix_km.txt"))
grand_prix_km_v1$year = as.numeric(grand_prix_km_v1$year)
races_km = races %>% select(year,raceId)
races_km = inner_join(races_km,grand_prix_km_v1,by="year")
# Now, for each race, count the number of drivers who took part
counts = results %>% select(driverId,raceId) %>% count(raceId) %>% rename(partecipants=n)
# For each race, compute the total km covered by the participants (applying a 10% reduction, the 0.9 factor below, to allow for retirements, accidents, etc.)
races_km = inner_join(races_km,counts,by="raceId") %>% mutate(tot_km=partecipants*estimated_tot_km_per_driver*0.9)
tot_f1_km_percorsi = sum(races_km$tot_km)
circonferenza_terra=40075
circonferenza_sole=4379000
distanza_marte_terra=187640000
terre_percorse=tot_f1_km_percorsi/circonferenza_terra
soli_percorsi=tot_f1_km_percorsi/circonferenza_sole
distanze_m_t=tot_f1_km_percorsi/distanza_marte_terra
tot_f1_km_percorsi
terre_percorse
soli_percorsi
distanze_m_t
rm(grand_prix_km_v1,counts,races_km)
```
```{r,include = !knit_report, warning=FALSE,echo=!knit_report,message=FALSE,fig.width=10, fig.height=5}
# Compute the seasons won by drivers and by constructors
# get the year of each race
years = races %>% select(year,raceId)
# drivers
seasons_win = inner_join(driverStand,years,by="raceId") %>% select(driverId,year,position,raceId) %>% filter(position==1) %>% group_by(year) %>% top_n(1,raceId) %>% select(driverId,year)
seasons_win_count = seasons_win %>% group_by(driverId) %>% count(driverId)
driver_stats = full_join(driver_stats,seasons_win_count,by="driverId") %>% rename(championship=n)
driver_stats[is.na(driver_stats)] <- 0 # NA to zeros
rm(seasons_win,seasons_win_count)
```
```{r,include = !knit_report, warning=FALSE,message=FALSE,echo=!knit_report,fig.width=10, fig.height=6}
# Plot the top drivers by number of championships won
data = driver_stats %>% arrange(desc(championship)) %>% filter(championship>0)
data
ggplot(data=data,aes(x= reorder(full_name,championship),y=championship)) +
geom_bar(aes(fill=championship),stat = 'identity', show.legend = FALSE) +
theme(axis.text.x = element_text(angle=0, vjust=1, hjust=1),axis.text.y = element_text(size=10.5)) +
labs(x="",y="",title="Drivers ranked by number of world titles") +
coord_flip()
# Plot the top constructors by number of championships won
data = constr_stats %>% arrange(desc(championships)) %>% filter(championships>0)
data
ggplot(data=data,aes(x= reorder(name,championships),y=championships)) +
geom_bar(aes(fill=championships),stat = 'identity', show.legend = FALSE) +
theme(axis.text.x = element_text(angle=0, vjust=1, hjust=1),axis.text.y = element_text(size=10.5)) +
labs(x="",y="",title="Constructors ranked by number of world titles") +
coord_flip()
```
```{r,include = !knit_report, warning=FALSE,echo=!knit_report,message=FALSE,fig.width=10, fig.height=5}
# car reliability factor (number of retirements, frequency, etc.)
retire_results = results %>% filter(is.na(position)) %>% select(raceId,driverId)
results_per_year = full_join(results,races_year,by="raceId") %>% count(year) %>% rename(tot_results=n)
retire_results_per_year = full_join(retire_results,races_year,by="raceId") %>% count(year) %>% rename(tot_retire=n)
retire_per_year_stats= full_join(results_per_year,retire_results_per_year,by="year") %>% mutate(retire_ratio=tot_retire/tot_results)
# driver retirements
retire_per_driver = retire_results %>% count(driverId )%>% rename(tot_retire=n)
driver_stats = full_join(driver_stats,retire_per_driver,by="driverId")
driver_stats$tot_retire[is.na(driver_stats$tot_retire)] <- 0 # replace NA with 0
driver_stats = driver_stats %>% mutate(completed_races=races-tot_retire)
driver_stats = driver_stats %>% mutate(car_reliability_bonus_percentage = tot_retire/races)
```
```{r,include = !knit_report, warning=FALSE,message=FALSE, echo=!knit_report,fig.width=10, fig.height=6}
ggplot(data=retire_per_year_stats[],aes(x= reorder(year,year),y=1-retire_ratio)) +
geom_bar(position = "dodge", aes(fill=retire_ratio),stat = 'identity', show.legend = FALSE) +
theme(axis.text.x = element_text(angle=90, vjust=1, hjust=1,size=9)) +
labs(x="",y="",title="Race completion rate")
```
```{r,include = !knit_report, warning=FALSE,echo=!knit_report,message=FALSE,fig.width=10, fig.height=5}
# driver performance score
wins_w = 0.25
podiums_w = 0.25
points_w = 0.25
pole_w = 0.25
driver_stats = driver_stats %>% mutate(performance=(wins_w*win_rate+
podiums_w*podium_rate+
points_w*points_ratio+
pole_w*pole_ratio))
# with car reliability bonus
driver_stats = driver_stats %>% mutate(performance_with_bonus = performance*(1+car_reliability_bonus_percentage))
# Plot the top drivers by performance
data = driver_stats %>% arrange(desc(performance)) %>% filter(races>20)
data
ggplot(data=data[1:20,],aes(x= reorder(full_name,performance),y=performance)) +
geom_bar(aes(fill=performance),stat = 'identity', show.legend = FALSE) +
theme(axis.text.x = element_text(angle=0, vjust=1, hjust=1),axis.text.y = element_text(size=10.5)) +
labs(x="",y="",title="performance") +
coord_flip()
# Plot the top drivers by performance with the reliability BONUS
data = driver_stats %>% arrange(desc(performance_with_bonus)) %>% filter(races>20)
data
ggplot(data=data[1:20,],aes(x= reorder(full_name,performance_with_bonus),y=performance_with_bonus)) +
geom_bar(aes(fill=performance_with_bonus),stat = 'identity', show.legend = FALSE) +
theme(axis.text.x = element_text(angle=0, vjust=1, hjust=1),axis.text.y = element_text(size=10.5)) +
labs(x="",y="",title="performance_with_bonus") +
coord_flip()
```
```{r,include=FALSE}
# ---> This part includes the prose for advanced data science exam (Università degli Studi di Udine) <---
# NB: A large amount of data computed previously to this chunk is used in the report, so in order
# to execute following chunks, all the previous ones must be executed (sequentially)
##########################################################################################################
########################################## REPORT/RELAZIONE ##############################################
##########################################################################################################
```
# Introduction
In this presentation we will analyze data about [Formula One](https://www.formula1.com/) history. We will explore data relating to drivers, constructors, countries, circuits, race results, etc. We will try to find out which drivers and constructors have dominated this motorsport during these 71 years. To evaluate their performance, the presentation proposes a ranking system based on multiple metrics. The presentation also includes some statistics and information about F1 itself. All the data used are available [here](https://www.kaggle.com/rohanrao/formula-1-world-championship-1950-2020) and consist of 13 CSV files. The source code is available [here](https://github.com/Mansitos/Progetto-DataScience-UNI-F1-Data-Analysis-R-Language).
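As a rough sketch of the ranking system mentioned above: the analysis code scores each driver with an equally weighted sum of four rates (wins, podiums, points and pole positions). The example below applies that formula to a toy table. The driver names and rate values are invented for illustration; only the column names and the 0.25 weights come from the analysis code.

```r
library(dplyr)

# Hypothetical per-driver rates in [0, 1]; values are invented for illustration.
toy_drivers <- data.frame(
  full_name    = c("Driver A", "Driver B", "Driver C"),
  win_rate     = c(0.40, 0.10, 0.25),
  podium_rate  = c(0.60, 0.30, 0.50),
  points_ratio = c(0.80, 0.50, 0.70),
  pole_ratio   = c(0.30, 0.05, 0.35)
)

# Equal weights, as in the analysis code; they sum to 1.
wins_w    <- 0.25
podiums_w <- 0.25
points_w  <- 0.25
pole_w    <- 0.25

# Weighted sum of the four rates, then sort from best to worst.
ranking <- toy_drivers %>%
  mutate(performance = wins_w * win_rate +
           podiums_w * podium_rate +
           points_w * points_ratio +
           pole_w * pole_ratio) %>%
  arrange(desc(performance))

ranking
```

Changing the weights shifts the emphasis between metrics, so the resulting order should be read as one possible ranking rather than a definitive one.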
## What is Formula One
Formula One (abbreviated to *F1*) is the highest class of international auto racing for single-seater racing cars approved by the [Fédération Internationale de l’Automobile (FIA)](https://www.fia.com/). A Formula One season consists of a series of races, known as *Grands Prix*, which take place worldwide on purpose-built circuits and on closed public roads.
The results of each race are evaluated using a *points system* to determine two annual World Championships: one for drivers and one for constructors.
Formula One cars are considered the fastest regulated road-course racing cars in the world, characterized by very high cornering speeds, achieved through the generation of large amounts of aerodynamic downforce. Most modern F1 cars can reach peaks of 6.5 lateral g while cornering and top speeds of approximately 360 km/h. Traction control and other driving aids have been banned since 2008.
*You can learn more about how F1 works by watching this video: [F1 Drivers Explain F1](https://www.youtube.com/watch?v=twAlqtvVMdc).*
## Some history about F1
Formula One has its roots in the *European Grand Prix championships* of the 1920s and 1930s. The foundation of F1 began in 1946 with the FIA’s standardisation of rules, which was followed by the first *World Championship of Drivers* in 1950.
The history of Formula One is usually divided into eras, but this division is typically based on subjective criteria and there is no official division. We will dive deep into the history of Formula One by introducing some of the most iconic drivers. We will do so by querying the data and obtaining important results related to their careers, such as number of wins, number of podiums, etc.
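A minimal sketch of the kind of query involved, assuming a results table with one row per driver per race. The table `toy_results` and its `finish_position` column are hypothetical stand-ins chosen for illustration, not the actual dataset schema.

```r
library(dplyr)

# Hypothetical miniature results table: one row per driver per race.
toy_results <- data.frame(
  driverId        = c(1, 1, 1, 2, 2, 3),
  raceId          = c(10, 11, 12, 10, 11, 12),
  finish_position = c(1, 3, 5, 2, 1, 2)
)

# Career summary per driver: races entered, wins, podiums and the related rates.
career <- toy_results %>%
  group_by(driverId) %>%
  summarise(
    races   = n(),
    wins    = sum(finish_position == 1),
    podiums = sum(finish_position <= 3)
  ) %>%
  mutate(win_rate = wins / races, podium_rate = podiums / races)

career
```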
By analyzing the data we can also obtain some numbers characterizing the history of F1:
```{r, out.width = "1000px",echo = !knit_report,fig.align='center'}
knitr::include_graphics("imgs\\stats_img.jpg")
```
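One of the numbers computed in the analysis code is an estimate of the total distance covered in F1 races. For each race, the number of participants is multiplied by the estimated race distance per driver and by a 0.9 factor that allows for retirements; the total is then compared with the Earth's circumference (40,075 km). The sketch below repeats that arithmetic on a toy table whose values are invented for illustration.

```r
library(dplyr)

# Hypothetical per-race table: number of participants and the estimated
# distance (km) each driver is expected to cover in that race.
toy_races <- data.frame(
  raceId                      = c(1, 2, 3),
  participants                = c(20, 22, 24),
  estimated_tot_km_per_driver = c(300, 300, 305)
)

completion_factor      <- 0.9    # allowance for retirements, accidents, etc.
earth_circumference_km <- 40075

# Total km per race, scaled down by the completion factor.
toy_races <- toy_races %>%
  mutate(tot_km = participants * estimated_tot_km_per_driver * completion_factor)

total_km   <- sum(toy_races$tot_km)
earth_laps <- total_km / earth_circumference_km

total_km
earth_laps
```

The result is only as good as the per-season distance estimates and the flat completion factor, so it is best treated as an order-of-magnitude figure.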
*You can learn more about F1 history by watching these two videos: [The History Of Formula 1 | Race 1000](https://www.youtube.com/watch?v=bbPOCRmpAzY) and [History Of F1's Innovation | F1 70th Anniversary](https://www.youtube.com/watch?v=j9LNd-c6OVM&t=0s).*
### First F1 winner
Giuseppe Farina (also known as Giuseppe Antonio “Nino” Farina) won the first F1 race and the first F1 Drivers Championship. Farina drove for Ferrari, Alfa Romeo and Lancia. Let’s see what the data can tell us about this iconic driver:
```{r, out.width = "1440px",echo = !knit_report,fig.align='center'}
knitr::include_graphics("imgs\\drivers_labels\\NinoFarina.jpg")
```
*“Because of the crazy way Farina drove only the Holy Virgin was capable of keeping him on the track.” - Juan Manuel Fangio*
### Domination of the 50s by Fangio & Ascari
After the win of the first championship by Farina, the 1950s were dominated by two iconic drivers, Juan Manuel Fangio and Alberto Ascari:
```{r, out.width = "1440px",echo = !knit_report,fig.align='center'}
knitr::include_graphics("imgs\\drivers_labels\\Fangio.jpg")
```
```{r, out.width = "1440px",echo = !knit_report,fig.align='center'}
knitr::include_graphics("imgs\\drivers_labels\\Ascari.jpg")
```
Fangio’s records remained unbeaten for over 30 years. The 1950s are remembered as the deadliest decade in F1, with 15 casualties. Helmets became mandatory in 1952, 2 years after the birth of F1. Seatbelts became mandatory in 1972.
### The driver of the 1960s
The first driver to break Fangio’s record for wins was Jim Clark, who dominated the circuits in the 1960s.
```{r, out.width = "1440px",echo = !knit_report,fig.align='center'}
knitr::include_graphics("imgs\\drivers_labels\\JimClark.jpg")
```
*“Jim Clark was everything I aspired to be, as a racing driver and as a man” - Sir Jackie Stewart*
### The Flying Scot
Clark’s record of 25 wins was broken by his greatest admirer: Sir John “Jackie” Stewart.
```{r, out.width = "1440px",echo = !knit_report,fig.align='center'}
knitr::include_graphics("imgs\\drivers_labels\\Stewart.jpg")
```
### The battle between teammates
It took 14 years to break Stewart’s record for wins; a new era began, and its pioneer was Alain Prost.
```{r, out.width = "1440px",echo = !knit_report,fig.align='center'}
knitr::include_graphics("imgs\\drivers_labels\\Prost.jpg")
```
One famous period of Formula One is known as the Prost-Senna rivalry. The rivalry between the two drivers was at its most intense while they were teammates at McLaren (1988-1989).
```{r, out.width = "1440px",echo = !knit_report,fig.align='center'}
knitr::include_graphics("imgs\\drivers_labels\\Senna.jpg")
```
Ayrton Senna is probably the most loved driver in F1 history. Senna distinguished himself throughout his career and was regarded as a prodigy fated to break every record. Unfortunately, Senna died aged 34 after a crash during the San Marino Grand Prix on 1 May 1994, the day after the death of another driver: Roland Ratzenberger.
*“Racing, competing, it’s in my blood. It’s part of me, it’s part of my life; I have been doing it all my life and it stands out above everything else.” - Ayrton Senna*
### The Ferrari Era
One of the most iconic drivers, if not the most iconic one, is Michael Schumacher, who during his career broke most of the records of this motorsport, setting a new standard which, at the time, was beyond imagination.
```{r, out.width = "1440px",echo = !knit_report,fig.align='center'}
knitr::include_graphics("imgs\\drivers_labels\\Shumi.jpg")
```
*“I always thought records were there to be broken.” - Michael Schumacher*
### The Hybrid Era: the birth of a new legend
With the advent of the hybrid era (2014-present), Formula One has been dominated by Mercedes and by its top driver Lewis Hamilton.
```{r, out.width = "1440px",echo = !knit_report,fig.align='center'}
knitr::include_graphics("imgs\\drivers_labels\\Lewis.jpg")