Faster identification of faster Formula 1 drivers via time-rank duality
Companion R code and data
Introduction
This page contains the R code and data to reproduce the statistical analysis in the paper [1], Faster identification of faster Formula 1 drivers via time-rank duality, by John Fry, Tom Brighton and Silvio Fanzon.
The code should be simple to understand and comments are provided throughout. For a deeper understanding of the ranking model proposed, and the underlying statistical analysis, please refer to the paper [1].
You are free to use and modify the code in accordance with the license CC BY-NC 4.0. We kindly ask that our work be credited by citing the paper [1]. You can download the BibTeX citation here.
The data
The data used for the statistical analysis in the paper [1] can be downloaded here. The file contains the placements of 20 drivers in the 22 races of the 2022 F1 season plus 3 sprint races (Source).
The R code
The annotated R code given below reproduces the statistical analysis in the paper [1]. The code is a mix of R scripts and interactive R console work.
The code runs in R version 4.3.3 and above with no additional packages.
Calibration with bookmakers’ odds
The first R function listed below computes the residual sum of squares between the implied probabilities obtained from bookmakers' odds and the win probabilities written as a function of the lambda values. The input is a vector of lambda values whose dimension equals the number of unique bookmakers' odds, so that drivers with identical odds share a lambda value. This constraint must be obeyed; imposing it also improves the speed and smoothness of the computation. The function is then passed to the optim command in R to perform the minimisation. Please see below.
#input data is the vector of implied win probabilities
#output of the minimisation is the estimated lambda values
#the argument x holds the candidate lambda values, one per unique odds value
lambdaest4 <- function(x){
  input <- c(0.031655049, 0.031655049, 0.673389233, 0.063310099,
             0.031655049, 0.028380389, 0.063310099, 0.048413605,
             0.001642777, 0.001642777, 0.01016088, 0.001642777,
             0.001642777, 0.001642777, 0.001642777, 0.001642777,
             0.001642777, 0.001642777, 0.001642777, 0.001642777)
  #given 7 input values
  target <- sort(input)
  lambda <- rep(c(x[1], x[2], x[3], x[4], x[5], x[6], x[7]),
                rle(target)$lengths)
  pred <- lambda / sum(lambda)
  distance <- sum( (target - pred )^2 )
  return(distance)
}
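The constraint on the dimension of x can be checked directly from the implied probabilities. The sketch below copies the input vector from the listing above and counts its distinct values and their multiplicities with rle; the names implied, groups, n_lambda and group_sizes are illustrative and do not appear in the original code.

```r
# Implied win probabilities, copied from the input vector inside lambdaest4
implied <- c(0.031655049, 0.031655049, 0.673389233, 0.063310099,
             0.031655049, 0.028380389, 0.063310099, 0.048413605,
             0.001642777, 0.001642777, 0.01016088, 0.001642777,
             0.001642777, 0.001642777, 0.001642777, 0.001642777,
             0.001642777, 0.001642777, 0.001642777, 0.001642777)

# Run-length encoding of the sorted vector groups tied probabilities together
groups <- rle(sort(implied))

n_lambda <- length(groups$values)  # number of distinct implied probabilities
group_sizes <- groups$lengths      # number of drivers sharing each value

print(n_lambda)
print(group_sizes)
```

For this vector there are 7 distinct values with group sizes 11, 1, 1, 3, 1, 2, 1, which is why x has length 7 and why rep(..., rle(target)$lengths) expands it back to 20 lambda values.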
To optimize lambdaest4 we run optim on a set of randomly generated starting values. After careful randomized restarts, a local minimizer is found to be
x1
[1] 0.0004205564 0.0026012171 0.0072654675 0.0081037902 0.0123940343
[6] 0.0162075831 0.1723897481
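The randomized restarts themselves are not listed on this page. A minimal sketch of one way to carry them out is given below. The number of restarts, the uniform starting values, the seed and the names rss, fits and best are assumptions made for illustration and are not taken from the paper. The objective is a compact restatement of lambdaest4, with the distinct implied probabilities and their multiplicities read off the input vector.

```r
# Distinct implied probabilities (sorted) and the number of drivers sharing each
probs <- c(0.001642777, 0.01016088, 0.028380389, 0.031655049,
           0.048413605, 0.063310099, 0.673389233)
sizes <- c(11, 1, 1, 3, 1, 2, 1)
target <- rep(probs, sizes)

# Same residual sum of squares as lambdaest4, written compactly
rss <- function(x) {
  lambda <- rep(x, sizes)
  pred <- lambda / sum(lambda)
  sum((target - pred)^2)
}

set.seed(1)      # assumption: any seed will do
n_starts <- 20   # assumption: number of random restarts

fits <- lapply(seq_len(n_starts), function(i) {
  start <- runif(length(probs))   # random starting lambda values in (0, 1)
  fit <- optim(start, rss, control = list(maxit = 10000))
  fit$start_value <- rss(start)   # objective at the starting point, for comparison
  fit
})

# Keep the restart with the smallest residual sum of squares
values <- vapply(fits, function(f) f$value, numeric(1))
best <- fits[[which.min(values)]]

print(best$value)
print(best$par)
```

Because the objective depends on x only through lambda / sum(lambda), different restarts may return minimizers that differ by a scale factor, so it is the normalised values that should be compared across runs.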
As a proof of concept that x1 is a local minimizer, we run optim starting at x1:
optim(x1, lambdaest4, control=list(maxit=10000))$par
[1] 0.0004205564 0.0026012171 0.0072654675 0.0081037902 0.0123940343
[6] 0.0162075831 0.1723897481
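A complementary check, sketched below, evaluates the residual sum of squares at x1 and compares the normalised lambda values with the implied probabilities. It uses a compact restatement of the objective; the names rss, probs, sizes and the derived quantities are illustrative rather than part of the original code.

```r
# Distinct implied probabilities (sorted) and the number of drivers sharing each
probs <- c(0.001642777, 0.01016088, 0.028380389, 0.031655049,
           0.048413605, 0.063310099, 0.673389233)
sizes <- c(11, 1, 1, 3, 1, 2, 1)
target <- rep(probs, sizes)

# Same residual sum of squares as lambdaest4, written compactly
rss <- function(x) {
  lambda <- rep(x, sizes)
  pred <- lambda / sum(lambda)
  sum((target - pred)^2)
}

# Local minimizer reported above
x1 <- c(0.0004205564, 0.0026012171, 0.0072654675, 0.0081037902,
        0.0123940343, 0.0162075831, 0.1723897481)

rss_at_x1 <- rss(x1)

# Win probabilities implied by x1, one per distinct odds value
pred_unique <- x1 / sum(rep(x1, sizes))
max_abs_error <- max(abs(pred_unique - probs))

# Multiplying every lambda by a positive constant leaves the objective unchanged
rss_rescaled <- rss(10 * x1)

print(rss_at_x1)
print(max_abs_error)
```

With the printed digits of x1, the fitted probabilities agree with the implied probabilities to well within 1e-6, so the residual sum of squares is numerically zero. Since rescaling x leaves the objective unchanged, x1 is one representative of a family of minimizers that differ only by a positive scale factor.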
Regression estimation
The regression analysis in the paper proceeds via stepwise regression. Useful background can be found in Fry and Burke [2]. However, in sharp contrast to the standard regression examples in Fry and Burke [2], a constraint is made so that all considered models have to include the driverorder2 dummy variable distinguishing between teams’ first and second drivers.
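The mechanism behind this constraint is the lower component of the scope argument of step, which lists the terms that may never be dropped. The sketch below illustrates it on simulated placements, since the real data file is not embedded in this page; the simulated data, the team names and the object names base.lm, full.lm and selected are hypothetical.

```r
set.seed(42)
n_drivers <- 20
n_races <- 25

# Hypothetical placements: each column is one race, a random permutation of 1..20
position <- replicate(n_races, sample(n_drivers))

# Stack the race columns into one response vector (column by column)
positionlabel <- as.vector(position)

# 0 = team's first driver, 1 = team's second driver, repeated for every race
driverorder2 <- rep(rep(c(0, 1), n_drivers / 2), n_races)

# Hypothetical team labels, two consecutive drivers per team
team <- rep(rep(paste0("team", 1:10), each = 2), n_races)
team1dummy <- 1 * (team == "team1")
team2dummy <- 1 * (team == "team2")

base.lm <- lm(positionlabel ~ driverorder2)
full.lm <- lm(positionlabel ~ driverorder2 + team1dummy + team2dummy)

# The lower formula keeps driverorder2 in every candidate model
selected <- step(full.lm,
                 scope = list(lower = formula(base.lm),
                              upper = formula(full.lm)),
                 direction = "backward", trace = 0)

print(names(coef(selected)))
```

Whatever the selection procedure does with the team dummies, driverorder2 remains in the selected model because it belongs to the lower formula.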
The following R code reads in the data on drivers' placements found here. It then assigns variables and runs stepwise, forward and backward regressions.
#adjust the path to the location of the downloaded data file
f1seconddata <- read.table("F:f1seconddata.txt")
#drop the first column, keeping the 25 columns of placements
position <- f1seconddata[ , -1]
positionlabel <- c(position[,1], position[,2], position[,3],
                   position[,4], position[,5], position[,6],
                   position[,7], position[,8], position[,9],
                   position[,10], position[,11], position[,12],
                   position[,13], position[,14], position[,15],
                   position[,16], position[,17], position[,18],
                   position[,19], position[,20], position[,21],
                   position[,22], position[,23], position[,24],
                   position[,25])
#Parameterise in terms of first driver, second driver
driverorder <- rep(c(1, 2), 10)
driverorder <- rep(driverorder, 25)
#Re-coded the driver dummy variable to lie between 0 and 1
driverorder2 <- driverorder - 1
constructors <- rep(c("Mercedes", "RedBull", "Ferrari", "Mclaren",
                      "Alpine", "AstonMartin", "Haas", "AlfaTauri",
                      "AlfaRomeo", "Williams"),
                    c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2))
constructors <- rep(constructors, 25)
mercedesdummy <- 1 * (constructors == "Mercedes")
redbulldummy <- 1 * (constructors == "RedBull")
ferraridummy <- 1 * (constructors == "Ferrari")
mclarendummy <- 1 * (constructors == "Mclaren")
alpinedummy <- 1 * (constructors == "Alpine")
astonmartindummy <- 1 * (constructors == "AstonMartin")
haasdummy <- 1 * (constructors == "Haas")
alfatauridummy <- 1 * (constructors == "AlfaTauri")
alfaromeodummy <- 1 * (constructors == "AlfaRomeo")
full2.lm <- lm(formula = positionlabel ~ driverorder2 + mercedesdummy
               + redbulldummy + ferraridummy + mclarendummy
               + alpinedummy + astonmartindummy + haasdummy
               + alfatauridummy + alfaromeodummy)
b.lm <- lm(positionlabel ~ driverorder2)
#Stepwise regression
step(b.lm,
     scope = list(
       lower = formula(b.lm),
       upper = formula(full2.lm)),
     direction = "both")
stepwise.lm <- lm(formula = positionlabel ~ driverorder2 + redbulldummy
                  + mercedesdummy + ferraridummy + mclarendummy
                  + alpinedummy + astonmartindummy)
#Forward selection
step(b.lm,
     scope = list(
       lower = formula(b.lm),
       upper = formula(full2.lm)),
     direction = "forward")
stepforward.lm <- lm(formula = positionlabel ~ driverorder2
                     + redbulldummy + mercedesdummy + ferraridummy
                     + mclarendummy + alpinedummy + astonmartindummy)
#Backward selection
step(full2.lm,
     scope = list(
       lower = formula(b.lm),
       upper = formula(full2.lm)),
     direction = "backward")
stepback.lm <- lm(formula = positionlabel ~ driverorder2
                  + mercedesdummy + redbulldummy + ferraridummy
                  + mclarendummy + alpinedummy + astonmartindummy
                  + haasdummy + alfatauridummy + alfaromeodummy)
At this juncture it becomes clear that forward selection and stepwise regression choose the same model, whereas backward selection leads to a model with three additional variables (haasdummy, alfatauridummy and alfaromeodummy). The following R code suggests that, at the 5% level, the larger model does not lead to a significant improvement over the smaller model chosen by stepwise regression (p = 0.079).
anova(stepwise.lm, stepback.lm, test = "F")
Analysis of Variance Table

Model 1: positionlabel ~ driverorder2 + redbulldummy + mercedesdummy +
    ferraridummy + mclarendummy + alpinedummy + astonmartindummy
Model 2: positionlabel ~ driverorder2 + mercedesdummy + redbulldummy +
    ferraridummy + mclarendummy + alpinedummy + astonmartindummy +
    haasdummy + alfatauridummy + alfaromeodummy
  Res.Df     RSS Df Sum of Sq      F  Pr(>F)
1    492 10117.4
2    489  9978.1  3     139.3 2.2756 0.07903 .
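The F statistic in this table can be recovered by hand from the residual sums of squares and degrees of freedom. The sketch below uses the rounded values printed above, so the last digits may differ slightly from the anova output; the variable names are illustrative.

```r
# Values copied from the Analysis of Variance Table
rss_small <- 10117.4   # residual sum of squares, stepwise model
rss_large <- 9978.1    # residual sum of squares, backward model
df_small <- 492        # residual degrees of freedom, stepwise model
df_large <- 489        # residual degrees of freedom, backward model

# F = (reduction in RSS per extra parameter) / (residual variance of larger model)
extra_df <- df_small - df_large
f_stat <- ((rss_small - rss_large) / extra_df) / (rss_large / df_large)

# Upper-tail probability of the F distribution with (3, 489) degrees of freedom
p_value <- pf(f_stat, extra_df, df_large, lower.tail = FALSE)

print(f_stat)
print(p_value)
```

The numerator is 139.3 / 3 and the denominator is 9978.1 / 489, which gives an F statistic of about 2.276, in line with the table.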
The following R code reproduces the regression results reported in Table 3 of the paper.
summary(stepwise.lm)
Call:
lm(formula = positionlabel ~ driverorder2 + redbulldummy + mercedesdummy +
    ferraridummy + mclarendummy + alpinedummy + astonmartindummy)

Residuals:
   Min     1Q Median     3Q    Max
-9.058 -3.192 -1.058  2.517 15.592

Coefficients:
                 Estimate Std. Error t value Pr(>|t|)
(Intercept)       13.8420     0.3794  36.484  < 2e-16 ***
driverorder2       0.2160     0.4056   0.533   0.5946
redbulldummy      -9.6500     0.7170 -13.459  < 2e-16 ***
mercedesdummy     -8.2700     0.7170 -11.534  < 2e-16 ***
ferraridummy      -7.6900     0.7170 -10.725  < 2e-16 ***
mclarendummy      -3.5500     0.7170  -4.951 1.02e-06 ***
alpinedummy       -3.5500     0.7170  -4.951 1.02e-06 ***
astonmartindummy  -1.7900     0.7170  -2.496   0.0129 *
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 4.535 on 492 degrees of freedom
Multiple R-squared: 0.3914, Adjusted R-squared: 0.3828
F-statistic: 45.21 on 7 and 492 DF, p-value: < 2.2e-16
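To read the coefficient table, it may help to convert the estimates into fitted mean placements. The sketch below does this arithmetic with the estimates copied from the output above; the helper fitted_placement and the other names are hypothetical, and the interpretation assumes that a lower placement is a better result.

```r
# Coefficient estimates copied from the summary output above
coefs <- c(intercept = 13.8420, driverorder2 = 0.2160,
           redbull = -9.6500, mercedes = -8.2700, ferrari = -7.6900,
           mclaren = -3.5500, alpine = -3.5500, astonmartin = -1.7900)

# Fitted mean placement = intercept + team effect + driverorder2 * (second driver)
fitted_placement <- function(team_effect, second_driver) {
  unname(coefs["intercept"] + team_effect + coefs["driverorder2"] * second_driver)
}

redbull_first  <- fitted_placement(coefs["redbull"], 0)
redbull_second <- fitted_placement(coefs["redbull"], 1)
baseline_first <- fitted_placement(0, 0)   # teams without a dummy in the model

# A t value is the estimate divided by its standard error, e.g. redbulldummy
t_redbull <- -9.6500 / 0.7170

print(c(redbull_first, redbull_second, baseline_first))
print(t_redbull)
```

Under this reading the fitted mean placement is 4.192 for a Red Bull first driver and 4.408 for the second driver, against 13.842 for a first driver at a team without a dummy in the selected model (Haas, AlfaTauri, AlfaRomeo and Williams).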
License & Attribution
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
This license enables reusers to distribute, remix, adapt, and build upon the material in any medium or format for noncommercial purposes only, and only so long as attribution is given to the creator. We kindly ask that our work be credited by citing the paper [1] as shown below:
Fry, John and Brighton, Tom and Fanzon, Silvio. Faster identification of faster Formula 1 drivers via time-rank duality, Economics Letters, 237:111671, 2024
https://doi.org/10.1016/j.econlet.2024.111671
BibTeX citation: Download here or copy from the box below
@article{2024-Fry-Bri-Fan,
author = {Fry, John and Brighton, Tom and Fanzon, Silvio},
title = {Faster identification of faster Formula 1 drivers via time-rank duality},
journal = {Economics Letters},
volume = {237},
pages = {111671},
year = {2024},
doi = {10.1016/j.econlet.2024.111671}
}