Building Optimal Daily Fantasy Lineups in R

‘Are you ready for some footbaaaalll?!’ It’s that time of the year, the NFL is back! Like many others, part of my enjoyment of football season is through fantasy football, specifically daily fantasy (like DraftKings or Fanduel is you’re unfamiliar). Obviously, I want to apply my analytic skillset to give me an edge, and there’s lot of great packages in R that make that easier to do.

In this post, I’ll be using the great ffanalytics package to scrape projections from multiple sites, and simulate multiple optimal lineups to get an idea of which players to target in my lineups.

The first step, as always, is getting the right packages loaded

library(data.table)
library(dtplyr)
library(tidyverse)
library(ffanalytics)
library(lpSolve)
library(rPref)
library(kableExtra)

## Specify Week
week <- 3

Scrape Projections

Next, we select the sources we want to pull from. (I’d normally include ESPN in here, but the results have been buggy the last few times I’ve pulled).

sources <- c('CBS', 'Yahoo', 'FantasySharks', 'NumberFire', 'FantasyPros', 'FantasyData', 'FleaFlicker')

scrape <- scrape_data(src = sources,
                      pos=c('QB', 'RB', 'WR', 'TE', 'DST'),
                      season = 2019, 
                      week=week)

A nice feature of ffanalytics is that you can provide custom scoring schemas. Here, I set up the DraftKings scoring schema and apply it to the scraped data, so all projections will be in that format.

scoring <- list(
  pass = list(
    pass_att = 0, pass_comp = 0, pass_inc = 0, pass_yds = 0.04, pass_tds = 4,
    pass_int = -1, pass_40_yds = 0,  pass_300_yds = 3, pass_350_yds = 0,
    pass_400_yds = 0
  ),
  rush = list(
    all_pos = TRUE,
    rush_yds = 0.1,  rush_att = 0, rush_40_yds = 0, rush_tds = 6,
    rush_100_yds = 3, rush_150_yds = 0, rush_200_yds = 0),
  rec = list(
    all_pos = TRUE,
    rec = 1, rec_yds = 0.1, rec_tds = 6, rec_40_yds = 0, rec_100_yds = 3,
    rec_150_yds = 0, rec_200_yds = 0
  ),
  misc = list(
    all_pos = TRUE,
    fumbles_lost = -1, fumbles_total = 0,
    sacks = 0, two_pts = 2
  ),
  ret = list(
    all_pos = TRUE,
    return_tds = 6, return_yds = 0
  ),
  dst = list(
    dst_fum_rec = 2,  dst_int = 2, dst_safety = 2, dst_sacks = 1, dst_td = 6,
    dst_blk = 2, dst_ret_yds = 0, dst_pts_allowed = 0
  ),
  pts_bracket = list(
    list(threshold = 0, points = 10),
    list(threshold = 1, points = 7),
    list(threshold = 7, points = 4),
    list(threshold = 14, points = 1),
    list(threshold = 21, points = 0),
    list(threshold = 28, points = -1),
    list(threshold = 35, points = -4)
  )
)


proj <- projections_table(scrape, scoring_rules = scoring) %>%
  add_player_info()

Now that we have the projections in the proper format, we need to add player salaries. I don’t have a site to programatically pull this from, the way I do it currently is just from exporting from the DraftKings lineup page. There’s some cleaning that has to be done in order to make the names match, (Todd Gurley is an example), and I’m sure there’s a cleaner way to do this, but hey it worked so far.

Another potential issue is sites not updating their injury reports. For example, Trevor Siemian still had a projection for some sites, even though he’s out for the season, so I manually remove him

## Read in DraftKings
injured <- c('Trevor Siemian') ## Remove Injured Players

sal <- read_csv('DKSalaries.csv') %>%
  filter(!Name %in% injured)

sal$Name[sal$Name=='Todd Gurley II'] <- 'Todd Gurley'

Build Optimal Lineups

Next I define a function to merge the salary data with the prediction data, and randomly generate a points estimate for a given player, which I’m able to do because ffanalytics provides standard deviations for their projections, allowing me to pull projections from a probability distribution. I then generate a lineup via lpsolve. Building lineups is a variant of the classic ‘knapsack’ problem in optimization, I only have so much salary, and I need to fit players in my lineup that will get me the most points while meeting certain constraints (1 QB, 2 RB, 3 WR, 1 TE, 1 DST, 1 Flex, all under $50,000). I repeat this process 10,000 times, giving me 10,000 lineups under 10,000 scoring scenarios.

generate_lineup <- function(n){
  
  pred_sal <- proj %>% 
    filter(avg_type=='robust') %>%
    mutate(Name = ifelse(pos=="DST", last_name, paste(first_name, last_name))) %>%
    inner_join(sal, by=c("Name")) %>%
    select(Name, team, position, points, Salary, sd_pts) %>%
    filter(!is.na(points), !is.na(Salary)) %>%
    group_by(Name) %>%
    mutate(sal_max=max(Salary)) %>%
    filter(Salary==sal_max) %>%
    group_by(Name) %>%
    mutate(pts_pred = rnorm(1, points, sd_pts),
           lineup=n) %>%
    select(-sal_max)
  

  obj <- pred_sal$pts_pred
  
  mat <- rbind(t(model.matrix(~ position + 0,pred_sal)), t(model.matrix(~ position + 0,pred_sal)), rep(1, nrow(pred_sal)), pred_sal$Salary)
  
  dir <- c("=","=","<=","<=","<=", "=","=",">=",">=",">=","=","<=")
  
  rhs <- c(1,1,3,2,4,1,1,2,1,3,9,50000)
  
  result <- lp("max", obj, mat, dir, rhs, all.bin = TRUE)   
  
  results <- pred_sal[which(result$solution == 1),]
  
  return(results)
}

sim_lu <- map_df(1:10000, generate_lineup) %>%
  rename(pts_base=points) %>%
  select(lineup, Name, team, position, pts_base, pts_pred, sd_pts, Salary)

Explore Optimal Lineups

Now that we have 10,000 optimal lineups, lets see how those lineups look.

Here’s a sample of three optimal lineups, so you can get a sense of what the data looks like.

sim_lu %>%
  filter(lineup<=3) %>%
  arrange(lineup, position, desc(pts_pred)) %>%
  knitr::kable() %>%
  kable_styling() %>%
  column_spec(1, bold=TRUE) %>%
  collapse_rows(columns = 1, valign = 'top')

lineup	Name	team	position	pts_base	pts_pred	sd_pts	Salary
1	Dolphins	MIA	DST	4.97000	6.598585	0.7086828	2000
	Dak Prescott	DAL	QB	23.73500	24.971105	0.8272908	6500
	Christian McCaffrey	CAR	RB	24.96164	25.569784	1.0081680	8700
	Devonta Freeman	ATL	RB	12.71842	14.707710	1.0526460	4900
	Zach Ertz	PHI	TE	17.38000	18.002119	2.5352460	5700
	Keenan Allen	LAC	WR	20.24500	24.267924	2.6612670	7000
	Sammy Watkins	KCC	WR	17.73100	21.472349	2.6345802	6800
	Emmanuel Sanders	DEN	WR	14.76000	15.308993	1.5537648	4800
	Nelson Agholor	PHI	WR	13.08025	12.421407	0.6108312	3600
2	Cardinals	ARI	DST	7.02700	7.956459	1.1327064	2700
	Jameis Winston	TBB	QB	20.35533	20.534232	0.4922232	5400
	Christian McCaffrey	CAR	RB	24.96164	24.693976	1.0081680	8700
	Ezekiel Elliott	DAL	RB	24.45650	22.832514	2.1742329	8900
	Zach Ertz	PHI	TE	17.38000	16.542479	2.5352460	5700
	O.J. Howard	TBB	TE	8.45050	11.634965	1.4974260	3800
	DeSean Jackson	PHI	WR	11.60000	22.402629	8.5990800	5800
	Larry Fitzgerald	ARI	WR	15.60500	15.718476	0.8006040	5100
	Nelson Agholor	PHI	WR	13.08025	12.519298	0.6108312	3600
3	Steelers	PIT	DST	6.83000	7.748545	0.8747340	2300
	Jameis Winston	TBB	QB	20.35533	20.635016	0.4922232	5400
	Saquon Barkley	NYG	RB	22.66727	25.769062	2.5975152	9100
	Christian McCaffrey	CAR	RB	24.96164	24.227040	1.0081680	8700
	C.J. Uzomah	CIN	TE	4.88000	10.489492	2.6242020	2700
	Julio Jones	ATL	WR	19.08000	19.990107	1.9251561	7300
	T.Y. Hilton	IND	WR	16.04000	19.965852	1.8021003	6400
	Larry Fitzgerald	ARI	WR	15.60500	15.695509	0.8006040	5100
	Darius Slayton	NYG	WR	4.81500	10.644966	5.4633810	3000

Now that we have an idea of what the data looks let’s dig into what these optimized lineups are showing.

First, a look at which players are represented in optimal lineups. Here I’ll look at the top 10 by position.

sim_lu %>%
  group_by(Name, position) %>%
  dplyr::summarize(lu=n_distinct(lineup)) %>%
  ungroup() %>%
  group_by(position) %>%
  top_n(10, lu) %>%
  ungroup() %>%
  arrange(position, desc(lu)) %>%
  mutate(Name=factor(Name),
         Name=fct_reorder(Name, lu)) %>%
  ggplot(aes(x=Name, y=lu)) +
  geom_bar(stat='identity') +
  facet_wrap(~position, ncol = 3, scales='free') +
  coord_flip() +
  scale_y_continuous(labels = scales::comma) +
  ggtitle('Top 10 Players Present by Position')

From what we see here, Dak Prescott is a solid play for QB, as he shows in in around 5,500 lineups, or around 55% of all optimal lineups. The next most common is Jameis Winston, coming in at around 2,700 lineups, just over a quarter.

Now let’s look at flex configurations. As described above, in DraftKings and most daily fantasy, you have a flex spot, where you can place an extra RB, WR, or TE. Another thing we can learn from these optimal lineups is how often which configurations show up (e.g. how often an RB is selected for the flex position).

sim_lu %>%
  group_by(lineup) %>%
  mutate(lineup_pts=sum(pts_pred)) %>%
  group_by(lineup, position) %>%
  mutate(n=n()) %>%
  select(lineup, position, n, lineup_pts) %>%
  distinct() %>%
  spread(key=position, value=n) %>%
  filter(RB>=2, TE>=1, WR>=3) %>%
  mutate(flex=case_when(RB==3 ~ 'RB',
                        TE==2 ~ 'TE',
                        WR==4 ~ 'WR')) %>%
  group_by(flex) %>%
  dplyr::summarize(pts=median(lineup_pts),
                   cases=n()) %>%
  knitr::kable() %>%
  kable_styling(full_width = FALSE)

flex	pts	cases
RB	160.6642	2325
TE	161.8391	1900
WR	161.6524	5775

For this week, a WR has been put in the flex spot for about 57% of lineups, an RB in about 23% of lineups, and a TE in about 19% of lineups. Just looking at the median lineup points, there isn’t a large difference in scoring between flex configurations.

On that now, let’s look at the distribution of lineup points?

sim_lu %>%
  group_by(lineup) %>%
  dplyr::summarize(points=sum(pts_pred)) %>%
  ggplot(aes(x=points)) +
  geom_histogram() +
  ggtitle('Points Distribution of Optimal Lineups')

Nice, normal looking distribution!

A little too normal looking…

Here it is important to note that the standard deviation of the projection (supplied by ffanalytics) is used to estimate the true projection, not the true performance, that’s why the distribution is so tight and normal looking.

While you CAN enter a lot of lineups in DFS (less so than the old days), we can’t use all these lineups. So how do we narrow down? One way is to look at what’s called that Pareto frontier, lineups where we either maximize total lineup points, or minimize lineup standard deviation. If it sounds a little bit confusing, here’s a visualization to illustrate the point, the bolded points are the pareto lineups.

lu_df <- sim_lu %>%
  group_by(lineup) %>%
  dplyr::summarize(lineup_pts=sum(pts_pred),
                   lineup_sd=sum(sd_pts)) %>%
  ungroup()

pto <- psel(lu_df, low(lineup_sd) * high(lineup_pts))


ggplot(lu_df, aes(y=lineup_pts, x=lineup_sd)) +
  geom_point() +
  geom_point(data=pto, size=5) +
  ylab('Lineup Points') +
  xlab('Lineup Points St Dev') +
  ggtitle('Lineup Points vs Uncertainty',
          subtitle = 'Pareto Lineups Bolded')

This allows us to maximize the tradeoff between lineup points and lineup uncertainty. As expected, there’s a general trend of higher points lineups being more uncertain. We can leverage that uncertainty for certain games. For example, a lineup that has good points and lower uncertainty will be a good cash game play, where the higher uncertainty lineups are good GPP plays, where you want to focus on a lineup’s ceiling rather than their raw projection.

A csv with all 10,000 lineups is located here

Next Steps

While this analysis can improve our lineup building, there are some important limitations.

First, these projections are at the player level, taking no covariance into account. For example, Dak Prescott and Amari Cooper’s simulated scores are calculated separately, in a real life scenario, a better day from Dak probably means a better day for Amari, and a better model should take that into account.

Second, and on a similar note, optimal lineups don’t account for stacking (selecting, say, a QB and WR from the same team, because points for one means points for the other, allowing you to double dip), and one needs to be cognizant of that to both maximize a lineup’s upside, and limit it’s downside (e.g. don’t take a QB and their opposing defense, as their scores are negatively correlated).

These improvements will be coming in future weeks.

Building Optimal Daily Fantasy Lineups in R

Scrape Projections

Build Optimal Lineups

Explore Optimal Lineups

Next Steps

Aaron Miles