R Basics

9. Random Numbers & xG

Creating random numbers is a central part of programming – whether it is for simulations, games or models, there are a multitude of uses for random numbers. R’s makes creating pseudo-random numbers incredibly simple. This article runs through the basic commands to create a random number, then looks to apply this to finding how likely you are to win based on a match’s expected goals or 'xG' (we’ll come on to what these are later).

The easiest way to get a random number is through the ‘runif()’ operation. This will give a random number between 0 and 1. Check it out:

          
      runif(1)

      ## [1] 0.8614516
          
  

A number between 0 and 1 is very useful. Essentially, it gives us a percentage that we can use to calculate chance, or to assign to a variable for calculation.

Additionally, we can use it to create a random whole number by multiplying it by the maximum possible value, then rounding with to the nearest integer using round(). The example here gives us a random number between 0 and 100:

        
    round(runif(1) * 100)

    ## [1] 0
        

Alternatively to the above, we could use another function sample() to create a random whole number for us. We simply pass the highest and lowest possible values (inclusive) that we will allow and state the amount of times to run the generation. Let’s simulate a 6-sided dice roll:

        
    sample(1:6, size = 1, replace = TRUE)

    ## [1] 6
        

Applying Random to Expected Goals


Great job on generating random numbers. The rest of the article applies it to expected goals, and will allow us to calculate how ‘lucky’ a team was based on the quality of their shots.

Firstly, expected goals is a measurement of how many goals a team could have expected to score based on the shots that they took. Different models base this on different things, but most commonly, the location of the shot, type of build-up, foot used are used to compare the chance with similar ones historically. We can then see the percentage chance of the shot becoming a goal.

Knowing the expected goals, we can generate random numbers to test how likely that score was. Let’s set up our lists of shots with their expected goal values – these are all percentages represented as decimals.

          
    HomexG = c(0.21,0.66,0.1,0.14,0.01)
    AwayxG = c(0.04,0.06,0.01,0.04,0.06,0.12,0.01,0.06)
          
  

The first shot for the home team had a 21% chance of being scored. Let’s create a random percentage to simulate if it goes in or not. If it is equal or less than 21%, we can say that it is scored in our simulation:

        
    if (runif(1) <= 0.21){
      print("GOAL!")
    }else{
      print("Missed!")
    }

    ## [1] "Missed!"
        

As happens roughly 4 out of 5 times, this time the shot was missed. Let’s run the shot 10,000 times:

        
    Goals <- 0

    for (i in 1:10000){
      if (runif(1) <= 0.21){
      Goals <- Goals + 1
    }
    }
    print(Goals)

    ## [1] 2108
        

So according to the xG score and our random test, if we take that shot 10,000 times, we can expect around 2075 goals (pretty much in line with the 0.21 score). In a nutshell, this is how we simulate with random numbers.

Going Further: Simulating a Match with Expected Goals


Rather than simulate with one number, let’s apply this same test to every shot by the home and away teams. Take a look through the function below and try to understand how it applies the test above to every shot in the HomexG and AwayxG lists.

          
    calculateWinner <- function(home, away){
        #Our match starts at 0-0
        HomeGoals = 0
        AwayGoals = 0

        #We have a function within our function
        #This one runs the 'runif(1)' test above for a list
        testShots <- function(shots){

            #Start goal count at 0
            Goals = 0

            #For each shot, if it goes in, add a goal
            for (shot in shots){
                if (runif(1)<=shot){
                    Goals <- Goals + 1
                }
            }

            #Finally, return the number of goals
            return(Goals)

        }

        #Run the above formula for home and away lists
        HomeGoals = testShots(home)
        AwayGoals = testShots(away)

        #Return the score
        if(HomeGoals > AwayGoals){
          print(paste0("Home Wins! ",HomeGoals, " - ",AwayGoals))
        } else if(AwayGoals > HomeGoals){
          print(paste0("Away Wins! ",HomeGoals, " - ",AwayGoals))
        }else{
           print(paste0("Share of the points! ",HomeGoals, " - ",AwayGoals))
        }

    }

    calculateWinner(HomexG, AwayxG)

    ## [1] "Away Wins! 1 - 2"
          
  

We are now simulating a whole game based on expected goals, pretty cool!

However, we are only simulating once. In order to get a proper estimate as to how likely it is that one team wins, we need to do this lots of times.

Let’s change our last function to simply return the result, not a user-friendly print out. We can then use this function over and over to calculate an accurate percentage chance of winning for each team.




    calculateWinner <- function(home, away){
        #Our match starts at 0-0
        HomeGoals = 0
        AwayGoals = 0

        #We have a function within our function
        #This one runs the 'runif(1)' test above for a list
        testShots <- function(shots){

            #Start goal count at 0
            Goals = 0

            #For each shot, if it goes in, add a goal
            for (shot in shots){
                if (runif(1)<=shot){
                    Goals <- Goals + 1
                }
            }

            #Finally, return the number of goals
            return(Goals)

        }

        #Run the above formula for home and away lists
        HomeGoals = testShots(home)
        AwayGoals = testShots(away)

        #Return the score
        if(HomeGoals > AwayGoals){
          return("home")
        } else if(AwayGoals > HomeGoals){
          return("away")
        }else{
          return("draw")
        }

    }

    #Run xG calculator 10000 times to test winner %
    calculateChance <- function(team1, team2){
        home <- 0
        away <- 0
        draw <- 0

        for (i in 1:10000) {

          matchWinner = calculateWinner(team1,team2)

        if(matchWinner == "home"){
          home <- home + 1
        }else if(matchWinner == "away"){
          away <- away + 1
        }else{
          draw <- draw + 1
        }
        }

        home = home/100
        away = away/100
        draw = draw/100

        print(paste0("Over 10000 games, home wins ", home, "%, away wins ",away,"% and there is a draw in ",draw,"% of games."))

    }

    calculateChance(HomexG, AwayxG)

    ## [1] "Over 10000 games, home wins 60.5%, away wins 9.78% and there is a draw in 29.72% of games."


There we go! We now have a better understanding as to what result we could normally expect from these chances!

Let’s try a new run of expected goals – one great chance (50%) against 10 poor chances (5%). Who wins most often here?

  
    HomexG <- c(0.5)
    AwayxG <- c(0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05,0.05)
    calculateChance(HomexG, AwayxG)

    ## [1] "Over 10000 games, home wins 28.97%, away wins 24.72% and there is a draw in 46.31% of games."
  

Interestingly, the big chance team has a 5% advantage over the team that shoots loads from low-chance opportunities. Makes you think!

Summary


Creating random numbers in R is easy, whether we want a random percentage or number between 0 and 1, or we want a random whole integer.

In this article, we saw how we can apply random numbers to a simulation. If anything around the function creation or for loops was confusing here, you might want to take a read up on those.

Next Lessons...


  1. Numbers, Variables & Strings
  2. Comparisons & Logic
  3. If Statements
  4. For Loops
  5. While Loops
  6. Lists
  7. Packages
  8. Creating Functions
  9. Random Numbers with Expected Goals

This is a R conversion of a tutorial by FC Python. I take no credit for the idea and have their blessing to make this conversion. All text is a direct copy unless changes were relevant. Please follow them on twitter and if you have a desire to learn Python then they are a fantastic resource!