# User Manual¶

OpenSkill requires knowledge of some domain specific jargon to navigate. If you know what the central measures of tendency and Gaussian distributions are, you are pretty much set.

If you don’t know what those are, please consider using a short resource on statistics to get acquainted with the terms. We recommend Khan Academy’s short course on statistics and probability.

If you’re struggling with any of the concepts, please search the discussions section to see if your question has already been answered. If you can’t find an answer, please open a new discussion and we’ll try to help you out. You can also get help from the official Discord Server. If you have a feature request, or want to report a bug please create a new issue if one already doesn’t exist.

Let’s start with a short refresher:

## Arithmetic Mean¶

The arithmetic mean is the sum of all values divided by the number of values. It is the most common type of average.

We will denote \(\bar{x}\) as the symbol \(\mu\) or henceforth written in code as `mu`

.

Every player has a `mu`

value. This is the average skill of the player as determined by OpenSkill models.
On the other hand, presumably there is an actual amount of skill that a player has. This skill value obviously
fluctuates due to many factors. Our goal is to estimate this value as accurately as possible.

But how can we be certain about whether the skill the model has given a player is accurate?

## Standard Deviation¶

The standard deviation is a measure of how spread out numbers are. It is the square root of the variance. The variance is the average of the squared differences from the mean.

We will denote \(\sigma\) written in code as `sigma`

. This is the uncertainty the model has about the player’s skill.
The higher the uncertainty, the less certain the model is about the player’s skill.
For instance if a player has a `mu`

of `25`

and a `sigma`

of `8`

, then the model is 95% certain that the player’s skill is between `9`

and `41`

.

## The Basics¶

First we have to initialize the mode we want to use. We will use the `PlackettLuce`

model for this example.
The `PlackettLuce`

model is generally a good choice if you’re not expecting large matches with lots of players and teams.
On the whole, all the 5 models have the same capabilities, but some are more accurate and some are faster than others. It’s up to
you to test out what works for you.

Let’s start the example by importing the model from the package. All models live
under the `openskill.models`

module.

```
from openskill.models import PlackettLuce
model = PlackettLuce()
```

Every model comes with a set of parameters that can be set. These parameters are used to configure the model to your liking. The parameters for all the Weng-Lin models are somewhat similar.

The default parameters for the `PlackettLuce`

model are found here, but you are free to use anything
in so far as you don’t violate the rules of the underlying assumptions of the model. The important and relevant parameters
here are `mu`

and `sigma`

with \(25\) and \(25/3\) as their default values respectively.

You can ofcourse shift the values higher or lower. But it doesn’t make sense to violate the rule that `sigma`

should be
\(\frac{1}{3}\) of `mu`

. If you do, the model may not work as intended. If you know what you’re doing and are a
statistics expert, you can change the parameters to your liking. But if you’re not, we recommend you stick to the default values.

Finally, there is a `balance`

flag you can set to `True`

if you want the rating system to modify it’s
assumptions about users on the tail ends of the skill distribution. With `balance`

turned on, the higher the rating
a player has, it’s assumed it’s a much more monumental achievement. The inverse is true for lower rated players. We won’t
enable this feature for our purposes.

Let’s now get the object representing a single player by calling the `rating`

method
on the model. This method returns a `PlackettLuceRating`

object for which you can set your own
values. Since we are using the default values, each player will also start with those values. We can also set a optional name.
It can anything, an ID, a username, anything. It’s just a way to identify the player.

```
p1 = model.rating(name='john123')
print(p1)
```

This will print out the following:

```
Plackett-Luce Player Data:
id: 58d990abafd44559bb5f63882c1456dc
name: john123
mu: 25.0
sigma: 8.333333333333334
```

Notice, how a `uuid.uuid4()`

is generated for the player. This is a unique identifier for the player.
You can use a regular `filter()`

to get the player back from the model.

Let’s generate a few more players.

```
p2 = model.rating(name='jane234')
p3 = model.rating(name='joe546')
p4 = model.rating(name='jill678')
```

Now let’s organize them into teams. Team are represented by regular python lists.

```
team1 = [p1, p2]
team2 = [p3, p4]
```

Now let’s create a match and rate them using our model. The first team is the winner.

```
match = [team1, team2]
[team1, team2] = model.rate(match)
[p1, p2] = team1
[p3, p4] = team2
```

Let’s print all the player’s values to see what’s changed.

```
p1: mu=26.964294621803063, sigma=8.177962604389991
p2: mu=26.964294621803063, sigma=8.177962604389991
p3: mu=23.035705378196937, sigma=8.177962604389991
p4: mu=23.035705378196937, sigma=8.177962604389991
```

As you may have noticed, the winning team has a higher `mu`

value than the losing team and the `sigma`

values
of all the players have decreased. This is because the model is more certain about the skill of the players after the match.

More often than not you’ll want to store at least the `mu`

and `sigma`

values of the players in a database.
This means if you want to conduct another match, you’ll have to load the players back from the database. We have a helper
method to create a player from a list of `mu`

and `sigma`

values. Just call the model’s `create_rating`

method.

```
p1 = model.create_rating([23.035705378196937, 8.177962604389991], "jill678")
```

Warning

Do not store the `uuid.uuid4()`

in a database. It is only useful for the lifetime of the program.
If you want to use a unique identifier to store in the database, use the `name`

parameter instead.

## Ranks¶

When displaying a rating, or sorting a list of ratings, you can use `PlackettLuceRating.ordinal()`

.

```
print(p1.ordinal())
print(p3.ordinal())
```

Which will print out the following:

```
2.4304068086330872
-1.4981824349730388
```

By default, this returns \(\mu - 3\sigma\), showing a rating for which there’s a 99.7% likelihood the player’s true
rating is higher, so with early games, a player’s ordinal rating will usually go up and could go up even if that
player loses. If you want to prevent that you can pass the `limit_sigma`

boolean parameter to the model defaults
or the `PlackettLuce.rate()`

method.

## Artificial Ranks¶

If your teams are listed in one order but your ranking is in a different order, for convenience you can specify a ranks option, such as:

```
ranks = [4, 1, 3, 2]
[[p1], [p2], [p3], [p4]] = model.rate(match, ranks=ranks)
```

It’s assumed that the lower ranks are better (wins), while higher ranks are worse (losses). You can provide a score instead, where lower is worse and higher is better. These can just be raw scores from the game, if you want.

Ties should have either equivalent rank or score:

```
scores = [37, 19, 37, 42]
[[p1], [p2], [p3], [p4]] = model.rate(match, scores=scores)
```

## Weights¶

For faster convergence of ratings, you can use pass the `weights`

argument to `PlackettLuce.rate()`

method.
The `weights`

argument takes raw numeric values for each player from at the end of a match. These values should only
represent metrics that **always** contribute to a win condition in the match. For instance, in large scale open battle
arena games, there is a time limit for the entire game. In such games, a player can still win with very low points or kills.
Always make sure the metric you choose in your game is something that significantly contributes to winning the match.

```
weights = [[20], [1], [3], [15]]
[[p1], [p2], [p3], [p4]] = model.rate(match, weights=weights)
```

## Matchmaking¶

These models wouldn’t be very useful, if you couldn’t predict and match up players and teams. So we have 3 methods to help you do that.

### Predicting Winners¶

You can compare two or more teams to get the probabilities of each team winning.

```
p1 = model.rating()
p2 = model.rating(mu=33.564, sigma=1.123)
predictions = model.predict_win([[p1], [p2]])
print(predictions)
print(sum(predictions))
```

Let’s see what this outputs:

```
[0.2021226121041832, 0.7978773878958167]
1.0
```

As you can see the team with the higher `mu`

and lower `sigma`

has a higher probability of winning.
The sum of the probabilities is \(1.0\) as expected.

### Predicting Draws¶

You can also predict the probability of a draw between two teams. This behaves more like a match quality metric. The higher the probability of a draw, the more likely the teams are to be evenly matched.

```
p1 = model.rating(mu=35, sigma=1.0)
p2 = model.rating(mu=35, sigma=1.0)
p3 = model.rating(mu=35, sigma=1.0)
p4 = model.rating(mu=35, sigma=1.0)
p5 = model.rating(mu=35, sigma=1.0)
team1 = [p1, p2]
team2 = [p3, p4, p5]
predictions = model.predict_draw([team1, team2])
print(predictions)
```

Let’s see what this outputs:

```
0.0002807397636510
```

Odd, we have almost no chance for a draw. This is because the more teams we have the possibilities for draws decrease due to match dynamics. Let’s try with 2 teams and fewer players.

```
p1 = model.rating(mu=35, sigma=1.0)
p2 = model.rating(mu=35, sigma=1.1)
team1 = [p1]
team2 = [p2]
predictions = model.predict_draw([team1, team2])
print(predictions)
```

Okay let’s see what changed:

```
0.4868868769871696
```

A much higher draw probability! So keep in mind that the more teams you have, the lower the probability of a draw and you should account for that in your matchmaking service.

Note

Draw probabilities will never exceed 0.5 since there is always some uncertainty.

### Predicting Ranks¶

We can go even more fine grained and predict the ranks of the teams. This is useful if you want to match the lowest ranked teams with the highest ranked teams allowing you to quickly eliminate weaker players from quickly from a tournament.

```
p1 = model.rating(mu=34, sigma=0.25)
p2 = model.rating(mu=34, sigma=0.25)
p3 = model.rating(mu=34, sigma=0.25)
p4 = model.rating(mu=32, sigma=0.5)
p5 = model.rating(mu=32, sigma=0.5)
p6 = model.rating(mu=32, sigma=0.5)
p7 = model.rating(mu=30, sigma=1)
p8 = model.rating(mu=30, sigma=1)
p9 = model.rating(mu=30, sigma=1)
team1, team2, team3 = [p1, p2, p3], [p4, p5, p6], [p7, p8, p9]
rank_predictions = model.predict_rank([team1, team2, team3])
print(rank_predictions)
```

It will produce the rank and the likelihood of that rank for each team:

```
[(1, 0.5043035277836156), (2, 0.3328317993957732), (3, 0.16286467282061112)]
```

Warning

The sum of the probabilities of the ranks and the draw probability no longer equal 1.0 from `v6`

onwards.

## Picking Models¶

The models are all very similar, but some are more efficient and more accurate depending up on the specific use case.

There are currently 5 models:

`Part`

stands for partial paring and is a reference to how ratings are calculated underneath the hood. Suffice to say
the partial pairing models are more efficient, but less accurate than the full pairing models. The `PlackettLuce`

model is a good balance between efficiency and accuracy and is the recommended model for most use cases.