Introduction

The purpose of this report is

to argue for an open system for ranking sports teams,
to review the history of ranking systems, and
to document a particular open method for ranking sports teams against each other.

To do this we make extensive use of mathematics, which might make the text more difficult to read but ensures that the method is well documented and reproducible by others who might want to use it or derive another ranking method from it. On the other hand, the report is also more detailed than a "typical" scientific paper and discusses details that would be omitted from a paper intended for publication.

We will in this report focus on NCAA 1-A football, but the methods described here are very general and can be applied to most other sports with only minor modifications.

Predictive vs. Earned Ranking Methods

In general most ranking systems fall into one of the following two categories: predictive or earned rankings. The goal of an earned ranking is to rank the teams according to their past performance in the season in order to provide a method for selecting either a champion or a set of teams that should participate in a playoff (or bowl games). The goal of a predictive ranking method, on the other hand, is to provide the best possible prediction of the outcome of a future game between two teams.

In an earned system, objective and well-publicized criteria should be used to rank the teams, like who won, the score difference, or a combination of both. With well-defined criteria for the ranking, teams know exactly what the consequences of a win or a loss will be. This is done in most football conferences to select the conference champion or at least the two teams selected for a championship game. In general an earned ranking system assigns each team a number (often called the power ranking), which is used to order the teams in a linear sequence.

Most systems found on the WWW are predictive, even many of the BCS systems. To make a predictive system as accurate as possible, it is allowed to include any information deemed useful, like the strength of the quarterback, yards earned, number of fumbles etc. In particular it is very common to put more weight on recent games than on older games. This allows a more precise extrapolation to the next week's games. In a more advanced predictive system the teams are not necessarily linearly ordered. One can easily imagine situations where a good predictive system will predict that A beats B and B beats C, but C beats A. This is not possible in a pure earned system.

Unfortunately most WWW ranking systems are a strange mix of the two types of systems described here. The BCS ranking, which determines which teams have earned the right to play in the various bowls, in particular the bowl determining the national championship, should be a pure earned ranking system. But many, if not most, of the BCS systems seem to be predictive. It is even so bad that a particular web site ranks the BCS systems according to how well they predict the next week's games!

The methods described below are all intended as earned systems, and no effort is spent on trying to optimize them for predictive capability.

Open vs. Closed Methods

Currently the Bowl Championship Series (BCS) system [] is based on 4 components: 1) 2 subjective polls by coaches and journalists, 2) 8 computer programs, 3) a well defined, but primitive, method for calculating the strength of schedule, and 4) the number of losses of each team.

Here we will concern ourselves with the 8 BCS computer-based methods for ranking the teams:

Billingsley []
Dunkel []
Anderson & Hester / Seattle Times []
Massey []
Matthews []
The New York Times []
Rothman [] (public)
Sagarin []

Of all these systems only Rothman offers the code to others. However, Massey has a fairly detailed description of his system, so it should be possible to recreate his rankings. Since all the other systems seem to be well-guarded secrets, it is not possible for others to check the calculations or to estimate what effect wins or losses in the coming weeks will have. This is especially problematic in the case of Jeff Sagarin, who makes a living from publishing ratings for various sports in USA Today. Since Mr. Sagarin has an economic interest in keeping his system proprietary, it will be very difficult to obtain a detailed description or the actual source code of his system.

This secrecy hurts the computer rankings tremendously, since it creates the impression that they are incomprehensible and unfair and that they reward "running up the score".

It is also a very uncommon situation. In other sports, where rankings play an important role (chess, golf, tennis, etc.) the ranking method is completely public and can be checked by any interested person.

I therefore suggest that the current system be replaced with an open system based on publicly accessible code with a detailed mathematical description. Ideally the whole package should be available for download from the BCS WWW site, so coaches, players, journalists and fans can perform the rankings themselves. The criteria on which the teams are ranked should be well publicized and should be quantifiable in terms of either simple formulas or tables.

For this reason I have chosen to give a detailed report on the ranking system I use. As will be seen, it is based on the work of others. It does, however, contain a few original, minor ideas.

I also propose that the NCAA should be responsible for the WWW posting of all NCAA results in a standard ASCII-based format that can be used by everybody ranking sports teams.

An overview of ranking methods

Specialized Tournament Systems

Often a ranking between teams is obtained by letting them play in tournaments with the purpose of either establishing a ranking between them or at least defining "the best team". Such a tournament can either be stretched over the whole season or, more commonly, be used as a playoff at the end of the season.

Round Robin System

All teams in the tournament play each other and the ranking is determined by the number of points each team accumulates (see the section on accumulative point systems). This is a very "fair" system, but it requires a large number of games.

Cup System

The tournament is played in "rounds". Only winners are allowed to play in the next round. Eventually only one team is left and is declared the winner. The theme can be varied by introducing a losers' bracket, so that a team is only eliminated after two losses. This is a very popular system, since it will find an undisputed "best" team using the smallest number of games. However, due to the unavoidable fluctuations in the performance of each team from game to game, it happens very often that a better team loses a match to an inferior opponent and is then eliminated. This is not fair from a ranking point of view, but has a lot of appeal from a spectator point of view.

Monrad System

The Monrad system is a very interesting variation of the cup system, which to my knowledge is only used on a regular basis in chess tournaments. In the first round all teams are paired randomly. The winner gets 1 point and the loser zero. In each successive round all teams with the same number of points are paired randomly (except that teams which have already played each other cannot be paired if there are other pairing possibilities). This system has the advantage that all teams keep playing, in contrast to the cup system, and as the season (or tournament) advances, teams of equal strength will be meeting each other. There is no limit to the number of rounds that can be played, but eventually teams have to be paired if they have similar, but not necessarily identical, numbers of points. The team with the largest number of points after a predefined number of rounds is the winner.

This system would be ideally suited for NCAA football, if only tradition and logistics would not interfere. Early in the season teams would play 5-6 Monrad games within their conference and the rest of the season they would be paired nationally, but with a pairing preference for teams geographically close. This would result in some very exciting games in November and would create optimally matched bowl games.

Accumulative Point Systems

Most sports use ranking systems based on the idea that for each match or tournament the team (or player) acquires a certain number of points depending on their performance, and the team's ranking is based on the total number of points it has accumulated during the season. If several teams have the same number of points, additional objective criteria are used, like the winner of the mutual game or the accumulated score difference.

In most soccer leagues the winning team gets 2 (or 3) points and the loser none, and each team gets 1 point for a tie. If all teams play each other (round robin as described above) this provides a very simple and effective method for evaluating the integrated performance of each team. Typical league sizes range from 6 to 24 teams. Usually the best teams from a league will move up to a higher league next season and the lowest ranked teams will be moved to a lower league. The winner of the highest league will be the overall winner.

Golf and tennis use systems where each player accumulates points based on their placement in each tournament according to published tables. Prestigious tournaments provide more points and small local tournaments provide fewer points. Usually the points are accumulated over a sliding time interval of one year.

The advantage of accumulative point systems is their simplicity. It is easy for each team or the spectators to figure out the accumulated score and therefore the ranking. All participants know what they gain or lose by winning or losing a game. The disadvantage is that the point allocation, especially for large tournament systems like those in golf and tennis, becomes somewhat arbitrary and is not based on the actual strength of the participants. It is also sometimes possible to rack up a lot of points by carefully selecting weak tournaments or by playing a large number of tournaments.

Elo Systems

The Elo rating system was first used by the International Chess Federation in 1970 to rank chess players. The system was proposed by Arpad E. Elo and is partly based on earlier work done by Anton Hoesslinger. The official description of the system as it is used in chess can be found at []. Jones has given a nice overview of the system [].

The basic idea of the system is to continuously adjust a player's rating R_p according to whether she performs better or worse than expected in tournaments or matches. For a new player with a total of N matches, where N ≤ N_cut (N_cut = 20), the rating R_p is calculated as

$$R_p = \langle R_c \rangle + a \, \frac{N_w - N_l}{N}$$

where ⟨R_c⟩ is the arithmetic average of the competitors' ratings at the time of the matches, N_w and N_l are the numbers of wins and losses, respectively, and a = 400 is an initial scale factor. This is the basic Ingo system, named after Ingolstadt in Germany, the place of origin of its inventor Anton Hoesslinger.

Elo's important improvement to this system was to introduce the Win Expectancy Function W_e, which is defined as

$$W_e(\Delta R) = \frac{1}{1 + 10^{-\Delta R / a}}$$

where

$$\Delta R = R_p - R_c$$

For a tournament with M matches played by a player with rating R_p, the new rating R_p,new after the tournament becomes

$$R_{p,\mathrm{new}} = R_p + K \left( S - \sum_{i=1}^{M} W_{e,i}(\Delta R_i) \right)$$

where the score S is defined as (N_t is the number of tied games)

$$S = N_w + \frac{N_t}{2}$$

and the sum over i runs over the games the player played in the tournament. K is in principle a constant, but in practice it is varied slightly depending on the rating of the player.

The Elo system seems especially well suited to sports with a large number of participants ( 10,000), where methods based on linear algebra have problems due to memory limitations in computers. However, the idea of introducing the probability function W_e is very powerful and can be used in other ranking systems. In particular Massey has used some of these ideas in his very interesting BCS ranking system.
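The update rule above is compact enough to be written out directly. The following Python sketch is only an illustration: the function names and the default K = 32 are assumptions made for this example and are not taken from the official description of the chess system.

```python
def win_expectancy(delta_r, a=400.0):
    """Win expectancy W_e for a rating difference delta_r = R_p - R_c."""
    return 1.0 / (1.0 + 10.0 ** (-delta_r / a))


def updated_rating(r_p, opponent_ratings, wins, ties, k=32.0, a=400.0):
    """Return R_p,new = R_p + K * (S - sum_i W_e(R_p - R_c,i)).

    k = 32 is an illustrative value only; in practice K depends on the
    rating of the player.
    """
    score = wins + 0.5 * ties  # S = N_w + N_t / 2
    expected = sum(win_expectancy(r_p - r_c, a) for r_c in opponent_ratings)
    return r_p + k * (score - expected)
```

A player who scores exactly the expected number of points keeps her rating; every point above or below the expectation moves the rating by K.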

Global Optimization Systems

Ordinal Ranking

Select a ranking that minimizes the number of violations. A violation is a game where a team with a lower ranking defeats a team with a higher ranking.
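Counting violations for a given candidate ranking is straightforward; finding the ranking that minimizes the count is the hard part and is not attempted here. A minimal Python sketch of the counting step, where the representation of games as (winner, loser) pairs is an assumption made for this example:

```python
def count_violations(ranking, games):
    """Count games in which the lower-ranked team beat the higher-ranked one.

    ranking: list of team identifiers, best team first.
    games:   iterable of (winner, loser) pairs (hypothetical input layout).
    """
    position = {team: place for place, team in enumerate(ranking)}
    return sum(1 for winner, loser in games
               if position[winner] > position[loser])
```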

.......

The Ranking Model

Let us consider a set of teams T consisting of N_T teams playing a total of N_G games between each other. Depending on the nature of the sport the term "team" can refer to either an individual (chess, boxing, singles tennis etc.) or a set of individuals (football, baseball, basketball, doubles tennis etc.). We will only consider games consisting of a set of two teams, but the method outlined in this paper can easily be generalized to consider games consisting of n > 2 teams (common in track and field, swimming etc.).

In game g (g = 1,...,N_G) the home team is denoted t_h (h = 1,...,N_T) and the away team is t_a (a = 1,...,N_T). In this game the home team obtains a score of S_h and the away team a score of S_a. The game is played at time T_g within a given season y. Results of games are assumed to be available from a total of N_y seasons (y = 1,...,N_y). The score of the winner is S_w and the score of the loser is S_l. We will assume for simplicity that the winner of a game is the team with the larger score. The margin of victory or point spread ΔS can be defined in two different ways.

$$\Delta S_{ha} = S_h - S_a$$

$$\Delta S_{wl} = S_w - S_l$$

ΔS_wl will always be non-negative, whereas ΔS_ha will be positive if the home team wins and negative if the away team wins. The relation between them is

$$\Delta S_{ha} = \begin{cases} \Delta S_{wl} & \text{if } S_h \ge S_a \\ -\Delta S_{wl} & \text{if } S_h < S_a \end{cases}$$

Basic Model Assumptions

The assumptions of the current ranking model are:

1. Only games played between two teams t_h and t_a in the set T are considered (t_h, t_a ∈ T).
2. Only games played within a given season y are considered, except for rankings performed early in the season, where results of games from season y-1 can be used.
3. The outcome or result R_g(S_h,S_a) of game g is a real function depending only on the final scores S_h and S_a.
4. The result R_g does not depend on the time of the game T_g within the season nor on any other variable related to the game.
5. The ranking of the teams in the set will be accomplished by assigning the i'th team t_i a strength or power rating r_i, where {r_i, i = 1,...,N_T} is a set of real numbers. The teams will then be ranked (= ordered) according to the value of their strength.
6. The result R_g of game g is a measurement of the strengths r_h and r_a of the two teams with an associated measurement error s_g.

In the following the discussion will be based on examples from football, but the formalism is completely general and could be used for any binary game resulting in a final set of scores. It follows from assumptions 3 and 4 that the current ranking method does not take into account other "unofficial" statistics from a game, like half time score, yardage gained or lost, fumbles etc. While these variables might very well be of importance for a prediction algorithm, we consider them irrelevant and unfair to use for a ranking algorithm, whose purpose is to evaluate which team is the best or which set of teams is the best. In this latter case the teams need to know exactly what they are being evaluated on.

The Game Outcome Function

Let us now consider in more detail the result R_g of game g. As stated in assumption 3 the result R_g(S_h,S_a) is a real function of the two scores S_h and S_a. R_g will be considered a measurement of the strength of the two teams. As with any physical measurement, the result R_g does not provide an exact measurement of the strengths of the two teams; an uncertainty s_g is associated with each measurement g. There is, unfortunately, no universally accepted result function. Some commonly used functions are discussed below.

Win-Loss system (WL)

It only matters which team wins (= which team has the higher score), but the actual scores do not matter.

$$R_g^{WL}(S_h, S_a) = \begin{cases} 1 & \text{if } S_h > S_a \\ 0 & \text{if } S_h = S_a \\ -1 & \text{if } S_h < S_a \end{cases}$$

This system is often referred to as the JWB system (Just Win Baby). Effectively this is how many fans view the game outcome: the only thing that matters is whether you win or not, not how you win or by how much.

Score Difference system (SD)

The result of the game is defined as the difference between the scores of the two teams:

$$R_g^{SD}(S_h, S_a) = S_h - S_a = \Delta S_{ha}$$

In football this system is often referred to as the BOMB Index (Bowden - Osborne Memorial Blowout Index), since it is perceived, that Florida State and Nebraska used to run up the score against weak opponents, which will help their ranking in this system.

Truncated Score Difference system (TSD)

A modification of the SD system, where the score difference is truncated at some value ΔS_max in order to avoid too heavy an emphasis on games where one team "runs up" the score:

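$$R_g^{TSD}(S_h, S_a \mid \Delta S_{\max}) \equiv \Delta S_t \equiv \begin{cases} \Delta S_{\max} & \text{if } \Delta S_{ha} \ge \Delta S_{\max} \\ \Delta S_{ha} & \text{if } |\Delta S_{ha}| < \Delta S_{\max} \\ -\Delta S_{\max} & \text{if } \Delta S_{ha} \le -\Delta S_{\max} \end{cases}$$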

In football typical values of the maximum point spread ΔS_max range from 21 to 35. ΔS_t is called the truncated point spread. Please note that the game outcome now also depends on the game-independent parameter ΔS_max.

A mathematically more elegant way of providing this cutoff is to use the hyperbolic tangent function

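$$R_g^{TSDT}(S_h, S_a \mid \Delta S_{\max}) = \Delta S_{\max} \tanh\left( \frac{\Delta S_{ha}}{\Delta S_{\max}} \right)$$

with

$$R_g^{TSDT} \cong \Delta S_{ha} \quad \text{for } |\Delta S_{ha}| \ll \Delta S_{\max}$$

and

$$R_g^{TSDT} \to \Delta S_{\max} \quad \text{for } \Delta S_{ha} \to \infty$$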

Simple Hybrid WL-SD system

Since both the pure WL and SD system tend to favor a particular, but maybe extreme, view of the game outcome, other systems have introduced a simple linear combination of the two systems:

$$R_g^{WLSD}(S_h, S_a \mid B_w) = \begin{cases} B_w + \Delta S_{ha} & \text{if } S_h > S_a \\ 0 & \text{if } S_h = S_a \\ -B_w + \Delta S_{ha} & \text{if } S_h < S_a \end{cases}$$

In football typical values for the "bonus" B_w for a win are 50-100. For B_w = 0 the system reduces to the SD system, and for B_w >> |ΔS| it is equivalent to the WL system.

Score Ratio system (SR)

Instead of forming the difference between the scores, as in the SD system, the game outcome is the ratio of the score difference to the winner's score.

$$R_g^{SR}(S_h, S_a) = \frac{\Delta S_{ha}}{S_w}$$

In this case -1 ≤ R_g^SR ≤ 1, with |R_g^SR| = 1 if the losing team does not score any points (a shutout). The rare case of S_h = S_a = 0 is not defined in this system, which can therefore not be used in sports where this result is possible.

Linear Win - Difference - Ratio system (LWDR)

All the above mentioned methods for evaluating the outcome of a game have their virtues. It is therefore natural to form an outcome function that combines all of them. The simplest way of doing this is by forming a linear combination of the three types of outcome:

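$$R_g^{LWDR}(S_h, S_a \mid B_w, \Delta S_{\max}, B_r) = \begin{cases} B_w + \Delta S_t + B_r \dfrac{\Delta S_{ha}}{S_w} & \text{if } S_h > S_a \\ 0 & \text{if } S_h = S_a \\ -B_w + \Delta S_t + B_r \dfrac{\Delta S_{ha}}{S_w} & \text{if } S_h < S_a \end{cases}$$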

There are three parameters in this approach: 1) the win bonus B_w, 2) the maximum point spread ΔS_max, and 3) the scoring ratio weight factor B_r. They are, however, not independent, since scaling all of them by a common factor leads to the same ranking. Since the sum of the three parameters is equal to the maximum value of the game outcome function R_g^max, it is convenient to constrain the three parameters by choosing R_g^max to have a convenient value like 100.

$$B_w + \Delta S_{\max} + B_r = R_g^{\max}$$

In this paper we will choose the following values

$$B_w = 50, \qquad \Delta S_{\max} = 25, \qquad B_r = 25$$

A subsequent paper will discuss techniques for choosing optimal values for (B_w, ΔS_max, B_r).
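The game outcome functions of this section can be collected in a few lines of code. The Python sketch below is only an illustration of the definitions above: the function names are assumptions made for this example, and the default parameters are the values B_w = 50, ΔS_max = 25, B_r = 25 chosen in the text.

```python
import math


def truncated_spread(ds_ha, ds_max):
    """Truncated point spread: ds_ha clipped to [-ds_max, +ds_max]."""
    return max(-ds_max, min(ds_max, ds_ha))


def outcome_wl(s_h, s_a):
    """Win-Loss outcome: +1 home win, 0 tie, -1 away win."""
    return (s_h > s_a) - (s_h < s_a)


def outcome_tanh(s_h, s_a, ds_max=25.0):
    """Smooth truncation of the score difference using tanh."""
    return ds_max * math.tanh((s_h - s_a) / ds_max)


def outcome_lwdr(s_h, s_a, b_w=50.0, ds_max=25.0, b_r=25.0):
    """Linear Win - Difference - Ratio outcome, seen from the home team."""
    if s_h == s_a:
        return 0.0  # a tie counts as zero, which also covers a 0-0 game
    ds_ha = s_h - s_a
    s_w = max(s_h, s_a)  # score of the winner, positive for a non-tie
    bonus = b_w if ds_ha > 0 else -b_w
    return bonus + truncated_spread(ds_ha, ds_max) + b_r * ds_ha / s_w
```

With these parameter values a 35-0 home win gives the maximum outcome of 100, while a 21-14 home win gives 50 + 7 + 25·7/21, roughly 65.3.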

The Outcome Prediction Function

The outcome prediction function P_g(r_h,r_a|a_j) estimates the outcome of a game g between the two teams t_h and t_a based on their strength ratings r_h and r_a, respectively. In addition, P can depend on a number of further parameters a_j, depending on the choice of game outcome function R_g. If the LWDR game outcome function R_g^LWDR is used, the parameter vector is a = (B_w, ΔS_max, B_r). Other parameters, like the home field advantage B_h, can be added to the parameter set if needed. Actual choices of P will be discussed later in this paper, but first we will outline the method for determining the strength ratings r_i, independent of the functional expression of P.

During a season with N_g games a total of N_g measurements of the N_t strength ratings r_i are performed, and we want to determine the vector r so that the difference between the game result R_g and the prediction (or hypothesis) P_g is as small as possible for as many games as possible. This can be done in a number of ways. We choose to minimize the sum of the squares of the differences between the game results and the predictions. This is the so-called Least Squares Method.

$$\text{minimize:} \quad \chi^2(\vec{r}) = \sum_{g=1}^{N_g} \left[ \frac{R_g(\vec{a}) - P_g(\vec{r} \mid \vec{a})}{s_g} \right]^2$$

The measurement error s_g for each game will be discussed later.

Other methods are based on the maximum norm ......

Linear chi-square method

If we furthermore assume that the outcome prediction function depends linearly on the strength ratings r_i, we can use the general framework for linear least squares fits. This is a very strong assumption and a later paper will discuss non-linear approaches. We will furthermore assume that P_g does not depend on the parameter vector a, but only on the difference between the strength ratings of the two teams participating in the game

$$P_g = r_h - r_a = \sum_{t=1}^{N_t} d_{gt} r_t \quad \text{where} \quad d_{gt} = \begin{cases} 1 & \text{if } t = h \\ -1 & \text{if } t = a \\ 0 & \text{if } t \ne h, a \end{cases}$$

This reduces the problem to the minimization of

$$\chi^2(\vec{r}) = \sum_{g=1}^{N_g} \left[ \frac{R_g(\vec{a}) - \sum_{t=1}^{N_t} d_{gt} r_t}{s_g} \right]^2$$

Following the standard χ² nomenclature we introduce the design matrix A with elements

$$A_{gt} = \frac{d_{gt}}{s_g}$$

the weighted result vector b with elements

$$b_g = \frac{R_g}{s_g}$$

and the strength parameter vector r. In matrix notation the problem can now be written as

$$\chi^2 = | A \cdot \vec{r} - \vec{b} |^2$$

where | |² denotes the squared Euclidean norm in the vector space spanned by all the games.

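Singular Value Decomposition Solution

Using the Singular Value Decomposition (SVD) algorithm, the vector r that minimizes χ² is

$$r_t = \sum_{i=1}^{N_t} V_{ti} \, \frac{1}{v_i} \sum_{g=1}^{N_g} U_{gi} b_g$$

where U and V are the SVD matrices and v is the vector of singular values, defined by

$$A_{gt} = \sum_{i=1}^{N_t} U_{gi} v_i V_{ti}$$

or, in matrix notation,

$$A = U \cdot [\mathrm{diag}(v_i)] \cdot V^T$$

The design matrix always has at least one vanishing singular value, since adding the same constant to every r_t leaves all differences r_h - r_a unchanged. In the standard SVD prescription of Press et al. [1992] the terms with vanishing v_i are dropped from the sum over i.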

We obtain the U and V matrices and the vector v using the routines described in Press et al. [1992].

The advantage of using the linear χ² method is speed, since it only takes a few seconds on a normal PC to solve for the rankings. The SVD algorithm is the standard tool for solving linear χ² problems due to its robustness. For further details see Press et al. [1992].
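The linear least squares step can be sketched with a modern linear algebra library. The Python example below uses NumPy's `numpy.linalg.svd` in place of the routines of Press et al.; the function name `solve_ratings`, the input layout (a list of (home index, away index, result) tuples) and the relative cut-off `rcond` for negligible singular values are assumptions made for this illustration, not part of the method as documented here.

```python
import numpy as np


def solve_ratings(games, n_teams, errors=None, rcond=1e-10):
    """Least squares strength ratings from game results via SVD.

    games:  list of (home_index, away_index, result R_g) tuples.
    errors: optional measurement errors s_g, one per game (default 1).
    """
    n_games = len(games)
    s = np.ones(n_games) if errors is None else np.asarray(errors, dtype=float)

    # Design matrix A_gt = d_gt / s_g and weighted result vector b_g = R_g / s_g.
    A = np.zeros((n_games, n_teams))
    b = np.zeros(n_games)
    for g, (h, a, result) in enumerate(games):
        A[g, h] = 1.0 / s[g]
        A[g, a] = -1.0 / s[g]
        b[g] = result / s[g]

    # A = U diag(v) V^T
    U, v, Vt = np.linalg.svd(A, full_matrices=False)

    # Replace 1/v_i by zero for negligible singular values.
    inv_v = np.zeros_like(v)
    keep = v > rcond * v.max()
    inv_v[keep] = 1.0 / v[keep]

    return Vt.T @ (inv_v * (U.T @ b))
```

Because the terms with vanishing singular values are dropped, the returned ratings of a fully connected set of teams sum to zero; only differences between ratings are meaningful.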

The Game Weight Factors

The game weight factors, defined as

$$w_g = \frac{1}{s_g^2}$$

provide the option to let different games influence the rankings differently. It is very common in other ranking models to make the weights time-dependent

$$w_g \propto \exp\left( -(T - t_g) \right)$$

where T is the time at which the ranking is performed and t_g is the time at which game g was played. This puts more emphasis on games played recently. This method is, however, more appropriate for prediction models than for ranking models. The current ranking model does not incorporate any time-dependence in the weights, with the exception of the initial period as explained later.

If two teams of very similar strength are playing each other, we consider the outcome of this game a "good" measurement of the relative strengths of the two teams. In contrast we consider mis-matched games between opponents with very different strengths as "bad" measurements. In mathematical terms this implies that we will associate a larger weight w_g (or equivalently a smaller uncertainty s_g) with games between teams where the difference between the rankings r_h and r_a is small. This can, however, only be done in an iterative procedure, since the rankings r_h and r_a are not known initially. We are therefore using the following method:

Initial values of the ranking vector r⁽¹⁾ are determined by assuming w_g⁽¹⁾ = 1. Afterwards the χ² system is solved again, but this time with the following weights

$$w_g^{(2)} = \exp\left( -\frac{|r_h - r_a|}{a_w} \right)$$

where a_w is determined from the condition

$$\exp\left( -\frac{\max\{r_t\} - \min\{r_t\}}{a_w} \right) = \frac{1}{b_w^2}$$

In other words, we consider a game between the best and the worst team to have a measurement error b_w times larger than when two completely equal teams play. b_w is a free parameter in this ranking model. The numerical examples shown later use the value b_w = 3.
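The second-pass weights can be sketched as follows. The sketch assumes, as the text describes, that the weight decreases with the rating difference and that the game between the best and the worst team receives the weight 1/b_w² (an error b_w times larger, with w_g = 1/s_g²). The function name and the input layout are assumptions made for this example.

```python
import math


def second_pass_weights(ratings, games, b_w=3.0):
    """Weights w_g = exp(-|r_h - r_a| / a_w) for the second chi-square pass.

    ratings: first-pass ratings r_t, indexed by team.
    games:   list of (home_index, away_index) pairs.
    a_w is fixed so that a game between the best and the worst team
    gets the weight 1 / b_w**2.
    """
    spread = max(ratings) - min(ratings)
    if spread == 0.0:
        # All teams are rated equal: every game is an equally good measurement.
        return [1.0 for _ in games]
    a_w = spread / (2.0 * math.log(b_w))
    return [math.exp(-abs(ratings[h] - ratings[a]) / a_w) for h, a in games]
```

With b_w = 3 a game between the two extreme teams gets weight 1/9, and a game between teams separated by half the full rating spread gets weight 1/3.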

Early season rankings

Early in the season, when only a few games have been played, the design matrix A has a rank smaller than N_t. This has the effect that the relative ranking of many sub-sets of teams is undetermined. For NCAA 1-A division football teams, where N_t = 112, the number of independent sub-sets of teams as a function of the number of weeks played is shown in the table below (based on the 1996-98 seasons):

Number of independent sets
week   1996   1997   1998
   1    109    104    104
   2     78     78     65
   3     36     36     18
   4      3      2      1
   5      1      1      1

Only after 4 to 5 playing weeks is it possible to obtain a solution where all teams are "connected". For NCAA 1-A football the onset of full connectedness seems empirically to coincide with the average number of games per team reaching 2.5 - 3.0, or a total of 139 to 170 games between the 112 teams (only games where both teams are division 1-A count).
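The number of independent sets is the number of connected components of the graph whose nodes are the teams and whose edges are the games played so far. A minimal Python sketch using a union-find structure; the function name and input layout are assumptions made for this example:

```python
def count_independent_sets(n_teams, games):
    """Number of connected sub-sets of teams, given the games played.

    games: iterable of (home_index, away_index) pairs.
    A team that has not played yet counts as a set of its own.
    """
    parent = list(range(n_teams))

    def find(x):
        # Follow parent links to the representative, halving the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    n_sets = n_teams
    for h, a in games:
        root_h, root_a = find(h), find(a)
        if root_h != root_a:
            parent[root_h] = root_a  # merge the two sets
            n_sets -= 1
    return n_sets
```

A return value of 1 means that all teams are connected and a complete ranking can be obtained from the current season alone.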

It is, however, often required to obtain a ranking early in the season before full connectedness has been obtained. Since we require that only the final results of games can be used in the ranking, this can only be done by using game results from the previous season(s). So early in the season results from the previous season are also included in the ranking, but with a decreasing weight factor. The total weight factor on each game is

$$w_{tot} = w_g w_s$$

where w_g is defined above and

$$w_s = \begin{cases} 1 & \text{for games in the current season} \\ \left( \max\left[ 0, \, 1 - \dfrac{\langle N_{gt} \rangle}{R_{cut}} \right] \right)^2 & \text{for games in the previous season} \end{cases}$$

where ⟨N_gt⟩ is the average number of games played by each team and R_cut = 4.5 represents a cut-off value for the inclusion of the previous season's games, so that w_s vanishes once ⟨N_gt⟩ reaches R_cut.
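The season weight can be sketched as a small function. The sketch assumes that the weight of the previous season's games is clipped to zero once the average number of games per team reaches R_cut, which is how the cut-off is described in the text; the function name is an assumption made for this example.

```python
def season_weight(is_current_season, avg_games_per_team, r_cut=4.5):
    """Season weight w_s for a game.

    Games in the current season always get weight 1. Games from the
    previous season are phased out quadratically and get weight 0 once
    the average number of games per team reaches r_cut.
    """
    if is_current_season:
        return 1.0
    return max(0.0, 1.0 - avg_games_per_team / r_cut) ** 2
```

Before the first game of the new season the previous season's results carry full weight; halfway to the cut-off they carry a weight of 0.25.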

Input

Currently it is cumbersome to obtain NCAA results, especially for the lower divisions. I would therefore like to propose that the NCAA post the results of all NCAA games (football, basketball, tennis, etc.) in a standard ASCII-based format that can be used directly as input to ranking programs.

For each sport two files should be created for each season: a team file and a game file.

Team File

The Team file should contain the following information on each team:

Team index. A unique integer index running from 0
Official Team Name. (Tennessee, James Madison, etc.)
Conference Name. (The Southeastern Conference, The Atlantic Ten Conference, etc.)
Division. (1-A, 1-AA, etc.)

The format of the file should be ( generic C printf statement ):

fprintf( TeamFile, " %5d %-25s %-35s %-25s\n",
         Index, Name, Conference, Division );

Game File

The Game file should contain the following information on each game:

Date
Away team name
Away team score
Home team name
Home team score
Away team index
Home team index
Playing field code ( = 1 if home team was home (default), = 2 if the game was a bowl or play-off game, and = 3 if game for some other reason was played on a neutral field)

The format of the file should be ( generic C printf statement ):

fprintf( GameFile,
         " %4d %2d %2d %-25s %3d - %-25s %3d %5d %5d %1d\n",
         Year, Month, Day,
         AwayTeamName, AwayTeamScore,
         HomeTeamName, HomeTeamScore,
         AwayTeamIndex, HomeTeamIndex, FieldCode );
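Because team names may contain spaces, a reader of the game file has to rely on the fixed column positions implied by the format rather than on splitting at whitespace. The Python sketch below writes and reads one record; the function names and the dictionary keys are assumptions made for this example, and the column offsets hold only as long as no field overflows its width (names of at most 25 characters, scores below 1000).

```python
# Same conversion specification as the C format string for the game file.
GAME_FORMAT = " %4d %2d %2d %-25s %3d - %-25s %3d %5d %5d %1d\n"


def format_game(year, month, day, away_name, away_score,
                home_name, home_score, away_index, home_index, field_code=1):
    """Return one game file record as a string."""
    return GAME_FORMAT % (year, month, day, away_name, away_score,
                          home_name, home_score, away_index, home_index,
                          field_code)


def parse_game(line):
    """Parse one game file record using fixed column positions."""
    return {
        "date": (int(line[1:5]), int(line[6:8]), int(line[9:11])),
        "away_name": line[12:37].rstrip(),
        "away_score": int(line[38:41]),
        "home_name": line[44:69].rstrip(),
        "home_score": int(line[70:73]),
        "away_index": int(line[74:79]),
        "home_index": int(line[80:85]),
        "field_code": int(line[86:87]),
    }
```

Writing a record with `format_game` and reading it back with `parse_game` returns the original values, including multi-word team names.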

Summary

References

[]: The official web site for the BCS is: http://www.abccfb.com/. The current rules for the BCS are documented at http://www.cae.wisc.edu/dwilson/rsfc/rate/newbcsrelease.html
[]: The description of the Billingsley system can be found at: http://www.cfrc.com/html/searchof.htm
[]: No current, active links to a description of the Dunkel system could be found.
[]: The description of the Anderson & Hester / Seattle Times system can be found at: http://ww1.sportsline.com/u/football/college/polls/seattletimes/
[]: http://www.mratings.com/theory/massey.htm
[]: http://www.expertpicks.com/
[]
[]: http://www.cae.wisc.edu/dwilson/rsfc/rate/rothman.html
[]: http://www.usatoday.com/sports/football/sfc/sfcjsx.htm
[]: http://www.uschess.org/ratings/info/system.html
[]: http://www.brainiac.com/royjones/

==============================================
Created Sun Nov 07 11:13:54 1999 by Soren Sorensen
==============================================

