Methodology

Creation of the Datasets and source of truth

Any comparative metric is only as good as the dataset it pulls from. RAS is no different, and in developing this metric I had to ensure that there were as few arbitrary choices as possible. Since I combined both pro day and combine information, the question became how to do so without creating confusion or altering data just to alter it. So I ran some tests to see what would happen using the combine as the source of truth versus using pro day information as the source of truth. By source of truth, I mean which measurement I use if a player has both. So if a player runs a 4.49 at the combine but a 4.42 at their pro day, which is used to calculate RAS? As a third option, it had been suggested that I use the best of these numbers instead of choosing one or the other. I created a dataset for each of these three options and compared the relative changes in the rankings to see what those comparisons would show.

The Pro Day and “Best of” datasets gave virtually identical results, so for simplicity’s sake I threw out the “Best of” dataset and concentrated only on Pro Day vs. Combine.

My initial concerns about using the Combine dataset were that it would unfairly punish players who did not participate in combine drills or that it would give an unfair advantage to players who did both. While reviewing the Pro Day dataset, I saw some slight shifts among players who participated in both, with an occasional player who had a significant jump in his overall RAS (more on that another time), so my original concerns were not unfounded. What I found concerning was that players who chose to forgo the combine saw their scores reduced almost universally. This created a handicap for players who performed well at the combine, which ran counter to the intention of this metric: accurately representing athleticism on a relative scale.

The Pro Day dataset was not without its use, however, which I’ll cover later while discussing outliers. Ultimately, I chose to use the Combine dataset as the source of truth when running RAS. This means that I created a hierarchy that determines which measurement is counted, which looks something like this:

NFL or Regional Combine -> Officially recorded Pro Day -> Player/Agent/Unverified Source
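As a rough illustration of that hierarchy, here is a minimal Python sketch of picking one value when a player has the same drill recorded by several sources. The source labels ("combine", "pro_day", "unverified") and the function name are my own placeholders for illustration, not the actual structure of the RAS database.

```python
from typing import Optional

# Lower rank = more trusted source, mirroring the hierarchy above.
# The keys are hypothetical labels chosen for this sketch.
SOURCE_PRIORITY = {"combine": 0, "pro_day": 1, "unverified": 2}


def pick_measurement(readings: dict[str, float]) -> Optional[float]:
    """Return the reading from the most trusted source that has one."""
    available = [source for source in readings if source in SOURCE_PRIORITY]
    if not available:
        return None
    best_source = min(available, key=SOURCE_PRIORITY.__getitem__)
    return readings[best_source]


# The 40 yard dash example from earlier: 4.49 at the combine, 4.42 at the pro day.
forty = pick_measurement({"combine": 4.49, "pro_day": 4.42})
```

With the combine as the source of truth, the 4.49 is the number that counts, even though the pro day time was faster.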

What measurements are used?

RAS uses ten combine measurements to come up with a final grade. Each measurement is weighted equally, but based on what each one actually records, they fall into several categories.

Size: A player’s size is recorded in five different ways at the combine: height, weight, arm length, hand size, and wingspan. Because not every player attends the combine, and because pro day arm length, hand size, and wingspan are rarely recorded and not generally publicly available, I used only height and weight. When arm length and hand size are available, I use them for some composite scoring, but they are not included in a player’s final RAS. While not an athletic measurement, size adds valuable context and helps in weighting the other measurements.

Height: Measured in feet, inches, and eighths of an inch, recorded as a four-digit number. For example, 6032 is 6 feet, 3 and 2/8 (1/4) inches.
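To make the height notation concrete, here is a small Python sketch that converts a code like 6032 into total inches. The function name and the validation rules are my own assumptions for illustration.

```python
def parse_height(code: int) -> float:
    """Convert a height code (feet, two-digit inches, eighths) to inches."""
    feet, rest = divmod(code, 1000)      # 6032 -> 6 and 32
    inches, eighths = divmod(rest, 10)   # 32 -> 3 and 2
    if inches > 11 or eighths > 7:
        raise ValueError(f"not a valid height code: {code}")
    return feet * 12 + inches + eighths / 8


height_in = parse_height(6032)  # 6 feet, 3 and 2/8 inches
```

For 6032 this gives 75.25 inches, which is 6 feet, 3 and 1/4 inches.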

Weight: Measured in pounds, combine measurement is used if available while pro day weight is not recorded.

Speed: A measure of a player’s ability to accelerate and maintain their top speed. This is measured in one drill, with the other two measures taken as splits of the run as a whole. Many times, especially in data that predates 2006, only one time was recorded and the splits were unavailable.

40 yard dash: Measured in seconds. Official combine times are used when available, not the best of the two runs.

20 yard split: Measured in seconds. Official combine times are used when available, not the best of the two runs.

*10 yard split: Measured in seconds. Official combine times are used when available, not the best of the two runs.

*The 10 yard split is also cited by many when discussing a player’s explosion, not simply as a measurement of speed.

Strength and Endurance: For two things you would figure are pretty important for athletes in a sport, the only measurement the NFL currently uses is the bench press. It is often dismissed as a poor measurement of actual play strength; it is more accurately described as a measurement of upper body strength and endurance rather than of strength and endurance in general.

225 lb. Bench Press: Measured in repetitions.

Explosion: The combine has two measurements that it uses to measure players’ lower body explosion. This trait is exceptionally important in some evaluations since it can reflect a player’s ability to explode off the line of scrimmage, to out-leap opponents going up for a ball, or to make a hard cut while attempting to create separation.

Vertical: Measured in half inches, official combine vertical is taken as a best of two attempts.

Broad Jump: Measured in feet and whole inches, official combine is taken when available.

Agility: While referred to broadly as agility, the short shuttle and 3 cone drills are used to measure speed and explosion while also measuring hip and ankle flexibility. They are often considered the most important individual drills a player participates in.

5-10-5 Short Shuttle: Measured in seconds, official combine time is taken when available.

L 3 Cone: Measured in seconds, official combine time is taken when available.

How the individual scores are calculated

There are two similar calculations that are used depending on whether it is better to have a higher number (bench, vertical, etc.) or a lower one (timed events). Using either, a player’s measurements are compared against their position group and assigned a 0 to 10 score based on where they lie. As you can imagine, this roughly corresponds to percentile. I say roughly because most datasets that are presently out only use combine data and do not include pro day, meaning the actual percentile may be slightly different but ought to be close. So why not just use these? What need is there for a composite score, aside from having a single number to refer to? Well, that brings us to why this metric as a whole is suited to statistical analysis. You see, every measurement on its own is charted on a bell curve.
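The exact formula is not spelled out here, so the following Python sketch is only one plausible reading of the two calculations described in this section: score a measurement by the share of position-group peers it beats, scaled to 0 to 10. The function name, the handling of ties (a tie does not count as beating a peer), and the sample numbers are all assumptions for illustration.

```python
def percentile_score(value: float, peers: list[float],
                     higher_is_better: bool = True) -> float:
    """Score a measurement 0 to 10 by the share of peers it beats."""
    if not peers:
        raise ValueError("need at least one peer measurement")
    if higher_is_better:
        beaten = sum(1 for p in peers if p < value)   # bench, vertical, broad
    else:
        beaten = sum(1 for p in peers if p > value)   # timed events
    return round(10 * beaten / len(peers), 2)


# Made-up position group data, purely for illustration.
bench_score = percentile_score(20, [10, 15, 18, 20, 25])
forty_score = percentile_score(4.45, [4.35, 4.45, 4.50, 4.55, 4.65],
                               higher_is_better=False)
```

Both sample players beat three of five peers, so both come out at 6.0. The only difference between the two calculations is the direction of the comparison.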

Bell Curve

For example, of the nearly 670 WRs with a combine 40 time in my database (at the time of this writing), 450 of them ran between a 4.40 and 4.59, or about 65%. Less than 10% ran faster than that and about 25% ran slower. While not a perfect bell curve, it gives an idea of how the measurements bunch up in the middle and thin out towards the ends. For RAS, this shows up in the individual measurements when comparing two players, because the same small difference in a measurement counts for more or less depending on how far from average you are. A player may see a bigger difference with RAS when comparing a 4.51 and 4.52, for instance, than you’d see comparing a 4.31 and 4.32. That’s because a player running 4.51 is faster than a lot more people than his 4.52 counterpart, while a player who runs a blazing 4.31 is only faster than a few more players than his 4.32 partner. This brings me to where the “Relative” part of Relative Athletic Scores comes in.
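To put rough numbers on the 4.51 vs. 4.52 and 4.31 vs. 4.32 comparison, here is a short Python sketch that treats WR 40 times as a perfect bell curve. The mean of 4.50 and standard deviation of 0.10 are assumed values picked only because they roughly match the bunching described above; they are not taken from the actual database.

```python
from statistics import NormalDist

# Hypothetical parameters: an idealized bell curve for WR 40 times.
wr_forty = NormalDist(mu=4.50, sigma=0.10)


def share_slower(time: float) -> float:
    """Fraction of the idealized group that runs slower than `time`."""
    return 1 - wr_forty.cdf(time)


# Extra share of players beaten by being 0.01 seconds faster.
mid_gap = share_slower(4.51) - share_slower(4.52)
tail_gap = share_slower(4.31) - share_slower(4.32)
```

Under these assumed parameters, the hundredth of a second near the middle is worth about 3.9 percentage points of the group, while the same hundredth out in the fast tail is worth about 0.7.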

How the final score is calculated

If a player has at least six of the ten combine measurements recorded, either at the combine itself or in combination with their pro day, they receive a final RAS. The final score begins as an average of the individual scores. That in itself would be enough from a tracking perspective because it means the final score is no longer on a bell curve, or normal distribution, but is on a nearly straight line, or uniform distribution.

Since the goal is to make it intuitive to understand, the same calculation used to put the other scores on a 0 to 10 scale is used and we get our final RAS. Since it falls in a uniform distribution, it is much more reliable when charting trends, as there are roughly the same number of players between 0 and 1 as there are between 4 and 5. It’s far easier, both in a visual and a data sense, to show percentages or comparisons since it clears up much of the bunched up information in the middle.
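Putting the two steps of this section together, here is a minimal Python sketch: average the individual scores if at least six are recorded, then rank that average against the position group to land back on a 0 to 10 scale. The function names, the use of None for a missing drill, and the share-of-peers-beaten ranking are assumptions for illustration, not the exact calculation.

```python
from statistics import mean
from typing import Optional

MIN_MEASUREMENTS = 6  # from the text: at least six of the ten


def raw_average(scores: dict[str, Optional[float]]) -> Optional[float]:
    """Average the recorded individual scores, or None if too few exist."""
    recorded = [s for s in scores.values() if s is not None]
    if len(recorded) < MIN_MEASUREMENTS:
        return None
    return mean(recorded)


def final_ras(player_avg: float, peer_avgs: list[float]) -> float:
    """Rescale an average to 0 to 10 by the share of peer averages it beats."""
    if not peer_avgs:
        raise ValueError("need at least one peer average")
    beaten = sum(1 for a in peer_avgs if a < player_avg)
    return round(10 * beaten / len(peer_avgs), 2)


# Made-up individual scores; four drills were never recorded.
player = {"height": 5.0, "weight": 6.0, "forty": 7.0, "bench": None,
          "vertical": 8.0, "broad": 9.0, "shuttle": 7.0, "cone": None,
          "split_20": None, "split_10": None}
avg = raw_average(player)
ras = final_ras(avg, [3.0, 5.0, 6.5, 8.0])
```

Because the last step ranks players against each other, the results spread out evenly across the scale by construction, which is what gives roughly the same number of players in each one-point band.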