Friday, April 4, 2014

Hypotheses and The Chase for My Heart

Hello Internet! Guido here with another Scoring Indy update. Now, as anyone reading this blog the day it goes up probably knows, there is no IndyCar race this weekend. This leaves Mr. Numbers here in a conundrum: what is a numbers guy to post about when there's no race to look forward to, and not much in the way of data from the season to look back on? Well, I put the question out there on Twitter, and Eric Hall, the man I consider to be my original reader, came through for me. So, this week I'll be positing some "future statistical hypotheses based on historic trends and the current data set." (His words, not mine.) Basically, I'll be making predictions, and I hope they'll be better than my usual race predictions. Plus, I'll be revealing my overly complicated scheme to pick a new favorite driver before the Indy 500.

Let's get to it!


Gigantic Disclaimer

Now, when he tweeted me asking for these gazes into my numerical crystal ball, I'm not sure that Eric fully grasped (though he might have) how the numbers I create work. Before I try to explain that, I need to let you all know something about me. The last time that I took a math class was in high school. In college, I studied Religious Studies, Classics, History, and Economics. In Econ I had to use a lot of math, but the last time that I took a math class just to learn math was over ten years ago now. I am also pretty much self-taught in the area of statistics. So take all of that for what it's worth when I tell you what I'm about to tell you.

Still with me? Ok. The numbers that I generate and present to the whole wide internet on a regular basis are what are known as "descriptive statistics." (Wikipedia has a decent article explaining what that means.) Basically, I'm aiming to describe to all of you what happened in the race, based on starting position, positions gained, finishing position, laps led, laps completed, and average running position.

My numbers, at least as far as I can tell, don't have a ton of predictive value. Having a good Race Score one week doesn't necessarily mean that a driver will duplicate that result the next week, at the next similar track, or at the same track the following year. As I keep doing this for several years (God willing and the creek don't rise), we all may be able to examine trends in Race Scores across drivers, tracks, situations, et cetera. But, for now, I just don't have the stockpile of data that I would need to make those assumptions.


2014 Hypotheses

Now, I'm not one to let not having "the stockpile of data that I would need to make those assumptions" get in my way. Eric asked some questions that I think I can address with the data that I have, and the questions that I can't, I'll take a swing in the dark on. So, for a bit here, I'm going to look forward to the rest of the season by answering Eric's specific questions. (Good things happen when you interact with me on Twitter.)

- Who is this Year's Conway Line?

For those of you not in the know on that little bit of jargon, the "Conway Line" is a fictional level of performance, named after Mike Conway. The basic premise is that the Conway Line is set at 20% of the maximum possible Race Score World Championship points. Last year, that meant that if a driver earned more than 5 points per race (which amounted to finishing with the 7th-best Race Score or better), he/she would be "above the Conway Line."

This year, with the change to the points structure, a driver needs 2 points per weekend to sit exactly on the Conway Line, which means that finishing with exactly the 7th-best Race Score would do it for you. That makes being "above the line" slightly more difficult.
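Here's a minimal Python sketch of the Conway Line arithmetic. It assumes the line is exactly 20% of the maximum Race Score World Championship points available per race, so the figures above imply a per-race maximum of 25 points last year (5 is 20% of 25) and 10 points this year (2 is 20% of 10). Those maximums are inferred from the 20% rule rather than taken from a published points table, and the function names are hypothetical, just for illustration.

```python
# Hypothetical sketch: the Conway Line as a fixed share of the maximum
# Race Score World Championship points available per race.
CONWAY_FRACTION = 0.20  # the 20% benchmark described above


def conway_line(max_points_per_race: float) -> float:
    """Average points per race needed to sit exactly on the Conway Line."""
    return CONWAY_FRACTION * max_points_per_race


def above_conway_line(season_points: float, races: int, max_points_per_race: float) -> bool:
    """True if a driver's average points per race is strictly above the line."""
    return season_points / races > conway_line(max_points_per_race)


# Per-race maximums inferred from the post (assumption): 25 in 2013, 10 in 2014.
print(conway_line(25))  # points per race needed under the 2013 structure
print(conway_line(10))  # points per race needed under the 2014 structure
```

The strict greater-than matches "more than 5 points per race": a driver who averages exactly the threshold is on the line, not above it.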

To the question, however, I wouldn't be surprised to see a driver like Takuma Sato (who flirted with the Conway Line before his late-season slump) or Graham Rahal (who surprised me at St. Pete) finish the year right around 20% of max points. I won't be renaming it the "Sato Line" or the "Rahal Line," however. I like the way "Conway Line" sounds too much.

- Highest Score of the Year?

This, I think, stems from a little back-and-forth that Eric and I had the other day about the highest possible Race Score. Now, I won't bore you with all the details, but to put it as succinctly as possible: to get the maximum possible score, a driver would have to do two things:

1) Start last on the grid.
2) Lead every lap.

That means the driver has to pass ALL the other cars on lap 1. This would ensure that his/her average running position is 1.00, and that he/she leads and completes every lap that is run.
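For anyone wondering why passing everybody on lap 1 locks in a 1.00, here's a small Python sketch. It assumes average running position is simply the mean of a driver's position at the end of each completed lap (so the starting grid slot itself isn't counted); the function name and the lap data are made up for illustration.

```python
# Hypothetical sketch: average running position (A.R.P.) as the mean of a
# driver's position at the end of each completed lap.
def average_running_position(positions_by_lap: list[int]) -> float:
    if not positions_by_lap:
        raise ValueError("need at least one completed lap")
    return sum(positions_by_lap) / len(positions_by_lap)


# A driver who starts 22nd, is leading by the end of lap 1, and never gives
# it up over a 110-lap race (made-up data):
perfect_race = [1] * 110
print(average_running_position(perfect_race))  # 1.0
```

Any lap spent outside P1 pulls the average above 1.00, which is why the maximum requires leading from the end of lap 1 onward.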

Now, this maximum increases with every car in the race. After all, it's a more impressive achievement to pass 21 other cars on the first lap than it is to pass only 6. As an example, the maximum score for a 22-car race like St. Pete is 131.82. Increase the number of entries to 25 and the max is 132.00.

Anyway, I find it highly unlikely that any driver would ever achieve a maximum score, though it is possible. As I told Eric, a much better benchmark is to know that leading every lap from the pole will earn a driver a Race Score of 100.00. The last time that this happened was Scott Dixon's dominant drive at Belle Isle in 2012. The highest score that any driver earned in 2013 was 98.57, turned in by James Hinchcliffe at Iowa. I'd say it's unlikely that anyone repeats Dixon's 100-pointer from Belle Isle '12 (I'm willing to say that if the track hadn't been coming apart, Dixon himself wouldn't have done it). So, if you put my toes to the fire, I'd guess that we're likely to see at least one score above 98 this year (last year there were 2), but not one that cracks 99 (to get above 99, you'd almost have to start buried in the field, be up front a third of the way into the race, and never look back).

- Lowest Score of the Year?

Now, this is something that I haven't devoted a lot of time to figuring out. I won't lie, high scores are much more interesting to me than low scores. But when I compiled the numbers from 2013 before the start of this season, I was able to learn a thing or two about the low numbers. With my new formula (although it's hardly new anymore), it is quite difficult to get a negative score. To get a big negative number, a driver has to meet basically the inverse of the criteria for a big positive number. He/she must:

1) Start near the front of the grid.
2) Drop quickly down the running order.
3) Finish near the back.

But, one other factor must be considered. If a driver completes relatively few laps, his/her Race Score will be very close to zero. That's just the way my formula works. The more laps of the posted race distance that a driver completes, the farther it is possible for his/her Race Score to be from zero. This means that to get a big negative number a driver must also:

4) Complete a lot of the laps in the race.

Last year, the lowest Race Score computed using this formula was -8.34, turned in by Helio Castroneves at Race 2 in Houston. So, I'm going to go out on a limb here: I think we've already seen the lowest score that we'll see all year. Marco Andretti, with your -7.40 from St. Pete, I think the "honor" is yours...

- Will 45.00 as an Indicator of a "Good Score" Move?

To quickly catch everyone up, with this formula, 45.00 is considered a "good" score. Now, this is based almost completely on me doing the "Eyeball Test." I suppose I could send out questionnaires to each IndyCar team after each race and see how good they felt about their drive on a numerical scale, then correlate that data with Race Scores. But, I imagine they would ignore me. It would be interesting to try, though...

Anyway, back in the real world, I think that 45.00 as an indicator of "good" is unlikely to move. My lack of formal mathematical expertise is about to show, but the way I see it, there are only so many points that can be given out over the course of a race. Let me see if an example illustrates this better.

At St. Pete, Graham Rahal's line looked like this:

Finish | Driver | Grid | Led | Completed | A.R.P. | Race Score
14 | Rahal | 21 | 0 | 110 | 15.89 | 35.01

Josef Newgarden's looked like this:

Finish | Driver | Grid | Led | Completed | A.R.P. | Race Score
9 | Newgarden | 22 | 0 | 110 | 13.10 | 55.91

Now, if Graham had started in P22 instead of P21, there's no reason to think that anything else would necessarily have changed for him; he could still have passed a zillion cars on lap 1, and he could still have finished in the same position with the same average running position. If he had, his line would look like this:

Finish | Driver | Grid | Led | Completed | A.R.P. | Race Score
14 | Rahal | 22 | 0 | 110 | 15.89 | 36.53

As you see, that is an increase of 1.52 points. Now, if Graham had started P22, that means Josef would have started P21, and again, there's no reason to think that his race would have played out any differently if he had. Then his line would look like this:

Finish | Driver | Grid | Led | Completed | A.R.P. | Race Score
9 | Newgarden | 21 | 0 | 110 | 13.10 | 54.39

As you can see, his Race Score would decrease by 1.52 points. There's a conservation of points that goes on whenever something changes on track. This leads me to believe that, with this formula, 45.00 is going to stay pretty rock-solid as an indicator of a good race. The numbers themselves don't seem to say anything about the quality of the field. Rather, they say how a driver performs relative to the field.
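The "conservation of points" is easy to check with a few lines of Python. This only does arithmetic on the four Race Scores in the lines above, under the same assumption that the swapped grid slots are the only thing that changes; it doesn't reproduce the Race Score formula itself.

```python
# Race Scores from the St. Pete lines above: as run, and with Rahal and
# Newgarden's grid slots (P21 and P22) swapped.
actual = {"Rahal": 35.01, "Newgarden": 55.91}
swapped = {"Rahal": 36.53, "Newgarden": 54.39}

# Change in each driver's Race Score, rounded to the two decimals shown.
changes = {driver: round(swapped[driver] - actual[driver], 2) for driver in actual}
print(changes)  # {'Rahal': 1.52, 'Newgarden': -1.52}

# What one driver gains, the other loses: the net change is zero.
print(round(sum(changes.values()), 2))  # 0.0
```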


The Chase for my Heart

That's enough numerical nerd-business for now. And now for something (as Monty Python would say) completely different. I no longer have a favorite driver in the Verizon IndyCar Series. Most of you know from my previous posts that I was (and remain) a huge fan of Dario Franchitti. He is no longer racing, and I, therefore, no longer have a favorite driver.

I like almost every driver in the series, but none stand out as a true "rooting interest." And, I miss that. So, I need someone to cheer for until my love for another driver blossoms (like my love for Dario did) in an organic fashion.

So, I figured if I was going to artificially manufacture a favorite driver I should look to the monarchs, who rule over the land of artificially manufacturing drama: the National Association for Stock Car Auto Racing. (Was that a cheap shot? Yes. Do I think NASCAR would care? No.) And, therefore I am happy to announce the Chase for My Heart.

I have selected 16 drivers who now all have a chance to be my new favorite driver. I will eliminate 4 drivers after each of the Grand Prix of Long Beach, the Grand Prix of Alabama, and the Grand Prix of Indianapolis. This will leave me with 4 contenders when qualifying weekend for the Indy 500 rolls around. My new favorite driver, the winner of the Chase for My Heart, will be whichever of those 4 qualifies fastest for the 500.

So, now let's meet our contestants:

First, I decided that the Top 10 finishers from St. Pete had earned automatic berths in the Chase. They are:

- Will Power
- Ryan Hunter-Reay
- Helio Castroneves
- Scott Dixon
- Simon Pagenaud
- Tony Kanaan
- Takuma Sato
- Justin Wilson
- Josef Newgarden
- And, Ryan Briscoe

After those ten were locked into place, I gave out six Wild Card entries, based on my general positive feelings about a driver's personality, driving style, talent, or Italian-ness of name. That gives us these six:

- Sebastien Bourdais
- Graham Rahal
- Carlos Munoz
- James Hinchcliffe
- Charlie Kimball
- And, finally (sneaking in based on Italian-ness of name) Marco Andretti

So, these 16 drivers will have a little something extra on the line at Long Beach, because the Top 12 finishers out of this group will move on to the next round of the Chase for My Heart. Finishing position won't be the be-all-end-all in the Chase, but for this first race, I need to make sure that my new favorite driver can put up results.
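Since the Chase is basically a little elimination algorithm, here's a rough Python sketch of the format. It assumes each round's ranking is supplied best-first (finishing order for Long Beach; the later rounds can take any ranking, since finishing position won't be the only factor), and the function name, placeholder drivers, results, and speeds are all made up for illustration.

```python
# Hypothetical sketch of the Chase for My Heart format. Round rankings and
# qualifying speeds are inputs; the demo data below is made up.


def run_chase(
    contenders: list[str],
    round_orderings: list[list[str]],
    qualifying_speed: dict[str, float],
    cut_per_round: int = 4,
) -> str:
    """Return the Chase winner.

    Each entry in round_orderings ranks drivers best-first for one round.
    Every surviving driver must appear in each ordering.
    """
    remaining = list(contenders)
    for ordering in round_orderings:
        # Sort survivors by their place in this round's ranking, then cut the bottom ones.
        ranked = sorted(remaining, key=ordering.index)
        remaining = ranked[: len(ranked) - cut_per_round]
    # Among the final survivors, the fastest Indy 500 qualifier wins.
    return max(remaining, key=lambda driver: qualifying_speed[driver])


if __name__ == "__main__":
    drivers = [f"Driver {n}" for n in range(1, 17)]  # 16 placeholder contenders
    orderings = [drivers, drivers[::-1], drivers]  # made-up round results
    speeds = {d: 220.0 + i for i, d in enumerate(drivers)}  # made-up speeds
    print(run_chase(drivers, orderings, speeds))
```

With 16 contenders and a cut of 4 after each of the three races, exactly 4 drivers reach Indy 500 qualifying.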


Stay Tuned

Well, that's all the trouble I'm going to cause this week. Hopefully some of my numerical nerdiness was enjoyable, and hopefully you'll join me in my quest for a new favorite driver. Be sure to follow @ScoringIndy on Twitter for blog updates and race predictions.

I'll see you next week with Long Beach previews!

-- Guido
