FINDING THE FLAWS

Now for the fun! Remember how narrow the final BCS margin was between Nebraska, Colorado, and Oregon in 2001? Five-hundredths of a point separated teams 2 and 3! With that in mind, we should never lose sight of this fundamental principle: Once we decide to use a certain component in the process, we need to make sure we put in place procedures that will provide us with the most accurate numerical representation possible of the available data. You likely think this last remark goes without saying. Well, you would be amazed at how often this basic tenet has not been followed!

Strange representations. As far as I understand, no predefined procedure can guard us against a genuine equality of strength. At one Winter Olympics, the difference between the winner and the runner-up in the 30 km ski race was one hundredth of a second.

THE "HUMAN POLLING" COMPONENT:

Picture the following scenario: You are entered in a dart-throwing contest. Each of the contestants throws 2 darts, a yellow and then a blue dart, at a tiny target on a dartboard. The two people whose total distance from the target is the lowest win 10 million dollars! You and the other contestants make your two throws, and you're feeling awfully good about your chances. Now, imagine your consternation when you see the judges go up to the board and start yanking out all the yellow darts. Some they reposition closer to the target, some further from the target! All sorts of questions would jump to mind: "What in the world are you doing?! Why are you moving some darts closer to the target yet others further from the target?! And why are you moving one dart but not the other?!"

How do the actions of the judges in this example strike you? As mysterious? Certainly! As unwarranted? Of course! As too ridiculous to actually occur in a real-life setting? Sorry, but you're wrong if you answered in the affirmative this time.
For this seemingly farfetched example is precisely analogous to what is going on right now in the human polling component of the BCS! It may not be as obvious to you when you look at the BCS system as it is in my example, but it is occurring nonetheless! Let's see if we can't get to the bottom of this.

At times I may find it desirable to use some numerical data. I can assure you that any computation made will involve little more than simple arithmetic. If I feel that you may wish for additional information or explanation, I will direct you to a supplementary page. Now hang with me! It's going to be painless (and hopefully enjoyable).

Suppose we have one group of 100 human pollsters, who are directed to vote for the top 25 football teams. The top team is rated #1, the 2nd team as #2, and so forth. (I have chosen 100 voters solely to make the computations more immediately discernible.) Then the following is a basic mathematical principle: if we wish to assign a single number to a team which best reflects the overall sentiment of the voting body, we use the simple arithmetic average! Here that would mean that for each team we would total the ratings it received from the voters and divide by 100.

Let me give you an easy example. Suppose that 60 of the voters place Team A at #2 while 40 voters see it as #3. What would Team A's average rating be? Well, the computation is quite simple: 60 votes at #2 (60 x 2 = 120) plus 40 votes at #3 (40 x 3 = 120) yields 120 + 120, or 240, total points. Divide this now by 100 (the number of voters) to get 2.40. Now I ask you, doesn't this rating strike you as being the correct numerical assignment for Team A? 2.40 is between 2 and 3 (where all the votes landed), yet is a bit closer to 2 than to 3, which is as it should be, since Team A received more votes at #2 than at #3.
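
For readers who like to see the arithmetic spelled out, here is a minimal Python sketch of the Team A computation. The function name `true_average` and the rank-to-vote-count dictionary are illustrative choices made for this sketch; they are not anything defined by the BCS or the polls.

```python
def true_average(votes_by_rank):
    """Return the arithmetic mean rank from a {rank: number_of_votes} mapping."""
    total_voters = sum(votes_by_rank.values())
    total_points = sum(rank * count for rank, count in votes_by_rank.items())
    return total_points / total_voters


# Team A from the example: 60 voters place it at #2, 40 voters at #3.
team_a = {2: 60, 3: 40}
print(f"{true_average(team_a):.2f}")  # 2.40
```

The same function handles any spread of votes, so a team whose ballots are scattered over many positions gets its average in exactly the same way.
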
In using the true (or "exact" or "actual") mathematical average above, we not only arrive at the most accurate numerical assessment available for each team, but we capture as well the exact numerical difference between each pair of teams, as gauged by the entire voting body. The example I used above was deliberately simplified to make a point. Although most teams will receive more varied sets of rankings, in no way is the conclusion any different. Namely, the most accurate numerical rating that we can assign to each team for its human component is its simple arithmetic average! Now, what is actually being used as ratings for the human polling component? Not the true averages, but rather what I shall refer to as the "skewed" average. Whichever team had the best (lowest) actual average has its average reassigned as 1.00. Whichever team is 2nd best has its actual mark repositioned as 2.00 and so on down the line. (We're moving those yellow darts!) Some teams will have their true marks skewed to a better position while others will see their position worsened. Now if the human poll were the only component that we were to consider, then it would make sense to list these ratings as 1, 2, and 3, because they would represent the final process. But it is completely unwarranted, unnecessary, and unfair to convert the true averages to skewed marks when they represent just one piece of the puzzle! An actual example may do wonders to bring this point across. Below I have printed the AP rankings from the week of November 13, 2001. Now the same conclusions could be drawn from looking at any other week, but the evidence from this week is particularly compelling. Since so many fans appear to be incapable of looking at data objectively if their team is involved, I have listed the teams as simply A, B, C, ... You may easily research which teams were actually involved, though. 
Besides listing the teams, the table contains 3 additional columns which list the total points received in the AP poll, the true average for each team (rounded to the nearest hundredth - note: AP poll employs 72 voters), and the skewed average. It's a bit unfortunate that both the AP and USA polls award 25 points for the top spot and proceed downward from there, with 1 point awarded for #25. But it is a simple matter to convert from the total points listed in the AP poll to the team's true average. (If you wish to see this explained, click on rating conversion) So, here is a partial listing (top 11 teams only) from that particular poll:
Take a moment to analyze this data. In particular, look at how each exact human average, which is our most accurate gauge of the overall sentiments of the voters, is invariably moved to arrive at the skewed value. Sometimes this movement is a little, sometimes it is a lot. Some teams are helped by this repositioning, while others are hurt. (Leave those yellow darts alone!) Look at the top 4 teams, for instance. Teams A and B have virtually split all the first and second place votes, with A coming out slightly ahead. If we use these true averages, we reflect that assessment!! Note, too, that teams C and D are in a virtual dead heat, with only one total vote separating them. Meanwhile, note the significant gap between teams B and C. On a per voter basis, team B was actually judged to be more than 2 spots ahead of team C. Again, were we to use the actual arithmetic averages, the differences between teams B, C, and D would be accurately reflected. Yet what will the skewed marks result in? Why, the numerical gap between teams B and C will be viewed as being identical to the gap between teams C and D! What an incredible misrepresentation of the data! And what an utterly needless misrepresentation! An even more outrageous situation occurs among teams G, H, and I. Again, the skewed marks demand that we place them each one unit apart, when in actuality, team G was viewed as being almost 3 spots better than team H, while H and I were virtually inseparable. I hope that it is clear to you that the error caused by using this faulty procedure can easily make the difference in who plays in the championship game. There is only one situation in which the true averages will precisely coincide with the skewed averages. This is when every single human pollster votes exactly the same way! That is, when there is total unanimity among all voters! 
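
The contrast between the two kinds of marks can be sketched in a few lines of Python. Everything named here is an assumption made for illustration: the function names, the teams X, Y, and Z, and the point totals are invented and are NOT the actual figures from the November 13, 2001 poll. The conversion assumes the 25-points-for-#1 scheme described earlier; a voter who leaves a team off the ballot contributes 0 points, which this formula treats as a rank of 26.

```python
def true_average_from_points(total_points, num_voters, top_points=25):
    """Convert a poll point total into an average rank.

    Assumes each ballot awards top_points for #1 down to 1 point for #25.
    A ballot that omits the team adds 0 points, i.e. it counts as rank 26.
    """
    return (top_points + 1) - total_points / num_voters


def skewed_marks(true_averages):
    """Replace each true average with its ordinal position (1.0, 2.0, 3.0, ...)."""
    ordered = sorted(true_averages, key=true_averages.get)
    return {team: float(pos) for pos, team in enumerate(ordered, start=1)}


# Hypothetical point totals from 72 voters (invented for this sketch).
VOTERS = 72
points = {"X": 1750, "Y": 1600, "Z": 1595}

true_avgs = {team: true_average_from_points(p, VOTERS) for team, p in points.items()}
skewed = skewed_marks(true_avgs)

for team in sorted(true_avgs, key=true_avgs.get):
    print(f"{team}: true {true_avgs[team]:.2f}, skewed {skewed[team]:.2f}")
```

With these made-up totals, the voters put X roughly two positions ahead of Y and have Y and Z nearly tied, yet the skewed marks space all three teams exactly one unit apart. Only when every voter submits an identical ballot does the team at position k collect (26 - k) points from each voter, so that its true average is exactly k and the two columns agree.
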
Let me rephrase what I just said, because it's a very important point: Choosing to use the skewed human averages rather than the exact averages is equivalent to saying that each and every human pollster voted the same way!! And folks, that just isn't the case! To treat the two human polls as if they each represent one monolithic voter is a gross misrepresentation of the facts! I had intended to lay on you an almost equally compelling second reason why using the true arithmetic average for the human polling component is superior to the current skewed numbers being used. But this presentation is running long, so I will instead insert it into a supplement, which you can read if you feel the need for even more rationale. Click here for more rationale. But let me add here one interesting point. When the BCS committee first decided it wanted to include a computer component, what procedure did it use to arrive at the computer average? Why, the simple arithmetic average, of course! (The computer average played the role of the blue darts in the dart-throwing contest.) Gee, fellows (I'm addressing the BCS people now), if the true arithmetic average strikes you as being the best approach to use for the computers, why in the dickens isn't it the best approach for the humans?! Hopefully it is clear to you by now that if we wish to follow our basic axiom of using procedures which allow us to assess a component as accurately and as fairly as possible, we need to be working with the exact arithmetic average instead of the skewed values for the human component. All we need to do is to calculate the exact average for the AP writers' poll and the exact average for the USA-ESPN coaches' poll. Then average these two marks (that is, add them and divide by two). In case you are wondering how, had this correction been in place from the outset, the final placement of teams might have been different this year, let me just say that, of course, the final numbers would be altered. 
And I'm not intending to be coy in refusing to divulge more than that at this time. You are welcome to recalculate the human component for yourself (you will need to go through at least the top 15 spots, however, since these positions are critical when examining the new "quality points" feature). My concern is that you might base your decision on whether this recommended change is merited solely on the basis of whether 2 or 3 teams flip-flopped positions. In reality, the argument as to which procedure to follow is strictly "no contest"! The current skewed average is an inaccurate portrayal of the available data; the true average is an accurate portrayal of the data. By the way, using the skewed averages rather than the true averages did make a difference in the final BCS ratings of 1998. Kansas State had a final rating of 9.96, while Ohio State finished at 10.37. And yet nearly all human pollsters had OSU ranked not just ahead of KSU, but generally 2 positions ahead. The actual difference in the true averages was about 1.80. Yet in the skewed system used, with OSU just one above KSU in the pecking order, this difference was reduced to 1.00. This gain (from being 1.80 behind to just 1.00 behind) made the difference in the final positions of these two teams. However, hardly anyone seemed to care about or even notice this, all because their final positions were 3 and 4 rather than 2 and 3! It would have been nice if the committee had taken a pro-active approach and examined this situation carefully at the time. Typically, however, the committee has merely been re-active, attempting to fix a problem after something has gone wrong. A final note about this portion of the process before we move on: Perhaps in the back of your mind you are saying, "Doesn't there have to be some logical reason why they are doing what they are?" As you will find elsewhere as well, the simple answer is, "No, there doesn't." 
I have presented this information to members of the committee and have sought their rationale, but my inquiries have simply been ignored. Does the use of the skewed averages somehow impart more value to the human component than would the use of the true averages? No, for so long as the "human average" component remains one-fourth of the equation, then whether we use my suggested actual average or the BCS's current skewed average, the weight (or importance) that this factor carries will be the same in either case.

No, I believe that both here and in later components you will likely discover that the reason that a certain procedure is in place is not because the committee has found a valid reason to justify it, but rather because they simply have not been aware of a superior alternative (no matter how transparent or obvious such an alternative may seem to you or me!). In the case of the human component, I think it was a matter of "out of sight, out of mind". When the human polls are printed in the papers, they don't have a column listing the true average rating of each team (even though this is easy to do). Hence, it apparently never occurred to our committee that such data existed!

THE "COMPUTER POLLING" COMPONENT:

Currently the BCS uses the input of 8 computers. [Editor's note: As of 2002 just 7 computers.] Each individual who has devised a computer program to rate teams is directed to use it to generate a ranking for every Division I-A team. The top team is assigned a rating of 1, the 2nd best team is assigned 2, and so forth. In other words, each computer generates ratings just as does a human pollster. For now, don't concern yourself with the internal workings of the individual computers. I will have some comments about them later. Just know that after each of the 8 computers turns in its ballot, then for each team the high and the low marks are dropped, and with the remaining six computer ratings the exact arithmetic average is calculated.
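
The drop-the-high-and-low averaging just described can be sketched as follows. The function name `computer_average` and the eight sample rankings are hypothetical, chosen only to illustrate the arithmetic; they are not actual computer ballots.

```python
def computer_average(ratings):
    """Drop one highest and one lowest rating, then average the rest."""
    if len(ratings) < 3:
        raise ValueError("need at least three ratings to drop the high and low")
    trimmed = sorted(ratings)[1:-1]
    return sum(trimmed) / len(trimmed)


# Hypothetical rankings of one team by 8 computers (illustrative values only).
ratings = [2, 3, 3, 4, 4, 5, 6, 11]
print(f"{computer_average(ratings):.2f}")    # middle six: 25 / 6 -> 4.17
print(f"{sum(ratings) / len(ratings):.2f}")  # all eight:  38 / 8 -> 4.75
```

In this invented case a single outlying ballot (the 11) moves the all-eight average by more than half a position, while the middle-six average is unaffected by it.
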
This simple procedure results in the "computer polling" rating for each team.

Hey, guess what? I really don't have any strong objections to this procedure! We could debate back and forth whether it's necessary to drop the high and the low scores. If you have concerns about a "rogue computer" unduly helping or hurting a particular team, then you likely favor dropping a high and low mark. (By the way, we could choose to drop high and low scores from the human ratings as well.) If you are of the opinion that if we believe in the credibility of the computers, we ought to use them all, then you would urge keeping all the scores. But again, this is not an issue that one can definitively say should be done one way or the other. It's simply a matter of making a value judgment. Had all 8 computers been averaged rather than just the middle 6 this year, and had nothing else been modified in the system, Colorado would have finished ahead of Nebraska. Again, that fact doesn't make one approach right and the other wrong. It does indicate that we need to analyze our processes thoughtfully, however.

I remarked earlier that it appears that the BCS committee, or perhaps more accurately those individuals whom they employ to help them formulate the overall process, quite often seem to be simply unaware of some of the unfair and illogical twists that they set into motion with some of their decisions. None was more outrageous than a mechanism they put into play the first year of the BCS, when only 3 computers were employed. If you are not aware of the monumental gaffe to which I refer, nor of how close these people came to having this whole system crash down upon them, let me suggest that you read my brief (OK, so it doesn't appear that I can do anything briefly) summary of the infamous "computer safeguard" feature. Click here for more about the "computer safeguard." Of all the errors that have existed in the system, this was likely the most incredible.
It literally mandated unfairness! Please check out the story if it's new to you, but for now, let's move on.