A modification of OPS: Widely used to measure a baseball batter's
Chulmin Kim, Ph.D.
Assistant Professor of Statistics
School of Mathematical Sciences
Rochester Institute of Technology

Intro.
• Fields of Statistics
• Intro. to Sports Statistics
Example Studies
• Study 1: Bracket Analysis (NCAA Men’s Basketball tournament)
• Study 2: A modification of OPS – WOA (Major League Baseball)
• Study 3: A Study of BCS Rating (NCAA Div. I FBS Football)
Discussion
• Conclusion & Discussion

UP-STAT 2013 @ Rochester Institute of Technology

• Actuarial science (Statistics + Insurance industries)
• Bioinformatics (Statistics + Computer Science + Biology)
• Biostatistics (Statistics + Biological or Medical Sciences)
• Business analytics (Statistics + Business)
• Chemometrics (Statistics + Chemistry)
• Econometrics (Statistics + Economics)
• Environmental statistics (Statistics + Environmental Science)
• Geostatistics (Statistics + Geology)
• Operations research (Statistics + Optimal to Complex)
• Psychometrics (Statistics + Psychology)
• Quality control (Statistics + Industrial Engineering)
• Sports Statistics (Statistics + Sports)

The Section on Statistics in Sports (SIS) was founded during the 1992 Joint Statistical Meetings, filling a need to foster the development of statistics and its applications in sports. The mission of SIS has been to stimulate statistical research with an application to sports, promoting publications devoted to statistical theory and methodology and their application to statistics in sports, and to increase the availability of information concerning the science of statistics and its contribution to sports.

• Tournament Selection Efficiency: An Analysis of the PGA TOUR's FedExCup
• Predicting the Maximum Lead from Final Scores in Basketball: A Diffusion Model
• An Exploratory Study of Minor League Baseball Statistics
• Pace and Critical Gradient for Hill Runners: An Analysis of Race Records
• Models for Third Down Conversion in the National Football League
• A Comparison of the Autocorrelation and Variance of NFL Team Strengths Over Time using a Bayesian State-Space Model
• The Sensitivity of College Football Rankings to Several Modeling Choices
• The Individual Factors of Successful Free Throw Shooting
• New Insights on the Tendency of NCAA Basketball Officials to Even Out Foul Calls
• Estimating Fielding Ability in Baseball Players Over Time

• Establishing the Run (in American Football): Conventional Wisdom or Obsolete Adage?
• NHL Team Statistics by Nonparametric Statistical Analysis
• Plus-or-Minus for NHL players
• ROCHESTER INSTITUTE OF TECHNOLOGY Men’s Basketball team
• Paul Pierce
• Analysis of Fernando Verdasco’s Tennis Career
• Did Inexperienced Referees Impact Football Statistics?

Bracket Analysis of the 2013 NCAA Men’s Basketball: Does the RPI work well?

There are 347 schools in 32 Division I basketball conferences (average 10.8 teams per conference; largest: Atlantic 10 (16); smallest: Patriot, Ivy League (8)).
Each conference (except for the Great West) receives an automatic bid to the NCAA Men's Division I Basketball Championship.

• Nickname for the NCAA (Men’s) Basketball Championship
• Single-elimination tournament to determine the national championship
• The field of 68 college basketball teams
  - Automatic bids (31): the champions from 31 conferences
  - At-large berths (37): selected by the selection committee

Conference (number of teams): teams
• ACC (12): UNC, Duke, FSU, Virginia, BC, NC State, Miami, WF, VT, GT, Clemson, Maryland (2014), Syracuse (2013), Pitt (2013), Notre Dame* (2013), Louisville (2014)
• Big East (15): Syracuse, Marquette, Notre Dame*, Cincinnati, Georgetown, USF, Louisville, Connecticut, Seton Hall, Rutgers, St. John’s, Pittsburgh, Villanova, Providence, DePaul
• Big 12 (10): Kansas, Baylor, Iowa State, Kansas State, Texas, Oklahoma, Oklahoma State, Texas Tech, TCU, West Virginia
• Big Ten (12): Ohio State, Michigan State, Michigan, Wisconsin, Indiana, Purdue, Northwestern, Iowa, Minnesota, Illinois, Nebraska, Penn State, Rutgers (2014), Maryland (2014)
• Pac 12 (12): Washington, Oregon, California, Arizona, Colorado, UCLA, Stanford, Oregon State, Washington State, Arizona State, Utah, USC
• SEC (14): Kentucky, Florida, Vanderbilt, Alabama, Mississippi State, Ole Miss, Arkansas, Auburn, Tennessee, Louisiana State, Georgia, South Carolina, Missouri, Texas A&M

Big East: from 16 teams in 2012
• Departing: West Virginia (2012, to Big 12); Pittsburgh, Syracuse (2013, to ACC); Notre Dame* (2013, to ACC)
• New Big East (10 schools, in 2013)
  - The Catholic 7 (Villanova, Providence, DePaul, Seton Hall, Marquette, Georgetown, St. John’s)
  - NEW 3: Xavier (A10), Creighton (MVC), Butler (A10)
• Renamed Big East (10 schools in 2013 & 2014, 12 teams in 2015) (American Athletic Conference? On April 4th)
  - Cincinnati, Connecticut, USF
  - Louisville (2014, to ACC), Rutgers (2014, to Big Ten)
  - Houston, Memphis, SMU, Temple, UCF (2013)
  - East Carolina, Tulane (2014), Tulsa (2015), Navy* (2015)

# of Teams in the field of NCAA (68)
• Automatic bids: 31
• At-large, six Power conferences: 26
• At-large, twenty-six non-power conferences: 11

Conf.     At-large / poss. tms in Conf.  Winning Ratio in the Field  Notables
Big East  7/14 (50%)                     65% (11-6)                  Louisville, Syracuse
Big 12    4/9 (44%)                      38% (3-5)                   Kansas, Iowa St.
Big Ten   6/11 (55%)                     68% (13-6)                  Michigan, Ohio St.
ACC       3/11 (27%)                     60% (6-4)                   Duke, Florida St.
SEC       2/13 (15%)                     57% (4-3)                   Florida, Mississippi
Pac 12    4/11 (36%)                     50% (5-5)                   Arizona, Oregon
MWC       4/8 (50%)                      29% (2-5)                   SD St., Colorado St.
A 10      4/15 (27%)                     58% (7-5)                   La Salle, St. Louis
WCC       1/8 (13%)                      50% (2-2)                   Gonzaga, St. Mary’s
Others    2/224 (1%)                     32% (11-23)                 Wichita St., Middle Tenn

• The process of predicting the field of the NCAA Basketball Tournament.
• It incorporates some method of predicting what the Selection Committee will use as its Ratings Percentage Index in order to decide 37 at-large teams to complete the field.
• It seeds the field by ranking all teams from 1st through 68th.

• The inventor of the term "bracketology", starting first as the editor of the Blue Ribbon College Basketball Yearbook and ending up as the resident bracketologist on ESPN.
• Teaches an online course at Saint Joseph's titled "Fundamentals of Bracketology."
• However, ranked only 36th out of 65 bracket experts based on the past four years' tournament results.

A quantity used to rank sports teams based upon a team's wins and losses and its strength of schedule. This system has been in use in college basketball since 1981 to aid in the selecting and seeding of teams appearing in the 68-team men's playoffs.
The RPI rating is often considered a significant factor in selecting and seeding the final few teams in the tournament field, though the selection committee stresses that the RPI is used merely as a guideline and not as an infallible indicator of a team's worth. In its current formulation, the index comprises a team's winning percentage (25%), its opponents' winning percentage (50%), and the winning percentage of those opponents' opponents (25%).

The opponents' winning percentage and the winning percentage of those opponents' opponents together make up the strength of schedule (SOS). Thus, the SOS accounts for 75% of the RPI calculation and consists of 2/3 opponents' winning percentage and 1/3 opponents' opponents' winning percentage.

The RPI lacks theoretical justification from a statistical standpoint. Other ranking systems, which include the margin of victory of games played or other statistics in addition to the win/loss results, have been shown to be better predictors of the outcomes of future games. However, because the margin of victory has been manipulated in the past by teams or individuals in the context of gambling, the RPI can be used to mitigate motivation for such manipulation.

Part I (25% of the formula): Team winning percentage. For the 2005 season, the NCAA added a bonus/penalty system, where each home win or road loss is multiplied by 0.6 in the winning percentage calculation. A home loss or road win is multiplied by 1.4. Neutral games count as 1.0.

Part II (50%): Average opponents’ winning percentage. To calculate this, you must calculate each opponent’s winning percentage individually and average those figures. Games involving the team for whom we are calculating the RPI are ignored.

Part III (25%): Average opponents’ opponents’ winning percentage. Basically, take all of the opponents’ Part II values and average them.
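
The three parts can be sketched in a few lines of Python. This is a minimal illustration rather than the NCAA's implementation: the function names (`team_results`, `weighted_wp`, `wp_without`, `rpi_table`) are hypothetical, neutral-site games are not handled, and the six games are the small Memphis / North Carolina / Maryland / Wake Forest example used in the slides.

```python
from collections import defaultdict

# Six-game example from the slides: (home, home_pts, away, away_pts).
GAMES = [
    ("Memphis", 78, "North Carolina", 72),
    ("Memphis", 82, "Maryland", 68),
    ("Wake Forest", 71, "Memphis", 72),
    ("North Carolina", 93, "Memphis", 68),
    ("Maryland", 81, "Wake Forest", 72),
    ("Wake Forest", 52, "North Carolina", 68),
]


def team_results(games):
    """Map each team to one (opponent, won, at_home) tuple per game played."""
    played = defaultdict(list)
    for home, home_pts, away, away_pts in games:
        played[home].append((away, home_pts > away_pts, True))
        played[away].append((home, away_pts > home_pts, False))
    return played


def weighted_wp(results):
    """Part I: home wins and road losses weigh 0.6; road wins and home losses 1.4.

    Neutral-site games (weight 1.0) do not occur in the example and are not handled.
    """
    credit = total = 0.0
    for _opp, won, at_home in results:
        weight = 0.6 if won == at_home else 1.4
        total += weight
        credit += weight if won else 0.0
    return credit / total


def wp_without(played, team, excluded):
    """Unweighted winning percentage of `team`, ignoring its games against `excluded`."""
    wins = [won for opp, won, _home in played[team] if opp != excluded]
    return sum(wins) / len(wins)


def rpi_table(games):
    played = team_results(games)
    wp = {t: weighted_wp(res) for t, res in played.items()}
    # Part II: average, over every game played, of the opponent's record without this team.
    owp = {t: sum(wp_without(played, opp, t) for opp, _w, _h in res) / len(res)
           for t, res in played.items()}
    # Part III: average of the opponents' Part II values.
    oowp = {t: sum(owp[opp] for opp, _w, _h in res) / len(res)
            for t, res in played.items()}
    return {
        t: {
            "WP": wp[t],
            "OWP": owp[t],
            "OOWP": oowp[t],
            "SOS": (2 * owp[t] + oowp[t]) / 3,
            "RPI": 0.25 * wp[t] + 0.50 * owp[t] + 0.25 * oowp[t],
        }
        for t in played
    }


if __name__ == "__main__":
    ranked = sorted(rpi_table(GAMES).items(), key=lambda kv: -kv[1]["RPI"])
    for team, row in ranked:
        print(f"{team:15s}", "  ".join(f"{k}={v:.4f}" for k, v in row.items()))
```

Opponents are averaged once per game played, so a team faced twice counts twice; that choice is what reproduces the OWP and OOWP figures in the worked example.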
[Source of the three-part description: kenpom.com]

Home            Score  Away            Score
Memphis         78     North Carolina  72
Memphis         82     Maryland        68
Wake Forest     71     Memphis         72
North Carolina  93     Memphis         68
Maryland        81     Wake Forest     72
Wake Forest     52     North Carolina  68

R  Team            W-L  W%    (H)  (A)
1  Memphis         3-1  .750  2-0  1-1
2  North Carolina  2-1  .667  1-0  1-1
3  Maryland        1-1  .500  1-0  0-1
4  Wake Forest     0-3  .000  0-2  0-1

Weighted credit / weight per game:

Home            Score  Away            Score  Home (H%)     Away (A%)
Memphis         78     North Carolina  72     Mem 0.6/0.6   UNC 0/0.6
Memphis         82     Maryland        68     Mem 0.6/0.6   Mary 0/0.6
Wake Forest     71     Memphis         72     WF 0/1.4      Mem 1.4/1.4
North Carolina  93     Memphis         68     UNC 0.6/0.6   Mem 0/0.6
Maryland        81     Wake Forest     72     Mary 0.6/0.6  WF 0/0.6
Wake Forest     52     North Carolina  68     WF 0/1.4      UNC 1.4/1.4

WP
Memphis: (0.6 + 0.6 + 1.4 + 0) / (0.6 + 0.6 + 1.4 + 0.6) = 0.8125
North Carolina: (0 + 0.6 + 1.4) / (0.6 + 0.6 + 1.4) = 0.7692
Maryland: (0 + 0.6) / (0.6 + 0.6) = 0.5000
Wake Forest: (0 + 0 + 0) / (1.4 + 0.6 + 1.4) = 0.0000

Team  Opponent 1        Opponent 2      Opponent 3    Opponent 4    OWP
Mem   UNC 1.0 (1-0)     Mary 1.0 (1-0)  WF 0.0 (0-2)  UNC 1.0 (1-0)  0.7500
UNC   Mem 1.0 (2-0)     Mem 1.0 (2-0)   WF 0.0 (0-2)                 0.6667
Mary  Mem 0.6667 (2-1)  WF 0.0 (0-2)                                 0.3333
WF    Mem 0.6667 (2-1)  Mary 0.0 (0-1)  UNC 0.5 (1-1)                0.3889

Team  Opponent 1  Opponent 2   Opponent 3  Opponent 4  OOWP
Mem   UNC 0.6667  Mary 0.3333  WF 0.3889   UNC 0.6667  0.5139
UNC   Mem 0.7500  Mem 0.7500   WF 0.3889               0.6296
Mary  Mem 0.7500  WF 0.3889                            0.5694
WF    Mem 0.7500  Mary 0.3333  UNC 0.6667              0.5833

Team  WP (25%)  OWP (50%)  OOWP (25%)  SOS     RPI     W%
Mem   0.8125    0.7500     0.5139      0.6713  0.7066  0.7500 (3-1)
UNC   0.7692    0.6667     0.6296      0.6543  0.6830  0.6667 (2-1)
Mary  0.5000    0.3333     0.5694      0.4120  0.4340  0.5000 (1-1)
WF    0.0000    0.3889     0.5833      0.4537  0.3403  0.0000 (0-3)

Conf             RPI    SOS    N
Mountain West    0.580  0.570  9
Big Ten          0.577  0.573  12
Big East         0.574  0.567  15
ACC              0.561  0.557  12
Big 12           0.558  0.558  10
Pac 12           0.555  0.555  12
A-10             0.548  0.547  16
SEC              0.542  0.541  14
Missouri Valley  0.534  0.533  10
West Coast       0.529  0.527  9

• Some argue that the heavy emphasis on SOS gives an unfair advantage to teams from power conferences.
• Teams from "majors" are allowed to pick many of their non-conference opponents (at-large bids: Power: 1 per 2.7 teams; MWC, A10, WCC: 1 per 3.4; Other: 1 per 112).
• Some mid-major conferences compel their teams to schedule opponents ranked in the top half of the RPI, which could boost the strength of that conference and/or its tougher-scheduling teams.

• ESPN created the BPI.
• Other power rankings already exist (the RPI, Sagarin’s Ratings, Ken Pomeroy's ratings, etc.).
• All of these methods are based upon the outcomes of games, their location (home/road/neutral) and the quality of opponents. Each one basically puts these together in slightly different ways and arrives at slightly different results.

• RPI, due to its simplicity, tends to be the biggest decision aid for the tournament committee, even though it doesn't account for the actual scores of games.
• BPI added a way of accounting for missing players. If a team or its opponent is missing one of its most important players for a contest, that game is less important for ranking the teams compared to games in which both teams are at full strength.

• Another way that BPI can rank teams differently than Sagarin or Kenpom is counting close games at home versus on the road. In BPI, a close win at home is better than a close loss on the road against the same opponent. This isn't necessarily true in other methods.
• Other methods don't typically account for bigger wins. BPI gives marginally decreasing credit for bigger wins, with a 30-point win being only about 20 percent better than a 15-point win, not twice as good, which can happen in other methods.

Includes (O = yes, X = no)                RPI  Sagarin’s  BPI
Scoring margin                            X    O          O
Diminishing returns for blowouts          X    O          O
Pace of game matters                      X    X          O
Home/Neutral/Road                         O    O          O
SOS beyond Opponent's opponents' W-L      X    O          O
All wins are better than losses           O    X          O
De-weighting games with missing players   X    X          O

Between the 2007 and 2011 NCAA tournaments, BPI picked 74.4 percent of the matchups correctly, whereas Sagarin’s picked 73.2 percent and RPI picked 71.9 percent. The BPI is not a guaranteed way to pick a perfect bracket, but we do think it is the best power ranking available. [Source: ESPN.com]

Which works better to predict the 2012 & 2013 NCAA bracket? RPI or BPI? Or Sagarin’s? To decide, we evaluate six things:

Correctly Picked            + 5 pts
Correctly Seeded            + 3 pts
Correctly Seeded Within 1   + 2 pts
Correctly Seeded Within 2   + 1 pt
Missed but in top 41-51     - 1 pt
Missed but in top 40        - 2 pts

Note: This analysis has nothing to do with who is the best at picking winners in the NCAA tournament. It is all about predicting the participants in the field of the tournament and how well they are seeded.

• RPI predicts 91.9% (34/37) correctly. (Missed Virginia, Iona, West Virginia)
• BPI predicts 86.5% (32/37) correctly. (Missed Iona, Xavier, Notre Dame, South Florida, Colorado State)
• Sagarin’s predicts 86.5% (32/37) correctly.
  (Missed Xavier, Iona, Southern Mississippi, South Florida, Colorado State)

(Bs, Rs, Ss: seeds implied by BPI, RPI, and Sagarin’s; Bd, Rd, Sd: absolute differences from the actual seed.)

Seed  Bs  Rs  Ss  Auto  Team                   Bd  Rd  Sd
1     1   1   1   no    Kentucky (32-2)        0   0   0
1     1   1   2   no    Syracuse (31-2)        0   0   1
1     2   1   2   no    North Carolina (29-5)  1   0   1
1     1   1   1   Yes   Michigan St (27-7)     0   0   0
2     2   2   1   no    Kansas (27-6)          0   0   1
2     2   2   3   no    Duke (27-6)            0   0   1
2     1   2   1   no    Ohio State (27-7)      1   0   1
2     2   3   2   Yes   Missouri (30-4)        0   1   0
3     3   2   3   no    Baylor (26-7)          0   1   0
3     5   3   4   no    Marquette (25-7)       2   0   1
3     7   3   5   Yes   Florida St (24-9)      4   0   2
3     5   4   4   no    Georgetown (22-8)      2   1   1
4     8   3   6   no    Michigan (23-9)        4   1   2
4     4   6   2   no    Wisconsin (24-9)       0   2   2
4     4   5   3   no    Indiana (25-8)         0   1   1

Teams ranked inside the top 51 by each method but left out of the field (NIT seed and [NIT finish]):
• BPI: #37 (Arizona, NIT #1 [32]), #41 (Miami, NIT #2 [16]), #48 (La Salle, NIT #3 [32]), #50 (Seton Hall, NIT #1 [16]), #51 (Northwestern, NIT #4 [16])
• RPI: #42 (Oral Roberts, NIT #4 [32]), #44 (Marshall, NIT #5 [32]), #51 (Middle Tennessee, NIT #4 [8])
• Sagarin’s: #45 (Miami, NIT #2 [16]), #47 (Arizona, NIT #1 [32]), #48 (Stanford, NIT #3 [Champion]), #49 (Middle Tennessee, NIT #4 [8]), #51 (Minnesota, NIT #6 [Runner-up])

Teams in the field that at least one method missed:
• Iona (25-8, MAAC; lost in First Four) [#52, #54, #58]
• Xavier (21-12, A10; Sweet 16) [#53, #43, #57]
• Colorado St. (20-11, MWC; lost 1st Rd.) [#70, #30, #74]
• USF (19-13, Big East; lost 2nd Rd.) [#64, #48, #69]
• Notre Dame (22-11, Big East; lost 1st Rd.) [#60, #37, #42]
• Virginia (22-9, ACC; lost 1st Rd.) [#27, #53, #25]
• W. Virginia (19-13, Big East; lost 1st Rd.) [#42, #67, #39]
• S. Mississippi (23-8, C-USA; lost 1st Rd.) [#46, #22, #60]

Note that the ranks are BPI, RPI, and Sagarin’s, respectively. BPI and Sagarin’s mostly agree with each other, but not with RPI.

Method                      Points   RPI  BPI  SAG
Correctly Picked            + 5 pts  65   63   63
Correctly Seeded            + 3 pts  24   22   13
Correctly Seeded Within 1   + 2 pts  24   17   27
Correctly Seeded Within 2   + 1 pt   6    7    9
Missed but in top 41-51     - 1 pt   3    4    5
Missed but in top 40        - 2 pts  0    1    0

The winner based on our method is:
• RPI: 448 pts.
• BPI: 416 pts.
• Sagarin’s: 412 pts.

• Champion: Kentucky [#1 seed, #2, #1, #1]
• Runner-up: Kansas [#2 seed, #6, #6, #4]
• Final Four: Ohio St. [#2 seed, #7, #3, #2]
• Final Four: Louisville [#4 seed, #13, #11, #18]
• Elite Eight: Syracuse [#1 seed, #1, #2, #6]
• Elite Eight: UNC [#1 seed, #3, #5, #5]
• Elite Eight: Baylor [#3 seed, #8, #12, #12]
• Elite Eight: Florida [#7 seed, #29, #16, #17]

Elite Eight teams are ALL from Power Conferences.
Winning rate of higher seed = 73% (46/63)
Women’s Final Four are ALL #1 Seeds!!!

• RPI predicts 97.3% (36/37) correctly. (Missed only California)
• BPI predicts 86.5% (32/37) correctly. (Missed California, La Salle, Temple, Villanova, Illinois)
• Sagarin’s predicts 83.8% (31/37) correctly.
  (Missed Colorado, Middle Tennessee, Temple, California, La Salle, Boise State)

Seed  Bs  Rs  Ss  Auto  School          Record  Bd  Rd  Sd
1     1   1   1   Yes   Louisville      29–5    0   0   0
1     2   2   1   Yes   Kansas          29–5    1   1   0
1     1   2   1   No    Indiana         27–6    0   1   0
1     2   2   2   Yes   Gonzaga         31–2    1   1   1
2     3   1   4   Yes   Miami           27–6    1   1   2
2     1   1   2   No    Duke            27–5    1   1   0
2     4   3   3   No    Georgetown      25–6    2   1   1
2     2   3   2   Yes   Ohio State      26–7    0   1   0
3     3   1   5   Yes   New Mexico      29–5    0   2   2
3     1   2   1   No    Florida         26–7    2   1   2
3     3   3   2   No    Michigan State  25–8    0   0   1
3     5   3   6   No    Marquette       23–8    2   0   3
4     2   5   3   No    Michigan        26–7    2   1   1
4     8   6   5   No    Kansas State    27–7    4   2   1
4     5   4   4   Yes   Saint Louis     27–6    1   0   0

Teams ranked inside the top 51 by each method but left out of the field (NIT seed and [NIT finish]):
• BPI: #41 (Stanford, NIT #4 [16]), #43 (Iowa, NIT #3 [Runner-up]), #46 (Baylor, NIT #2 [Champion]), #48 (Virginia, NIT #1 [8]), #50 (Maryland, NIT #2 [4])
• RPI: #34 (Southern Mississippi, NIT #1 [8])
• Sagarin’s: #29 (Iowa, NIT #3 [Runner-up]), #31 (Baylor, NIT #2 [Champion]), #37 (Kentucky, NIT #1 [32]), #38 (Virginia, NIT #1 [8]), #48 (Stanford, NIT #4 [16]), #49 (Maryland, NIT #2 [4])

Teams in the field that at least one method missed:
• California (20-11, Pac 12; lost 2nd Rd.) [#53, #53, #56]
• Temple (23-9, A10; lost 2nd Rd.) [#56, #41, #55]
• La Salle (21-9, A10; Sweet 16) [#54, #40, #57]
• Illinois (22-12, Big Ten; lost 2nd Rd.) [#63, #39, #44]
• Colorado (21-11, Pac 12; lost 1st Rd.) [#39, #37, #51]
• MTSU (28-5, Sun Belt; lost 1st Rd.) [#45, #28, #53]
• Boise St. (21-10, MWC; lost in First Four) [#44, #44, #58]

Note that the ranks are BPI, RPI, and Sagarin’s, respectively. BPI and Sagarin’s mostly agree with each other, but not with RPI.

Method                      Points   RPI  BPI  SAG
Correctly Picked            + 5 pts  67   63   62
Correctly Seeded            + 3 pts  25   21   18
Correctly Seeded Within 1   + 2 pts  21   19   25
Correctly Seeded Within 2   + 1 pt   12   14   9
Missed but in top 41-51     - 1 pt   0    5    2
Missed but in top 40        - 2 pts  1    0    4

The winner based on our method is:
• RPI: 462 pts.
• BPI: 425 pts.
• Sagarin’s: 413 pts.
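
The point totals follow directly from the six-category scheme. As a rough check, the sketch below recomputes them from the tallies reported in the slides; the dictionary layout and the name `bracket_score` are illustrative choices, not part of the original analysis.

```python
# Points per category, as listed on the evaluation slide.
POINTS = {
    "picked": 5,          # team correctly placed in the field
    "seeded": 3,          # seed exactly right
    "within_1": 2,        # seed off by one
    "within_2": 1,        # seed off by two
    "missed_41_51": -1,   # ranked 41-51 by the method but not in the field
    "missed_top_40": -2,  # ranked in the top 40 by the method but not in the field
}
CATEGORIES = tuple(POINTS)

# Tallies reported in the slides, listed in CATEGORIES order.
TALLIES = {
    2012: {
        "RPI": (65, 24, 24, 6, 3, 0),
        "BPI": (63, 22, 17, 7, 4, 1),
        "Sagarin": (63, 13, 27, 9, 5, 0),
    },
    2013: {
        "RPI": (67, 25, 21, 12, 0, 1),
        "BPI": (63, 21, 19, 14, 5, 0),
        "Sagarin": (62, 18, 25, 9, 2, 4),
    },
}


def bracket_score(counts):
    """Total points for one method: sum of (points per category) x (count)."""
    return sum(POINTS[cat] * n for cat, n in zip(CATEGORIES, counts))


if __name__ == "__main__":
    for year, methods in TALLIES.items():
        for method, counts in methods.items():
            print(year, method, bracket_score(counts))
```

Under this scheme a perfect bracket would earn 8 points for each of the 68 teams, or 544 points.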

Year  RPI  BPI  Sagarin’s
2012  448  416  412
2013  462  425  413

[Bar chart: points by method (RPI, BPI, Sagarin's) for 2012 and 2013]

The selection committee depends heavily on RPI. The tendency is becoming even stronger!

• Final Four: Louisville [#1 seed, #3, #1, #1]
• Final Four: Syracuse [#4 seed, #13, #11, #13]
• Final Four: Michigan [#4 seed, #17, #8, #11]
• Final Four: Wichita St. [#9 seed, #38, #24, #42]
• Elite Eight: Duke [#2 seed, #1, #3, #7]
• Elite Eight: Ohio St. [#2 seed, #11, #7, #5]
• Elite Eight: Florida [#3 seed, #8, #2, #3]
• Elite Eight: Marquette [#3 seed, #12, #20, #22]

All Elite Eight teams except Wichita State (MVC) are from the Power Conferences.
Winning rate of higher seed = 67% (40/60)

Rd 1 Win (%) by seed since 1985

Seed  #1     #2     #3     #4     #5     #6     #7     #8
Win%  1.000  0.952  0.833  0.798  0.679  0.702  0.607  0.548
Seed  #9     #10    #11    #12    #13    #14    #15    #16
Win%  0.452  0.393  0.298  0.321  0.202  0.167  0.048  0.000

Overall Win (%) by seed since 1985

Seed  #1     #2     #3     #4     #5     #6     #7     #8
Win%  0.785  0.703  0.639  0.582  0.537  0.553  0.454  0.430
Seed  #9     #10    #11    #12    #13    #14    #15    #16
Win%  0.396  0.358  0.337  0.316  0.206  0.138  0.051  0.000

How many brackets are possible?
A 64-team tournament has 63 games, and each game has only two possible outcomes (win/lose), so there are 2^63 = 9,223,372,036,854,775,808 possible brackets (9.2 quintillion).

Sports writer R. J. Bell says about this number:
• If one bracket per second was filled out, it would take 292 billion years to fill out all possible brackets (that's 20 times longer than the universe has existed).
• If all the people on earth filled out one bracket per second, it would take over 43 years to fill out every possible bracket.
• If all possible brackets were stacked on top of each other (on standard paper), the pile would reach to the moon and back over 1.1 million times.

How many brackets correctly picked the current Final Four?
• Yahoo.com: out of 966,063, only 9 (<.001%, or one out of every 107,340). Louisville (29%), Michigan (1.9%), Syracuse (1.1%), Wichita St. (.017%)
• ESPN.com: out of 8.15 mil., only 47 (<.001%, or one out of every 173,404). Wichita St. (.019%)

In the modern era (1985-present, with the 64-team tournament), 2011 was the least predictable year. 2013 is even worse!

In 2011 [higher seeds won 68.3% = 43/63] (in 2012, 73.0% = 46/63):
• VCU (#11 advanced to the Final Four)
• Butler (#8 advanced to the Championship)
• Elite 8 contained only one #1 (Kansas)
• Final 4 contained NO #1 (#3, #4, #8, #11)

In 2013 [higher seeds have won 66.7% = 40/60 so far]:
• Florida Gulf Coast [first classes held in 1997] (first ever #15 to advance to the Sweet 16; Atlantic Sun; 24-10; Head Coach Andy Enfield already hired by USC on Apr. 1)
• Wichita St. (#9 advanced to the Final Four)
• Elite 8 contained only one #1 (Louisville)
• Final 4 contained only one #1 (#1, #4, #4, #9)

In the modern era, #1 has a 91% winning ratio vs. #9 (60-6). There has been no #4 vs. #4 game.

Semifinal #1 (6:09 pm, Atlanta)
• #9 Wichita State (30-8, MVC): W #8 Pitt, W #1 Gonzaga, W #13 La Salle, W #2 Ohio St.
• #1 Louisville (33-5, Big East): W #16 NC A&T, W #8 Colorado St., W #12 Oregon, W #2 Duke

Semifinal #2 (8:49 pm, Atlanta)
• #4 Syracuse (30-9, Big East): W #13 Montana, W #12 California, W #1 Indiana, W #3 Marquette
• #4 Michigan (30-7, Big Ten): W #13 South Dakota St., W #5 VCU, W #1 Kansas, W #3 Florida

The 2013 NCAA Championship (9pm on Monday)

WOA (Weighted Offensive Average), a modification of OPS: Widely used to measure a baseball batter’s performance?

• Batting average (BA), home runs (HRs), and runs batted in (RBIs) have been the most dominant statistics to measure a baseball batter's performance.
• Slugging percentage (SLG) and on-base percentage (OBP) have been used as alternatives to the traditional three statistics.
• SLG measures how often a batter hits and how valuable the hits are, and OBP measures how often a batter reaches base.
• Whereas SLG ignores reaching base by walks or by being hit by a pitch, OBP cannot measure the quality of the hits.

• A combination of these two is called OPS, the sum of OBP and SLG, which has become more widely used.
• Kim (2013) introduced a variation of OPS, WOA (weighted offensive average), which is a single number explaining not only a batter's hitting performance but also his non-hitting contributions to generating runs for his team, such as stolen bases and walks.
• This newly developed statistic is based on major league team statistics from the year 2000 to the year 2008.

        Japanese league            Major league
Player  Hits  BA    HR   RBIs      Hits  BA    HR   RBIs
Ichiro  1278  .353  118  529       2597  .322  104  656
Matsui  1390  .304  332  889       1253  .282  175  760

Player  Salary (2008)  Salary (2004)  Hits  BA     HR  RBIs  OBP    SLG    OPS
Ichiro  $17 mil.       $6.5 mil.      262   0.372  8   60    0.414  0.455  0.869
Matsui  $13 mil.       $7 mil.        174   0.298  31  103   0.390  0.522  0.912

(top) Descriptive career statistics for Ichiro and Matsui in Japan and in the major league.
(bottom) Descriptive baseball statistics for Ichiro and Matsui in the major league in 2004, and salary in 2004 and 2008. Note: in the original slide, bold marks the player who is better in each category in 2004.

Table 5: The comparison of several batting statistics.

Statistics  Accuracy  Power  Reaching Bases by non-hits  Running  Q1    Q3
BA          O         X      X                           X        .261  .297
SLG         O         O      X                           X        .406  .506
ISO         X         O      X                           X        .134  .226
pSLG        X         O      X                           X        .162  .267
OBP         O         X      O                           X        .328  .373
OPS         O         O      O                           X        .743  .874
GPA         O         O      O                           X        .253  .292
WOA         O         O      O                           O        .263  .298

The newly developed statistic WOA (in bold in the original slide) is able to measure all of these categories, and its scale is similar to BA.

• Since SLG and OBP are highly correlated (r=.75), there exists “multicollinearity”. Kutner (2004) says in his book: “The simple interpretation of regression coefficients is often unwarranted with highly correlated explanatory variables”.
• Since BA is a component of both OBP and SLG, we may want to break OBP and SLG down into BA and some non-BA part, and hopefully the parts are not highly correlated, so as to avoid the effects of “multicollinearity”.
• Let us define nP, reaching bases by non-hitting performance, which measures a batter’s ability to reach base by non-hits such as BB, HBP, or SB. It is obtained by nP = 2.8*(B% - CS%) + SB%, where B% = (BB+HBP)/PA, CS% = CS/PA, and SB% = SB/PA. Here the weight 2.8 comes from the linear weights in the regression equation of RPG on a team’s B%, CS%, and SB%.

• Since OBP ≈ BA + B% and SLG = BA + ISO, we may consider the regression of RPG on BA, ISO, and B%.
• Since pSLG gives a similar interpretation to ISO (r=.94) and nP gives a similar interpretation to B% (r=.92), pSLG and nP can replace ISO and B%.
• Another advantage of using them as explanatory variables along with BA to explain RPG is that they have much smaller correlations with BA (r=.02 for pSLG and r=.10 for nP).
• The formula for WOA is given by WOA = (8*BA + 2*pSLG + nP)/10.5.
• Then the new regression is given by RPG = -7.12 + 44.8*WOA with R^2 = 90.9%.
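
As a quick numerical illustration of these formulas, the sketch below computes nP, WOA, and the fitted runs per game. It reads the nP definition as 2.8*(B% - CS%) + SB%; the function names are hypothetical; and the BA, pSLG, and nP inputs in the usage lines are the 2004 figures for Matsui and Suzuki reported in the slides (pSLG itself is taken as given, since its definition is not stated here).

```python
def non_hitting_performance(bb_plus_hbp, cs, sb, pa):
    """nP = 2.8 * (B% - CS%) + SB%, with each rate taken per plate appearance."""
    b_pct = bb_plus_hbp / pa
    cs_pct = cs / pa
    sb_pct = sb / pa
    return 2.8 * (b_pct - cs_pct) + sb_pct


def woa(ba, pslg, n_p):
    """WOA = (8*BA + 2*pSLG + nP) / 10.5."""
    return (8 * ba + 2 * pslg + n_p) / 10.5


def fitted_rpg(woa_value):
    """Team runs per game from the reported regression RPG = -7.12 + 44.8*WOA."""
    return -7.12 + 44.8 * woa_value


if __name__ == "__main__":
    # 2004 values from the slides: (BA, pSLG, nP).
    players = {"Matsui": (0.298, 0.251, 0.379), "Suzuki": (0.372, 0.074, 0.202)}
    for name, (ba, pslg, n_p) in players.items():
        print(f"{name}: WOA = {woa(ba, pslg, n_p):.3f}")

    # Made-up counts, for illustration only: 60 BB+HBP, 6 CS, 30 SB in 600 PA.
    print(f"nP = {non_hitting_performance(60, 6, 30, 600):.3f}")
```

The regression was fitted on team-level data, so `fitted_rpg` is best read as a team-scale conversion of WOA rather than a per-player run estimate.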

RPG vs.  pSLG   nP     ISO    BA     OBP    SLG    OPS    GPA    WOA
TOT      0.491  0.503  0.728  0.786  0.882  0.895  0.946  0.951  0.953
AL       0.519  0.626  0.729  0.766  0.898  0.886  0.947  0.955  0.957
NL       0.545  0.522  0.763  0.767  0.878  0.903  0.952  0.956  0.956
2000     0.398  0.452  0.684  0.797  0.942  0.870  0.929  0.946  0.952
2001     0.552  0.564  0.762  0.843  0.926  0.879  0.934  0.947  0.955
2002     0.454  0.438  0.736  0.830  0.836  0.915  0.935  0.929  0.936
2003     0.661  0.586  0.853  0.889  0.916  0.952  0.974  0.973  0.977
2004     0.582  0.501  0.781  0.803  0.875  0.928  0.970  0.970  0.970
2005     0.539  0.466  0.672  0.705  0.783  0.789  0.879  0.894  0.895
2006     0.359  0.337  0.634  0.671  0.799  0.854  0.934  0.933  0.934
2007     0.277  0.438  0.584  0.764  0.875  0.886  0.951  0.958  0.959
2008     0.479  0.505  0.698  0.677  0.837  0.905  0.945  0.943  0.949

(left) The correlation comparison between RPG and pSLG, nP, ISO, BA, OBP, SLG, OPS, GPA, and WOA by year and by league. In the original slide, bold marks the statistic with the highest correlation among them.
(right) The comparative box-plots of RPG and FRPG (fitted RPG on WOA) by year.

Rk  Player     WOA    GPA    Rk  OPS    Rk  BA     Rk  HR  Rk  RBI  Rk
1   Pujols     0.370  0.371  1   1.114  1   0.357  2   37  4   116  9
2   Jones      0.360  0.355  2   1.044  2   0.364  1   22  59  75   78
3   Ramirez    0.345  0.344  3   1.031  3   0.332  3   37  4   121  6
4   Bradley    0.338  0.337  4   0.999  4   0.321  6   22  59  77   72
5   Berkman    0.333  0.331  5   0.986  5   0.312  11  29  29  106  17
6   Holliday   0.326  0.319  9   0.947  11  0.321  6   25  41  88   50
7   Teixeira   0.326  0.323  6   0.962  9   0.308  14  33  15  121  6
8   Rodriguez  0.324  0.320  8   0.965  7   0.302  27  35  11  103  21
9   Quentin    0.322  0.320  7   0.965  7   0.288  56  36  9   100  26
10  Youkilis   0.320  0.318  10  0.958  10  0.312  11  29  29  115  10

NAME    TEAM  SALARY (Y2008)  SALARY (Y2004)  R    H    BA     HR  RBI  OBP    SLG    OPS
MATSUI  NYY   $13,000,000     $7,000,000      109  174  0.298  31  103  0.390  0.522  0.912
SUZUKI  SEA   $17,102,149     $6,528,000      101  262  0.372  8   60   0.414  0.455  0.869

NAME    TEAM  SALARY (Y2008)  SALARY (Y2004)  SB  CS  B   ISO    pSLG   nP     GPA    WOA
MATSUI  NYY   $13,000,000     $7,000,000      3   0   91  0.224  0.251  0.379  0.306  0.311
SUZUKI  SEA   $17,102,149     $6,528,000      36  11  53  0.082  0.074  0.202  0.300  0.317

Yes! By the new statistic WOA, Suzuki is more valuable.

• The positive-definiteness requirement for the covariance matrix may impose complicated nonlinear constraints on the parameters.
• Kim (Journal of Multivariate Analysis, 2012) generalized to multivariate longitudinal data the unconstrained model for covariance structure that removes this obstacle, using the Modified Cholesky Block Decomposition and then modeling its parameters parsimoniously.

• Repeated measurements of one attribute observed over time on each of many “subjects” (animals, plants, hospitals, etc.)
• Within-subject attributes are often correlated.
• Often the subjects can be classified into groups.
• Typically we want to study how the attribute changes over time and whether the nature of this change is the same across groups.

• Two or more attributes are measured on each subject over time.
• The methods of univariate longitudinal analysis may suffice if the focus of the investigation is on the longitudinal behavior of each attribute in isolation from the others.
• The methods of multivariate longitudinal analysis are needed if the covariation between the set of measurements on one attribute and the sets of measurements on other attributes is of scientific interest (e.g. intra-ocular pressure in the left and right eyes).

Example: Baseball study (Data: ESPN MLB)
The attribute data are measurements of salary and WOA (Weighted Offensive Average) for 53 randomly selected MLB batters, indexed by their years of experience in the major leagues.
Measurements were observed at the end of each regular season, for their first 6 years in MLB. (Thus, Subject (I) = 53, Time (T) = 6, Attribute (J) = 2 [Salary, WOA].) Maximum likelihood estimates of unstructured variances, correlations, and cross-correlations are displayed, revealing some interesting features. We assume the mean vector to be saturated. (Possibly adopt more parsimonious mean models.)

[Figure: salary ($0 to $14,000,000) and WOA (0.160 to 0.380) plotted against year of MLB experience (1 to 6)]

Descriptive Statistics: Salary, WOA

Salary
Yr  Mean        Minimum   Median      Maximum
1   $289,224    $68,000   $170,000    $5,666,667
2   $417,052    $100,000  $242,500    $3,696,000
3   $604,598    $100,000  $372,500    $4,666,667
4   $2,199,802  $225,000  $2,012,500  $7,000,000
5   $3,954,346  $375,000  $3,458,334  $12,500,000
6   $5,847,747  $500,000  $5,000,000  $14,000,000

WOA
Yr  Mean   Minimum  Median  Maximum
1   0.260  0.174    0.262   0.354
2   0.273  0.219    0.273   0.320
3   0.290  0.213    0.296   0.363
4   0.295  0.233    0.295   0.349
5   0.296  0.245    0.290   0.372
6   0.295  0.230    0.290   0.376

Variances of batter’s WOA and Salary

WOA  Variance (1/1000)    Salary  Variance (10^10)
w1   1.1                  s1      1
w2   0.8                  s2      17
w3   1.2                  s3      29
w4   0.9                  s4      228
w5   0.9                  s5      518
w6   1.1                  s6      833

The variance of Salary clearly increases over time, as is typical in growth studies, but the variance of WOA remains roughly constant.

Corr  w1    w2    w3    w4    w5    w6
w1    1
w2    0.21  1
w3    0.02  0.55  1
w4    0.17  0.57  0.57  1
w5    0.13  0.50  0.44  0.70  1
w6    0.13  0.58  0.48  0.59  0.72  1

Corr  s1    s2    s3    s4    s5    s6
s1    1
s2    0.57  1
s3    0.53  0.93  1
s4    0.51  0.48  0.51  1
s5    0.36  0.43  0.45  0.82  1
s6    0.35  0.43  0.42  0.76  0.92  1

Correlations among WOA
• Clear evidence of serial correlation among WOA measurements
• The temporal decay in correlation appears to stop beyond about lag four or five
• Same-lag correlations increase over time, with some somewhat larger than others

Correlations among Salary
• Clear evidence of serial correlation among Salary measurements
• The temporal decay in correlation appears not to stop even beyond about lag four or five
• Same-lag correlations increase over time, with some somewhat larger than others (except for yr2 and yr3)

Corr  s1    s2    s3    s4    s5    s6
w1    0.89  0.58  0.57  0.42  0.42  0.27
w2    0.55  0.86  0.80  0.64  0.58  0.44
w3    0.53  0.78  0.90  0.76  0.69  0.48
w4    0.50  0.70  0.85  0.85  0.81  0.61
w5    0.43  0.62  0.77  0.82  0.83  0.66
w6    0.47  0.59  0.72  0.79  0.81  0.80

Cross-correlations among Batter’s WOA and Salary
• The contemporaneous cross-correlations of WOA and Salary increase over time
• The non-contemporaneous cross-correlations vary smoothly but not monotonically
• The cross-correlations are asymmetric: Corr(di, hj) > Corr(dj, hi) for i < j

Note: To explain the batter’s salary, we may add the player’s popularity, how consistently he plays (e.g. how many PAs?), the stadium effect, his team’s popularity, and indicators for salary arbitration and free agency, in addition to adopting multivariate antedependence models for the covariance structure. We can also consider more parsimonious models for the mean structure rather than the saturated model.

• Bracketology is the process of predicting the field of the NCAA Basketball Tournament.
• It incorporates some method of predicting what the Selection Committee will use as its Ratings Percentage Index in order to decide 37 at-large teams to complete the field.
• It seeds the field by ranking all teams from 1st through 68th.
• The newly designed BPI adds more meaningful variables to predict the Tournament teams and their seeds.
• We developed a method to evaluate the brackets produced by RPI, BPI and Sagarin’s.
• Based on our method, RPI is the best one. Here "best" means that the RPI’s bracket is the closest to the real bracket chosen by the NCAA committee.
That might mean the committee depends too much on RPI because of its simplicity and popularity. In a future study, we would like to develop an updated version of RPI that is both more meaningful and more precise in finding the best 68 teams for the NCAA tournament.

There is much discussion of the BCS rating, which ultimately determines which two teams play in the championship game. We reviewed the current issues, including some comments in favor of adopting a "tournament" play-off.
We focus on the performance of the BCS from 2005 to 2010 and suggest that the new weights 50%, 45%, and 5% for the Harris poll, the USA Today poll, and the Computer rating would perform better than 1/3, 1/3, and 1/3. However, we do not see any statistically significant difference between the revised and the current BCS ratings, so we do not urge the BCS to adjust its current weights of 1/3, 1/3, and 1/3.
In a future study, we would like to derive the probability of capturing the true champion under the current system and under a potential 8-team play-off system. By comparing those two probabilities, we may be able to decide whether to recommend that the BCS adopt the play-off.
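The effect of changing the weights can be illustrated with a minimal sketch. It assumes that each of the three components (Harris poll, USA Today poll, Computer rating) is already expressed as a fraction between 0 and 1 and that the rating is their weighted sum. The team names, component values, and the helper names `composite` and `rank_teams` are hypothetical and are not taken from the study.

```python
from typing import Dict, List, Tuple

# Weights for (Harris poll, USA Today poll, Computer rating).
CURRENT_WEIGHTS: Tuple[float, float, float] = (1 / 3, 1 / 3, 1 / 3)
REVISED_WEIGHTS: Tuple[float, float, float] = (0.50, 0.45, 0.05)


def composite(components: Tuple[float, float, float],
              weights: Tuple[float, float, float]) -> float:
    """Weighted sum of the three component scores (each assumed in [0, 1])."""
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    return sum(w * c for w, c in zip(weights, components))


def rank_teams(teams: Dict[str, Tuple[float, float, float]],
               weights: Tuple[float, float, float]) -> List[str]:
    """Team names ordered from highest to lowest composite rating."""
    return sorted(teams, key=lambda name: composite(teams[name], weights),
                  reverse=True)


# Hypothetical component scores, for illustration only.
teams = {
    "Team A": (0.95, 0.94, 0.80),
    "Team B": (0.90, 0.91, 0.99),
    "Team C": (0.93, 0.92, 0.90),
}

if __name__ == "__main__":
    print("Current weights:", rank_teams(teams, CURRENT_WEIGHTS))
    print("Revised weights:", rank_teams(teams, REVISED_WEIGHTS))
```

With these made-up numbers the two weightings order the teams differently, because the revised weights nearly remove the influence of the Computer rating. Whether such a difference is statistically meaningful is a separate question that this sketch does not address.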