A modification of OPS: Widely used to measure a baseball batter's

Chulmin Kim, Ph.D.
Assistant Professor of Statistics
School of Mathematical Sciences
Rochester Institute of Technology
Intro.
Example
Studies
• Fields of Statistics
• Intro. to Sports Statistics
• Study 1: Bracket Analysis (NCAA Men’s Basketball tournament)
• Study 2: A modification of OPS –WOA (Major League Baseball)
• Study 3: A Study of BCS Rating (NCAA Div. I FBS Football)
• Conclusion & Discussion
Discussion
UP-STAT 2013 @ Rochester Institute of Technology
Actuarial science (Statistics + Insurance industries)
Bioinformatics (Statistics + Computer Science + Biology)
Biostatistics (Statistics + Biological or Medical Sciences)
Business analytics (Statistics + Business)
Chemometrics (Statistics + Chemistry)
Econometrics (Statistics + Economics)
Environmental statistics (Statistics + Environmental Science)
Geostatistics (Statistics + Geology)
Operations research (Statistics + Optimal to Complex)
Psychometrics (Statistics + Psychology)
Quality control (Statistics + Industrial Engineering)
Sports Statistics (Statistics + Sports)

The Section on Statistics in Sports (SIS) was
founded during the 1992 Joint Statistical
Meetings, filling a need to foster the development
of statistics and its applications in sports.

The mission of SIS has been to stimulate
statistical research with an application to sports,
promoting publications devoted to statistical
theory and methodology and their application to
statistics in sports, and to increase the availability
of information concerning the science of statistics
and its contribution to sports.

Tournament Selection Efficiency: An Analysis of the PGA TOUR's FedExCup

Predicting the Maximum Lead from Final Scores in Basketball: A Diffusion Model

An Exploratory Study of Minor League Baseball Statistics

Pace and Critical Gradient for Hill Runners: An Analysis of Race Records

Models for Third Down Conversion in the National Football League

A Comparison of the Autocorrelation and Variance of NFL Team Strengths Over
Time using a Bayesian State-Space Model

The Sensitivity of College Football Rankings to Several Modeling Choices

The Individual Factors of Successful Free Throw Shooting

New Insights on the Tendency of NCAA Basketball Officials to Even Out Foul Calls

Estimating Fielding Ability in Baseball Players Over Time

Establishing the Run (in American Football) -Conventional
Wisdom or Obsolete Adage?

NHL Team Statistics by Nonparametric Statistical Analysis

Plus-or-Minus for NHL players

ROCHESTER INSTITUTE OF TECHNOLOGY Men’s
Basketball team

Paul Pierce

Analysis of Fernando Verdasco’s Tennis Career

Did Inexperienced Referees Impact Football Statistics?

Bracket Analysis of the 2013 NCAA
Men’s basketball-Does the RPI work
well?

• There are 347 schools in 32 Division I basketball conferences (average 10.8 teams per conference; largest: Atlantic 10 with 16 teams; smallest: Patriot and Ivy League with 8 teams each).
• Each conference (except for the Great West) receives an automatic bid to the NCAA Men's Division I Basketball Championship.

• Nickname for the NCAA (Men's) Basketball Championship
• Single-elimination tournament to determine the national champion
• The field of 68 college basketball teams:
Automatic bids (31): the champions from 31 conferences
At-large berths (37): selected by the selection committee
[Chart: # of teams in the field of NCAA (68): automatic bids 31, at-large berths 37]

Conference (number of teams) and teams:
12 teams: UNC, Duke, FSU, Virginia, BC, NC State, Miami, WF, VT, GT, Clemson, Maryland (2014), Syracuse (2013), Pitt (2013), Notre Dame* (2013), Louisville (2014)
15 teams: Syracuse, Marquette, Notre Dame*, Cincinnati, Georgetown, USF, Louisville, Connecticut, Seton Hall, Rutgers, St. John's, Pittsburgh, Villanova, Providence, DePaul
10 teams: Kansas, Baylor, Iowa State, Kansas State, Texas, Oklahoma, Oklahoma State, Texas Tech, TCU, West Virginia
12 teams: Ohio State, Michigan State, Michigan, Wisconsin, Indiana, Purdue, Northwestern, Iowa, Minnesota, Illinois, Nebraska, Penn State, Rutgers (2014), Maryland (2014)
12 teams: Washington, Oregon, California, Arizona, Colorado, UCLA, Stanford, Oregon State, Washington State, Arizona State, Utah, USC
14 teams: Kentucky, Florida, Vanderbilt, Alabama, Mississippi State, Ole Miss, Arkansas, Auburn, Tennessee, Louisiana State, Georgia, South Carolina, Missouri, Texas A&M

From 16 teams in 2012:
• West Virginia (2012 → Big 12)
• Pittsburgh, Syracuse (2013 → ACC)
• Notre Dame* (2013 → ACC)
New Big East (10 schools, in 2013)
• The Catholic 7 (Villanova, Providence, DePaul, Seton Hall, Marquette, Georgetown, St. John's)
• NEW 3: Xavier (A10), Creighton (MVC), Butler (A10)
Renamed Big East (10 schools in 2013 & 2014, 12 teams in 2015)
(American Athletic Conference? On April 4th)
• Cincinnati, Connecticut, USF
• Louisville (2014 → ACC), Rutgers (2014 → Big Ten)
• Houston, Memphis, SMU, Temple, UCF (2013)
• East Carolina, Tulane (2014), Tulsa (2015), Navy* (2015)

[Chart: # of teams in the field of NCAA (68): automatic bids 31; at-large berths from the six power conferences 26; at-large berths from the twenty-six non-power conferences 11]

Conf.     At-large / poss. teams in conf.   Winning ratio in the field   Notables
Big East  7/14 (50%)                        65% (11-6)                   Louisville, Syracuse
Big 12    4/9 (44%)                         38% (3-5)                    Kansas, Iowa St.
Big Ten   6/11 (55%)                        68% (13-6)                   Michigan, Ohio St.
ACC       3/11 (27%)                        60% (6-4)                    Duke, Florida St.
SEC       2/13 (15%)                        57% (4-3)                    Florida, Mississippi
Pac 12    4/11 (36%)                        50% (5-5)                    Arizona, Oregon
MWC       4/8 (50%)                         29% (2-5)                    SD St., Colorado St.
A 10      4/15 (27%)                        58% (7-5)                    La Salle, St. Louis
WCC       1/8 (13%)                         50% (2-2)                    Gonzaga, St. Mary's
Others    2/224 (1%)                        32% (11-23)                  Wichita St., Middle Tenn

• The process of predicting the field of the NCAA Basketball Tournament.
• It incorporates some method of predicting what the Selection Committee will use as its Ratings Percentage Index in order to decide the 37 at-large teams that complete the field.
• It seeds the field by ranking all teams from 1st through 68th.

• The inventor of the term "bracketology", starting first as the editor of the Blue Ribbon College Basketball Yearbook and ending up as the resident bracketologist on ESPN.
• Teaches an online course at Saint Joseph's titled "Fundamentals of Bracketology."
• However, ranked only 36th out of 65 bracket experts in the past four years' results of the tournament.


A quantity used to rank sports teams based upon a team's
wins and losses and its strength of schedule. This system has
been in use in college basketball since 1981 to aid in the
selecting and seeding of teams appearing in the 68-team
men's playoffs.

The RPI rating is often considered a significant factor in
selecting and seeding the final few teams in the tournament
field, though the selection committee stresses that the RPI is
used merely as a guideline and not as an infallible indicator of
a team's worth.

In its current formulation, the index comprises a team's
winning percentage (25%), its opponents' winning percentage
(50%), and the winning percentage of those opponents'
opponents (25%).

The opponents' winning percentage (OWP) and the winning
percentage of those opponents' opponents (OOWP) together make up the
strength of schedule (SOS). Thus, the SOS accounts for 75% of
the RPI calculation: it is 2/3 times the opponents' winning
percentage plus 1/3 times the opponents' opponents' winning
percentage.
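
Written out as formulas, the weights stated above combine as follows. This is only a restatement of the slide text, using WP, OWP, and OOWP as shorthand for the team's winning percentage, its opponents' winning percentage, and its opponents' opponents' winning percentage (the same abbreviations appear in the worked example tables of this deck).

```latex
\[
\mathrm{SOS} = \tfrac{2}{3}\,\mathrm{OWP} + \tfrac{1}{3}\,\mathrm{OOWP},
\qquad
\mathrm{RPI} = 0.25\,\mathrm{WP} + 0.50\,\mathrm{OWP} + 0.25\,\mathrm{OOWP}
             = 0.25\,\mathrm{WP} + 0.75\,\mathrm{SOS}.
\]
```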

The RPI lacks theoretical justification from a statistical
standpoint. Other ranking systems which include the margin of
victory of games played or other statistics in addition to the
win/loss results have been shown to be a better predictor of the
outcomes of future games. However, because the margin of
victory has been manipulated in the past by teams or
individuals in the context of gambling, the RPI can be used to
mitigate motivation for such manipulation.

Part I (25% of the formula): Team winning percentage.
For the 2005 season, the NCAA added a bonus/penalty
system, where each home win or road loss is multiplied by
0.6 in the winning percentage calculation. A home loss or
road win is multiplied by 1.4. Neutral games count as 1.0.

Part II (50%): Average opponents’ winning percentage.
To calculate this, you must calculate each opponent’s
winning percentage individually and average those figures.
Games involving the team for whom we are calculating the
RPI are ignored.

Part III (25%): Average opponents’ opponents’ winning
percentage: Basically taking all of the opponents’ Part II
values and averaging them.
[Source: kenpom.com]
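
The Part I weighting can be sketched in a few lines of Python. This is a minimal illustration under the weights quoted above, not the NCAA's own implementation; the function name `weighted_wp` and the `(site, won)` input format are assumptions made for this example.

```python
# Location weights from Part I: home win / road loss = 0.6,
# road win / home loss = 1.4, neutral-site games = 1.0.
WIN_WEIGHT = {"home": 0.6, "away": 1.4, "neutral": 1.0}
LOSS_WEIGHT = {"home": 1.4, "away": 0.6, "neutral": 1.0}


def weighted_wp(results):
    """results: list of (site, won) pairs, site in {'home', 'away', 'neutral'}."""
    won = sum(WIN_WEIGHT[site] for site, w in results if w)
    lost = sum(LOSS_WEIGHT[site] for site, w in results if not w)
    total = won + lost
    return won / total if total else 0.0


# Hypothetical team: two home wins, one road win, one road loss.
example = [("home", True), ("home", True), ("away", True), ("away", False)]
print(round(weighted_wp(example), 4))  # 0.8125
```

A 3-1 team has a plain winning percentage of 0.750; the weighting lifts it to 0.8125 here because one of the wins came on the road.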

Home            Score  Away            Score
Memphis         78     North Carolina  72
Memphis         82     Maryland        68
Wake Forest     71     Memphis         72
North Carolina  93     Memphis         68
Maryland        81     Wake Forest     72
Wake Forest     52     North Carolina  68

R  Team            W-L  W%    (H)  (A)
1  Memphis         3-1  .750  2-0  1-1
2  North Carolina  2-1  .667  1-0  1-1
3  Maryland        1-1  .500  1-0  0-1
4  Wake Forest     0-3  .000  0-2  0-1

Home            Score  Away            Score  H     H(%)     A     A(%)
Memphis         78     North Carolina  72     Mem   0.6/0.6  UNC   0/0.6
Memphis         82     Maryland        68     Mem   0.6/0.6  Mary  0/0.6
Wake Forest     71     Memphis         72     WF    0/1.4    Mem   1.4/1.4
North Carolina  93     Memphis         68     UNC   0.6/0.6  Mem   0/0.6
Maryland        81     Wake Forest     72     Mary  0.6/0.6  WF    0/0.6
Wake Forest     52     North Carolina  68     WF    0/1.4    UNC   1.4/1.4

• WP
Memphis: (0.6 + 0.6 + 1.4 + 0) / (0.6 + 0.6 + 1.4 + 0.6) = 0.8125
North Carolina: (0 + 0.6 + 1.4) / (0.6 + 0.6 + 1.4) = 0.7692
Maryland: (0 + 0.6) / (0.6 + 0.6) = 0.5000
Wake Forest: (0 + 0 + 0) / (1.4 + 0.6 + 1.4) = 0.0000

Team  Opponent 1        Opponent 2        Opponent 3     Opponent 4     OWP
Mem   UNC 1.0 (1-0)     Mary 1.0 (1-0)    WF 0.0 (0-2)   UNC 1.0 (1-0)  0.7500
UNC   Mem 1.0 (2-0)     Mem 1.0 (2-0)     WF 0.0 (0-2)                  0.6667
Mary  Mem 0.6667 (2-1)  WF 0.0 (0-2)                                    0.3333
WF    Mem 0.6667 (2-1)  Mary 0.0 (0-1)    UNC 0.5 (1-1)                 0.3889

Team  Opponent 1   Opponent 2   Opponent 3  Opponent 4  OOWP
Mem   UNC 0.6667   Mary 0.3333  WF 0.3889   UNC 0.6667  0.5139
UNC   Mem 0.7500   Mem 0.7500   WF 0.3889               0.6296
Mary  Mem 0.7500   WF 0.3889                            0.5694
WF    Mem 0.7500   Mary 0.3333  UNC 0.6667              0.5833

Team  WP (25%)  OWP (50%)  OOWP (25%)  SOS     RPI     W%
Mem   0.8125    0.7500     0.5139      0.6713  0.7066  0.7500 (3-1)
UNC   0.7692    0.6667     0.6296      0.6543  0.6830  0.6667 (2-1)
Mary  0.5000    0.3333     0.5694      0.4120  0.4340  0.5000 (1-1)
WF    0.0000    0.3889     0.5833      0.4537  0.3403  0.0000 (0-3)
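
The worked example above can be reproduced end to end with a short script. This is a sketch under the rules stated on the earlier slides (weighted WP, unweighted OWP that ignores games against the team being rated, OOWP as the average of the opponents' OWP); the function names are illustrative and ties or neutral-site games are not handled.

```python
from collections import defaultdict

# (home, home_score, away, away_score) for the six games in the example.
GAMES = [
    ("Memphis", 78, "North Carolina", 72),
    ("Memphis", 82, "Maryland", 68),
    ("Wake Forest", 71, "Memphis", 72),
    ("North Carolina", 93, "Memphis", 68),
    ("Maryland", 81, "Wake Forest", 72),
    ("Wake Forest", 52, "North Carolina", 68),
]


def team_games(games):
    """Map each team to a list of (opponent, site, won) tuples, one per game."""
    log = defaultdict(list)
    for home, hs, away, as_ in games:
        log[home].append((away, "home", hs > as_))
        log[away].append((home, "away", as_ > hs))
    return log


def weighted_wp(rows):
    # Home win / road loss count 0.6; road win / home loss count 1.4.
    won = sum(0.6 if site == "home" else 1.4 for _, site, w in rows if w)
    lost = sum(1.4 if site == "home" else 0.6 for _, site, w in rows if not w)
    return won / (won + lost) if won + lost else 0.0


def plain_wp(rows, exclude):
    # Unweighted winning percentage, ignoring games against `exclude`.
    kept = [w for opp, _, w in rows if opp != exclude]
    return sum(kept) / len(kept) if kept else 0.0


def rpi_table(games):
    log = team_games(games)
    owp = {t: sum(plain_wp(log[o], t) for o, _, _ in rows) / len(rows)
           for t, rows in log.items()}
    oowp = {t: sum(owp[o] for o, _, _ in rows) / len(rows)
            for t, rows in log.items()}
    return {t: 0.25 * weighted_wp(log[t]) + 0.50 * owp[t] + 0.25 * oowp[t]
            for t in log}


for team, rpi in sorted(rpi_table(GAMES).items(), key=lambda kv: -kv[1]):
    print(f"{team:15s} {rpi:.4f}")
```

Run on the six games, this prints the teams in the order Memphis, North Carolina, Maryland, Wake Forest, matching the RPI column of the table.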

Conf             RPI    SOS    N
Mountain West    0.580  0.570  9
Big Ten          0.577  0.573  12
Big East         0.574  0.567  15
ACC              0.561  0.557  12
Big 12           0.558  0.558  10
Pac 12           0.555  0.555  12
A-10             0.548  0.547  16
SEC              0.542  0.541  14
Missouri Valley  0.534  0.533  10
West Coast       0.529  0.527  9

Some argue that the heavy emphasis
on SOS gives an unfair advantage to
teams from power conferences. Teams
from "majors" are allowed to pick
many of their non-conference
opponents. (At-large berths per eligible team:
Power 1/2.7, MWC, A10, WCC 1/3.4, Other 1/112.)
Some mid-major conferences compel their teams
to schedule opponents ranked in the
top half of the RPI, which could boost
the strength of that conference and/or
its tougher-scheduling teams.

• ESPN created the BPI. Other power rankings already exist (the RPI, Sagarin's Ratings, Ken Pomeroy's ratings, etc.).
• All of these methods are based upon the outcomes of games, their location (home/road/neutral) and the quality of opponents. Each one basically puts these together in slightly different ways and arrives at slightly different results.


RPI, due to its simplicity, tends to be the biggest
decision aid for the tournament committee, even
though it doesn't account for the actual scores of
games.

BPI added a way of accounting for missing
players. If a team or its opponent is missing one
of its most important players for a contest, that
game is less important for ranking the teams
compared to games in which both teams are at
full strength.

• Another way that BPI can rank teams differently than Sagarin or Kenpom is counting close games at home versus on the road. In BPI, a close win at home is better than a close loss on the road against the same opponent. This isn't necessarily true in other methods.
• Other methods don't typically account for bigger wins. BPI gives marginally decreasing credit for bigger wins, with a 30-point win being only about 20 percent better than a 15-point win, not twice as good, which can happen in other methods.

Includes                                     RPI  Sagarin's  BPI
Scoring margin                               X    O          O
Diminishing returns for blowouts             X    O          O
Pace of game matters                         X    X          O
Home/Neutral/Road                            O    O          O
SOS beyond Opponent's opponents' W-L         X    O          O
All wins are better than losses              O    X          O
De-weighting games with missing key players  X    X          O

Between the 2007 and 2011 NCAA tournaments, BPI picked 74.4 percent of the matchups
correctly, whereas Sagarin's picked 73.2 percent and RPI picked 71.9 percent.
The BPI is not a guaranteed way to pick a perfect bracket, but we do think it is the best
power ranking available. [Source: ESPN.com]

Which works better to predict the 2012 & 2013 NCAA
bracket? RPI or BPI? Or Sagarin's?

To decide, we evaluate six things:
Correctly Picked: + 5 pts
Correctly Seeded: + 3 pts
Correctly Seeded Within 1: + 2 pts
Correctly Seeded Within 2: + 1 pt
Missed but in top 41-51: - 1 pt
Missed but in top 40: - 2 pts
Note: This analysis has nothing to do with who is the
best at picking winners in the NCAA tournament. It
is all about predicting the participants in the field of
the tournament and how well they are seeded.
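
The point system can be written as a small scoring function. This is a sketch that assumes the six categories are simply counted and summed with the weights above; the dictionary keys are names invented for this example, and the 2012 RPI counts are read from the results table that follows in this deck.

```python
# Points per category, as listed on the slide.
POINTS = {
    "picked": 5,
    "seeded": 3,
    "within_1": 2,
    "within_2": 1,
    "missed_41_51": -1,
    "missed_top_40": -2,
}


def bracket_score(counts):
    """Total points for one rating method given its category counts."""
    return sum(POINTS[k] * counts.get(k, 0) for k in POINTS)


# 2012 RPI counts, taken from the results table in this deck.
rpi_2012 = {"picked": 65, "seeded": 24, "within_1": 24,
            "within_2": 6, "missed_41_51": 3, "missed_top_40": 0}
print(bracket_score(rpi_2012))  # 448
```

The total of 448 agrees with the RPI score reported for 2012, which supports reading the categories as additive.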

RPI predicts 91.9% (34/37) correctly.
(Missed Virginia, Iona, West Virginia)

BPI predicts 86.5% (32/37) correctly.
(Missed Iona, Xavier, Notre Dame,
South Florida, Colorado State)

Sagarin's predicts 86.5% (32/37) correctly.
(Missed Xavier, Iona, Southern Mississippi,
South Florida, Colorado State)

Seed  Bs  Rs  Ss  Auto  Team                   Bd  Rd  Sd
1     1   1   1   No    Kentucky (32-2)        0   0   0
1     1   1   2   No    Syracuse (31-2)        0   0   1
1     2   1   2   No    North Carolina (29-5)  1   0   1
1     1   1   1   Yes   Michigan St (27-7)     0   0   0
2     2   2   1   No    Kansas (27-6)          0   0   1
2     2   2   3   No    Duke (27-6)            0   0   1
2     1   2   1   No    Ohio State (27-7)      1   0   1
2     2   3   2   Yes   Missouri (30-4)        0   1   0
3     3   2   3   No    Baylor (26-7)          0   1   0
3     5   3   4   No    Marquette (25-7)       2   0   1
3     7   3   5   Yes   Florida St (24-9)      4   0   2
3     5   4   4   No    Georgetown (22-8)      2   1   1
4     8   3   6   No    Michigan (23-9)        4   1   2
4     4   6   2   No    Wisconsin (24-9)       0   2   2
4     4   5   3   No    Indiana (25-8)         0   1   1
(Bs, Rs, Ss: seed implied by BPI, RPI, Sagarin's; Bd, Rd, Sd: absolute difference from the actual seed.)

BPI
#37 (Arizona → NIT #1 [32])
#41 (Miami → NIT #2 [16])
#48 (La Salle → NIT #3 [32])
#50 (Seton Hall → NIT #1 [16])
#51 (Northwestern → NIT #4 [16])
RPI
#42 (Oral Roberts → NIT #4 [32])
#44 (Marshall → NIT #5 [32])
#51 (Middle Tennessee → NIT #4 [8])
Sagarin's
#45 (Miami → NIT #2 [16])
#47 (Arizona → NIT #1 [32])
#48 (Stanford → NIT #3 [Champion])
#49 (Middle Tennessee → NIT #4 [8])
#51 (Minnesota → NIT #6 [Runner-up])









Iona (25-8, MAAC → Lost in First Four) [#52, #54, #58]
Xavier (21-12, A10 → Sweet 16) [#53, #43, #57]
Colorado St. (20-11, MWC → Lost 1st Rd.) [#70, #30, #74]
USF (19-13, Big East → Lost 2nd Rd.) [#64, #48, #69]
Notre Dame (22-11, Big East → Lost 1st Rd.) [#60, #37, #42]
Virginia (22-9, ACC → Lost 1st Rd.) [#27, #53, #25]
W. Virginia (19-13, Big East → Lost 1st Rd.) [#42, #67, #39]
S. Mississippi (23-8, C-USA → Lost 1st Rd.) [#46, #22, #60]
Note that:
The ranks are BPI, RPI, Sagarin's respectively.
BPI and Sagarin's mostly agree with each other, but not with RPI.

Method  Correctly  Correctly  Correctly Seeded  Correctly Seeded  Missed but in  Missed but in
        Picked     Seeded     Within 1          Within 2          top 41-51      top 40
        (+ 5 pts)  (+ 3 pts)  (+ 2 pts)         (+ 1 pt)          (- 1 pt)       (- 2 pts)
RPI     65         24         24                6                 3              0
BPI     63         22         17                7                 4              1
SAG     63         13         27                9                 5              0

• The winner based on our method is:
RPI: 448 pts.
BPI: 416 pts.
Sagarin's: 412 pts.

• Champion: Kentucky [#1 seed, #2, #1, #1]
• Runner-up: Kansas [#2 seed, #6, #6, #4]
• Final Four: Ohio St. [#2 seed, #7, #3, #2]
• Final Four: Louisville [#4 seed, #13, #11, #18]
• Elite Eight: Syracuse [#1 seed, #1, #2, #6]
• Elite Eight: UNC [#1 seed, #3, #5, #5]
• Elite Eight: Baylor [#3 seed, #8, #12, #12]
• Elite Eight: Florida [#7 seed, #29, #16, #17]
Elite Eight teams are ALL from Power Conferences.
Winning rate of higher seed = 73% (46/63)
Women's Final Four are ALL #1 Seeds!!!

RPI predicts 97.3% (36/37) correctly.
(Missed only California)

BPI predicts 86.5% (32/37) correctly.
(Missed California, La Salle, Temple,
Villanova, Illinois)

Sagarin's predicts 83.8% (31/37) correctly.
(Missed Colorado, Middle Tennessee, Temple,
California, La Salle, Boise State)

Seed  Bs  Rs  Ss  Auto  School          Record  Bd  Rd  Sd
1     1   1   1   Yes   Louisville      29-5    0   0   0
1     2   2   1   Yes   Kansas          29-5    1   1   0
1     1   2   1   No    Indiana         27-6    0   1   0
1     2   2   2   Yes   Gonzaga         31-2    1   1   1
2     3   1   4   Yes   Miami           27-6    1   1   2
2     1   1   2   No    Duke            27-5    1   1   0
2     4   3   3   No    Georgetown      25-6    2   1   1
2     2   3   2   Yes   Ohio State      26-7    0   1   0
3     3   1   5   Yes   New Mexico      29-5    0   2   2
3     1   2   1   No    Florida         26-7    2   1   2
3     3   3   2   No    Michigan State  25-8    0   0   1
3     5   3   6   No    Marquette       23-8    2   0   3
4     2   5   3   No    Michigan        26-7    2   1   1
4     8   6   5   No    Kansas State    27-7    4   2   1
4     5   4   4   Yes   Saint Louis     27-6    1   0   0

BPI
#41 (Stanford → NIT #4 [16])
#43 (Iowa → NIT #3 [Runner-up])
#46 (Baylor → NIT #2 [Champion])
#48 (Virginia → NIT #1 [8])
#50 (Maryland → NIT #2 [4])
RPI
#34 (Southern Mississippi → NIT #1 [8])
Sagarin's
#29 (Iowa → NIT #3 [Runner-up])
#31 (Baylor → NIT #2 [Champion])
#37 (Kentucky → NIT #1 [32])
#38 (Virginia → NIT #1 [8])
#48 (Stanford → NIT #4 [16])
#49 (Maryland → NIT #2 [4])







California (20-11, Pac 12 → Lost 2nd Rd.) [#53, #53, #56]
Temple (23-9, A10 → Lost 2nd Rd.) [#56, #41, #55]
La Salle (21-9, A10 → Sweet 16) [#54, #40, #57]
Illinois (22-12, Big Ten → Lost 2nd Rd.) [#63, #39, #44]
Colorado (21-11, Pac 12 → Lost 1st Rd.) [#39, #37, #51]
MTSU (28-5, Sun Belt → Lost 1st Rd.) [#45, #28, #53]
Boise St. (21-10, MWC → Lost in First Four) [#44, #44, #58]
Note that:
The ranks are BPI, RPI, Sagarin's respectively.
BPI and Sagarin's mostly agree with each other, but not with RPI.

Method  Correctly  Correctly  Correctly Seeded  Correctly Seeded  Missed but in  Missed but in
        Picked     Seeded     Within 1          Within 2          top 41-51      top 40
        (+ 5 pts)  (+ 3 pts)  (+ 2 pts)         (+ 1 pt)          (- 1 pt)       (- 2 pts)
RPI     67         25         21                12                0              1
BPI     63         21         19                14                5              0
SAG     62         18         25                9                 2              4

• The winner based on our method is:
RPI: 462 pts.
BPI: 425 pts.
Sagarin's: 413 pts.

[Bar chart: total points by method (RPI, BPI, Sagarin's) for 2012 and 2013]
Year  RPI  BPI  Sagarin's
2012  448  416  412
2013  462  425  413
The selection committee depends heavily on RPI.
The tendency is becoming even stronger!

• Final Four: Louisville [#1 seed, #3, #1, #1]
• Final Four: Syracuse [#4 seed, #13, #11, #13]
• Final Four: Michigan [#4 seed, #17, #8, #11]
• Final Four: Wichita St. [#9 seed, #38, #24, #42]
• Elite Eight: Duke [#2 seed, #1, #3, #7]
• Elite Eight: Ohio St. [#2 seed, #11, #7, #5]
• Elite Eight: Florida [#3 seed, #8, #2, #3]
• Elite Eight: Marquette [#3 seed, #12, #20, #22]
All Elite Eight teams except Wichita State (MVC)
are from the Power Conferences.
Winning rate of higher seed = 67% (40/60)

[Bar chart: Rd 1 Win (%) by seed since 1985]
Seed   #1     #2     #3     #4     #5     #6     #7     #8
Win %  1.000  0.952  0.833  0.798  0.679  0.702  0.607  0.548
Seed   #9     #10    #11    #12    #13    #14    #15    #16
Win %  0.452  0.393  0.298  0.321  0.202  0.167  0.048  0.000

[Bar chart: Overall Win (%) by seed since 1985, seeds #1 through #16, ranging from 0.785 for #1 seeds down to 0.000 for #16 seeds]


How many brackets are possible?

For the 64-team tournament, there
will be 63 games, and each game has only
two possible outcomes (Win/Lose).

There are 2^63 = 9,223,372,036,854,775,808
possible brackets (9.2 quintillion)
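
The count is easy to check with exact integer arithmetic. The sketch below also converts it into years at one bracket per second, which is the comparison quoted on the next slide; the year length of 365.25 days is an assumption made here.

```python
GAMES = 63  # a 64-team single-elimination bracket has 63 games
brackets = 2 ** GAMES
print(f"{brackets:,}")  # 9,223,372,036,854,775,808

SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: Julian year
years = brackets / SECONDS_PER_YEAR
print(f"{years / 1e9:.0f} billion years at one bracket per second")
```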

Sports writer R. J. Bell says about this number:
If one bracket per second was filled out, it would take
292 billion years to fill out all possible brackets
(that's 20 times longer than the universe has
existed).
If all the people on earth filled out one bracket per
second, it would take over 43 years to fill out every
possible bracket.
If all possible brackets were stacked on top of each
other (on standard paper), the pile would reach to
the moon and back over 1.1 million times.


How many brackets correctly picked the
current Final Four?

Yahoo.com → Out of 966,063, only 9
(<.001%, or one out of every 107,340)
Louisville (29%), Michigan (1.9%),
Syracuse (1.1%), Wichita St. (.017%)

ESPN.com → Out of 8.15 mil., only 47
(<.001%, or one out of every 173,404)
Wichita St. (.019%)


In the modern era (1985-present, with the 64-team tournament), 2011
was the least predictable year. 2013 is even worse!

In 2011 [higher seeds win %: 68.3% = 43/63] (in 2012: 73.0% = 46/63)
VCU (#11) advanced to the Final Four
Butler (#8) advanced to the Championship
Elite 8 contained only one #1 (Kansas)
Final 4 contained NO #1 (#3, #4, #8, #11)

In 2013 [higher seeds win % so far: 66.7% = 40/60]
Florida Gulf Coast [first classes held in 1997]
(First ever #15 to advance to the Sweet 16 / Atlantic Sun: 24-10 / Head
Coach Andy Enfield already hired by USC on Apr. 1)
Wichita St. (#9) advanced to the Final Four
Elite 8 contained only one #1 (Louisville)
Final 4 contained only one #1 (#1, #4, #4, #9)
In the modern era, #1 has a 91% winning ratio vs. #9 (60-6). No #4 vs. #4.

Semifinal #1 (6:09 pm, Atlanta)
#9 Wichita State (30-8, MVC): W, #8 Pitt; W, #1 Gonzaga; W, #13 La Salle; W, #2 Ohio St.
#1 Louisville (33-5, Big East): W, #16 NC A&T; W, #8 Colorado St.; W, #12 Oregon; W, #2 Duke
Semifinal #2 (8:49 pm, Atlanta)
#4 Syracuse (30-9, Big East): W, #13 Montana; W, #12 California; W, #1 Indiana; W, #3 Marquette
#4 Michigan (30-7, Big Ten): W, #13 South Dakota St.; W, #5 VCU; W, #1 Kansas; W, #3 Florida
The 2013 NCAA Championship (9pm on Monday)

WOA (Weighted Offensive Average),
A modification of OPS: Widely used to
measure a baseball batter’s performance?

• Batting average (BA), home runs (HRs), and runs batted in (RBIs) have been the most dominant statistics to measure a baseball batter's performance.
• Slugging percentage (SLG) and on-base percentage (OBP) have been used as alternatives to the traditional three statistics.
• SLG measures how often a batter hits and how valuable the hits are, and OBP measures how often a batter reaches base. Whereas SLG ignores reaching base by hit-by-pitch or walks, OBP does not measure the quality of the hits.


A combination of these two is called OPS, the sum of
OBP and SLG, which has become more widely used.

Kim (2013) introduced a variation of OPS, WOA
(weighted offensive average), which is a single
number explaining not only a batter's hitting
performance but also his non-hitting contributions to
generating runs for his team, such as stolen bases and
walks. This newly developed statistic is
based on major league team statistics from
2000 to 2008.
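
Since OPS is just the sum of OBP and SLG, it takes one line to compute. The sketch below checks the definition against the 2004 OBP and SLG figures for Matsui and Ichiro given in this deck; the function name `ops` is illustrative.

```python
def ops(obp, slg):
    """On-base plus slugging: the plain sum of OBP and SLG."""
    return obp + slg


# 2004 figures for Matsui and Ichiro as given in this deck.
print(round(ops(0.390, 0.522), 3))  # 0.912 (Matsui)
print(round(ops(0.414, 0.455), 3))  # 0.869 (Ichiro)
```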
        Japanese league            Major league
Player  Hits  BA    HR   RBIs     Hits  BA    HR   RBIs
Ichiro  1278  .353  118  529      2597  .322  104  656
Matsui  1390  .304  332  889      1253  .282  175  760

Player  Salary (2008)  Salary (2004)  Hits  BA     HR  RBIs  OBP    SLG    OPS
Ichiro  $17 mil.       $6.5 mil.      262   0.372  8   60    0.414  0.455  0.869
Matsui  $13 mil.       $7 mil.        174   0.298  31  103   0.390  0.522  0.912

Descriptive career statistics for Ichiro and Matsui in Japan and in the major leagues. (top)
Descriptive baseball statistics for Ichiro and Matsui in the major leagues in 2004 and salary in
2004 and 2008. Note: The player in bold is better in the category in 2004. (bottom)

Statistics  Accuracy  Power  Reaching Bases by non-hits  Running  Q1    Q3
BA          O         X      X                           X        .261  .297
SLG         O         O      X                           X        .406  .506
ISO         X         O      X                           X        .134  .226
pSLG        X         O      X                           X        .162  .267
OBP         O         X      O                           X        .328  .373
OPS         O         O      O                           X        .743  .874
GPA         O         O      O                           X        .253  .292
WOA         O         O      O                           O        .263  .298
Table 5: The comparison of several batting statistics. The newly developed statistic WOA
(in bold) is able to measure all of these categories and its scale is similar to BA.


Since SLG and OBP are highly correlated (r=.75), there
exists “multicollinearity”. Kutner (2004) says in his
book: “The simple interpretation of regression coefficients
is often unwarranted with highly correlated explanatory
variables”. Since BA is a component of both OBP and SLG,
we may want to break OBP and SLG down into BA and some
non-BA parts, which are hopefully not highly correlated,
to avoid the effects of “multicollinearity”.

Let us define reaching bases by non-hitting performance
(nP), which measures a batter’s ability to reach base by
non-hits such as BB, HBP, or SB. It is obtained by
2.8*(B% - CS%) + SB%, where B%=(BB+HBP)/PA, CS%=CS/PA and
SB%=SB/PA. Here the weight 2.8 comes from the linear
weights in the regression equation of RPG on a team’s B%,
CS%, and SB%.
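
As a rough check, nP can be computed directly from counting statistics. The sketch reads the slide's formula as 2.8*(B% - CS%) + SB% (the minus sign is an interpretation of the extracted text, supported by the fact that it reproduces the nP values in the Matsui/Suzuki table of this deck). The plate-appearance totals used below (680 and 762) do not appear in the slides and are assumed for illustration; the function name is also hypothetical.

```python
def non_hitting_performance(bb_hbp, cs, sb, pa):
    """nP = 2.8 * (B% - CS%) + SB%, each rate taken per plate appearance."""
    b_pct = bb_hbp / pa
    cs_pct = cs / pa
    sb_pct = sb / pa
    return 2.8 * (b_pct - cs_pct) + sb_pct


# B, CS, SB are the 2004 values from the comparison table in this deck;
# the PA totals (680 and 762) are NOT in the slides and are assumed here.
print(round(non_hitting_performance(91, 0, 3, 680), 3))    # 0.379 (Matsui)
print(round(non_hitting_performance(53, 11, 36, 762), 3))  # 0.202 (Suzuki)
```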

Since OBP≈BA+B% and SLG=BA+ISO, we may
consider the regression of RPG on BA, ISO, and
B%. Since pSLG gives a similar interpretation to
ISO (r=.94) and nP gives a similar interpretation
to B% (r=.92), pSLG and nP can replace
ISO and B%. Another advantage of using them as
explanatory variables along with BA to explain RPG
is that they have much smaller correlations with
BA (r=.02 for pSLG and r=.10 for nP).

The formula for WOA is given by WOA = (8*BA +
2*pSLG + nP)/10.5. Then the new regression is
given by RPG = -7.12 + 44.8*WOA with R^2=90.9%.
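
The WOA formula and the fitted regression line can be sketched as two small functions. The inputs below are the 2004 BA, pSLG, and nP values from the Matsui/Suzuki table in this deck; the function names are illustrative. Note that the regression was fitted on team statistics, so applying `fitted_rpg` to a single player's WOA would be an extrapolation.

```python
def woa(ba, pslg, np_):
    """WOA = (8*BA + 2*pSLG + nP) / 10.5, as stated on the slide."""
    return (8 * ba + 2 * pslg + np_) / 10.5


def fitted_rpg(woa_value):
    """Fitted runs per game from the slide's regression on team WOA."""
    return -7.12 + 44.8 * woa_value


# 2004 inputs (BA, pSLG, nP) from the Matsui/Suzuki table in this deck.
print(round(woa(0.298, 0.251, 0.379), 3))  # 0.311 (Matsui)
print(round(woa(0.372, 0.074, 0.202), 3))  # 0.317 (Suzuki)
```

Both results match the WOA column of that table, so the weights 8, 2, and 1 with divisor 10.5 are being read correctly.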

RPG vs.  pSLG   nP     ISO    BA     OBP    SLG    OPS    GPA    WOA
TOT      0.491  0.503  0.728  0.786  0.882  0.895  0.946  0.951  0.953
AL       0.519  0.626  0.729  0.766  0.898  0.886  0.947  0.955  0.957
NL       0.545  0.522  0.763  0.767  0.878  0.903  0.952  0.956  0.956
2000     0.398  0.452  0.684  0.797  0.942  0.870  0.929  0.946  0.952
2001     0.552  0.564  0.762  0.843  0.926  0.879  0.934  0.947  0.955
2002     0.454  0.438  0.736  0.830  0.836  0.915  0.935  0.929  0.936
2003     0.661  0.586  0.853  0.889  0.916  0.952  0.974  0.973  0.977
2004     0.582  0.501  0.781  0.803  0.875  0.928  0.970  0.970  0.970
2005     0.539  0.466  0.672  0.705  0.783  0.789  0.879  0.894  0.895
2006     0.359  0.337  0.634  0.671  0.799  0.854  0.934  0.933  0.934
2007     0.277  0.438  0.584  0.764  0.875  0.886  0.951  0.958  0.959
2008     0.479  0.505  0.698  0.677  0.837  0.905  0.945  0.943  0.949
The correlation comparison between RPG and pSLG, nP, ISO, BA, OBP, SLG, OPS,
GPA, and WOA by year and by league. Bold represents the statistic with the
highest correlation among them (left)
The comparative box-plots of RPG and FRPG (fitted RPG on WOA) by year (right)

Rk  Player     WOA    GPA    Rk  OPS    Rk  BA     Rk  HR  Rk  RBI  Rk
1   Pujols     0.370  0.371  1   1.114  1   0.357  2   37  4   116  9
2   Jones      0.360  0.355  2   1.044  2   0.364  1   22  59  75   78
3   Ramirez    0.345  0.344  3   1.031  3   0.332  3   37  4   121  6
4   Bradley    0.338  0.337  4   0.999  4   0.321  6   22  59  77   72
5   Berkman    0.333  0.331  5   0.986  5   0.312  11  29  29  106  17
6   Holliday   0.326  0.319  9   0.947  11  0.321  6   25  41  88   50
7   Teixeira   0.326  0.323  6   0.962  9   0.308  14  33  15  121  6
8   Rodriguez  0.324  0.320  8   0.965  7   0.302  27  35  11  103  21
9   Quentin    0.322  0.320  7   0.965  7   0.288  56  36  9   100  26
10  Youkilis   0.320  0.318  10  0.958  10  0.312  11  29  29  115  10

NAME    TEAM  SALARY (Y2008)  SALARY (Y2004)  R    H    BA     HR  RBI  OBP    SLG    OPS
MATSUI  NYY   $13,000,000     $7,000,000      109  174  0.298  31  103  0.390  0.522  0.912
SUZUKI  SEA   $17,102,149     $6,528,000      101  262  0.372  8   60   0.414  0.455  0.869

NAME    TEAM  SALARY (Y2008)  SALARY (Y2004)  SB  CS  B   ISO    pSLG   nP     GPA    WOA
MATSUI  NYY   $13,000,000     $7,000,000      3   0   91  0.224  0.251  0.379  0.306  0.311
SUZUKI  SEA   $17,102,149     $6,528,000      36  11  53  0.082  0.074  0.202  0.300  0.317
Yes! Suzuki is more valuable according to the new statistic WOA.


The positive-definiteness requirement for the
covariance matrix may impose complicated
nonlinear constraints on the parameters.

Kim (Journal of Multivariate Analysis, 2012)
generalized the unconstrained model for
covariance structure, which removes this obstacle,
to multivariate longitudinal data by using the
Modified Cholesky Block Decomposition and
then modeling its parameters parsimoniously.
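
To illustrate why a Cholesky-type factorization removes the positive-definiteness constraint, here is a minimal sketch of the ordinary (scalar) modified Cholesky factorization, Sigma = L D L^T with L unit lower triangular and D diagonal. It is not the block version used in Kim (2012), and the 3x3 matrix is a made-up example. The point is that any real values below the diagonal of L together with any positive entries of D give back a valid covariance matrix, so those parameters can be modeled without constraints.

```python
def modified_cholesky(sigma):
    """Return (L, d) with sigma = L * diag(d) * L^T and L unit lower triangular."""
    n = len(sigma)
    L = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    d = [0.0] * n
    for j in range(n):
        d[j] = sigma[j][j] - sum(L[j][k] ** 2 * d[k] for k in range(j))
        for i in range(j + 1, n):
            s = sum(L[i][k] * L[j][k] * d[k] for k in range(j))
            L[i][j] = (sigma[i][j] - s) / d[j]
    return L, d


# Hypothetical 3x3 covariance matrix, for illustration only.
sigma = [[4.0, 2.0, 2.0],
         [2.0, 5.0, 3.0],
         [2.0, 3.0, 6.0]]
L, d = modified_cholesky(sigma)
print(L)  # below-diagonal entries are unconstrained real numbers
print(d)  # all positive exactly when sigma is positive definite
```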

Repeated measurements of one attribute
observed over time on each of many “subjects”
(animals, plants, hospitals, etc.)

Within-subject attributes are often correlated.
Often the subjects can be classified into groups.

Typically want to study how the attribute
changes over time and whether the nature of
this change is the same across groups.

Two or more attributes are measured on each subject over
time.

The methods of univariate longitudinal analysis may suffice
– if the focus of the investigation is on the longitudinal
behavior of each attribute in isolation from the others.

The methods of multivariate longitudinal analysis are
needed – if the covariation between the set of measurements
on one attribute and the sets of measurements on other
attributes is of scientific interest.
(e.g. intra-ocular pressure in the left and right eyes)

Example: Baseball study (Data: ESPN MLB)
The attribute data are measurements of salary and WOA
(Weighted Offensive Average) for 53 randomly selected
MLB batters, indexed by their years of experience in the
major leagues. Measurements were observed at the end of each
regular season, for their first 6 years in MLB. (Thus,
Subject (I) = 53, Time (T) = 6, Attribute (J) = 2 [Salary, WOA])
Maximum likelihood estimates of unstructured variances,
correlations, and cross-correlations are displayed, revealing
some interesting features.
We assume the mean vector to be saturated. (Possibly adopt
more parsimonious Mean Models.)

[Figure: two panels by year (1 to 6): Salary (axis $0 to $14,000,000) and WOA (axis 0.160 to 0.380)]

Descriptive Statistics: Salary, WOA

Salary
Yr   Mean         Minimum    Median       Maximum
1    $289,224     $68,000    $170,000     $5,666,667
2    $417,052     $100,000   $242,500     $3,696,000
3    $604,598     $100,000   $372,500     $4,666,667
4    $2,199,802   $225,000   $2,012,500   $7,000,000
5    $3,954,346   $375,000   $3,458,334   $12,500,000
6    $5,847,747   $500,000   $5,000,000   $14,000,000

WOA
Yr   Mean    Minimum   Median   Maximum
1    0.260   0.174     0.262    0.354
2    0.273   0.219     0.273    0.320
3    0.290   0.213     0.296    0.363
4    0.295   0.233     0.295    0.349
5    0.296   0.245     0.290    0.372
6    0.295   0.230     0.290    0.376
Yr   Variance of WOA (×1/1000)   Variance of Salary (×10^10)
1    1.1                         1
2    0.8                         17
3    1.2                         29
4    0.9                         228
5    0.9                         518
6    1.1                         833
Variances of batters' WOA and Salary
• The variance of Salary clearly increases over time, as is
typical in growth studies, whereas the variance of WOA
remains roughly constant.
Correlations among WOA
Corr   w1     w2     w3     w4     w5     w6
w1     1
w2     0.21   1
w3     0.02   0.55   1
w4     0.17   0.57   0.57   1
w5     0.13   0.50   0.44   0.70   1
w6     0.13   0.58   0.48   0.59   0.72   1
• Clear evidence of serial correlation among WOA measurements.
• The temporal decay in correlation appears to stop beyond about lag four or five.
• Same-lag correlations increase over time, with some somewhat larger than others.

Correlations among Salary
Corr   s1     s2     s3     s4     s5     s6
s1     1
s2     0.57   1
s3     0.53   0.93   1
s4     0.51   0.48   0.51   1
s5     0.36   0.43   0.45   0.82   1
s6     0.35   0.43   0.42   0.76   0.92   1
• Clear evidence of serial correlation among Salary measurements.
• The temporal decay in correlation appears not to stop, even beyond about lag four or five.
• Same-lag correlations increase over time, with some somewhat larger than others (except for yr2 and yr3).
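
The same-lag correlations discussed above are the sub-diagonals of each correlation matrix. A minimal sketch, using the WOA values reported on the slide (variable names are illustrative):

```python
import numpy as np

# Lower triangle of the WOA correlation matrix, as reported on the slide.
lower = [
    [1.00],
    [0.21, 1.00],
    [0.02, 0.55, 1.00],
    [0.17, 0.57, 0.57, 1.00],
    [0.13, 0.50, 0.44, 0.70, 1.00],
    [0.13, 0.58, 0.48, 0.59, 0.72, 1.00],
]
n = len(lower)
R = np.zeros((n, n))
for i, row in enumerate(lower):
    R[i, : len(row)] = row
# Mirror the strict lower triangle to get the full symmetric matrix.
R = R + np.tril(R, -1).T

# The k-th sub-diagonal holds all correlations between years k apart.
same_lag = {lag: np.diag(R, k=-lag) for lag in range(1, n)}
for lag, values in same_lag.items():
    print(f"lag {lag}: {values}")
```

Reading each printed row left to right shows how correlations at a fixed lag change with experience; reading down the rows shows how they change with lag.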
Corr   s1     s2     s3     s4     s5     s6
w1     0.89   0.58   0.57   0.42   0.42   0.27
w2     0.55   0.86   0.80   0.64   0.58   0.44
w3     0.53   0.78   0.90   0.76   0.69   0.48
w4     0.50   0.70   0.85   0.85   0.81   0.61
w5     0.43   0.62   0.77   0.82   0.83   0.66
w6     0.47   0.59   0.72   0.79   0.81   0.80
Cross-correlations between batters' WOA and Salary
• The contemporaneous cross-correlations between WOA and Salary increase
over time.
• The non-contemporaneous cross-correlations vary smoothly but not
monotonically.
• The cross-correlations are asymmetric:
Corr(d_i, h_j) > Corr(d_j, h_i) for i < j.
Note: To explain a batter's salary, we may add the player's popularity, how
consistently he plays (e.g., number of PAs), the stadium effect, his team's
popularity, and indicators for salary arbitration and free agency, in addition to
adopting multivariate antedependence models for the covariance structure. We
can also consider more parsimonious models for the mean structure in place of
the saturated model.

Bracketology is the process of predicting the field of the NCAA Basketball
Tournament.

It involves predicting how the Selection Committee will use the Rating
Percentage Index (RPI) to choose the 37 at-large teams that complete the
field, and it seeds the field by ranking all teams from 1st through 68th.
The newly designed BPI adds more meaningful variables for predicting the
Tournament teams and their seeds.

We developed a method for evaluating the brackets implied by RPI, BPI, and
Sagarin's ratings. By our method, RPI is the best, meaning that RPI's
bracket is the closest to the actual bracket chosen by the NCAA committee.
This may indicate that the committee depends too heavily on RPI because of
its simplicity and popularity.

In future work, we would like to develop an updated version of RPI that is
both more meaningful and more precise in identifying the best 68 teams for
the NCAA tournament.

There is much debate over the BCS rating, which ultimately determines
which two teams play in the championship game.

We reviewed the current issues, including some arguments for adopting a
“tournament” play-off.

We focus on the performance of the BCS from 2005 to 2010 and suggest
that new weights of 50%, 45%, and 5% for the Harris poll, the USA Today poll,
and the Computer rating would perform better than 1/3, 1/3, and 1/3.
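
To make the effect of the weights concrete, the sketch below computes a weighted BCS-style score under both weightings. The team names and component values are made up for illustration, and the function name `bcs_score` is hypothetical. Each component is assumed to be expressed as a share between 0 and 1.

```python
def bcs_score(harris, usa_today, computer, weights=(1 / 3, 1 / 3, 1 / 3)):
    """Weighted average of the three components (each assumed in [0, 1])."""
    w_harris, w_usa, w_comp = weights
    return w_harris * harris + w_usa * usa_today + w_comp * computer

# Hypothetical component shares: (Harris poll, USA Today poll, Computer rating).
teams = {
    "Team A": (0.98, 0.97, 0.90),
    "Team B": (0.95, 0.96, 0.99),
}

current = {name: bcs_score(*parts) for name, parts in teams.items()}
revised = {
    name: bcs_score(*parts, weights=(0.50, 0.45, 0.05))
    for name, parts in teams.items()
}

for name in teams:
    print(f"{name}: current={current[name]:.4f}, revised={revised[name]:.4f}")
```

With these made-up inputs, Team B ranks first under equal weights and Team A ranks first under the revised weights, because the revised weights nearly discount the computer rating.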

However, we do not see any statistically significant difference between the
revised and the current BCS ratings, so we do not urge the BCS to adjust its
current weights of 1/3, 1/3, and 1/3.

In future work, we would like to derive the probability of capturing
the true champion under the current system and under a potential 8-team
play-off system. By comparing those two probabilities, we may be able to
decide whether to recommend that the BCS adopt the play-off.