View a timeline of Watson's milestones

Transcription

November 2010
First live test of the Watson Avatar.

February 2010
First system featuring an expanded knowledge corpus. The knowledge corpus is the external information Watson explores to find answers to questions. The unstructured text corpus grew from 24 GB to 58 GB (8 million pages of text to 18 million, or 37,000 books to 90,000 books).
SPEED: DeepQA averages less than 3 seconds/clue, the final speed for the competition.
November 2010
Sparring Round 2 against former Jeopardy! Tournament of Champions semifinalists and finalists concludes. (Watson's record: 39-8-8, 71% wins) Games played in Yorktown Heights. Round 2 sparring establishes that Watson can more than compete with champion-level Jeopardy! players.
October 2009
DeepQA v0.6

[Chart: Precision vs. % Answered for the baseline and DeepQA versions v0.1 (12/07) through v0.7 (04/10)]
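The chart above pairs answer precision with the percentage of clues the system chooses to attempt: lowering the confidence threshold answers more clues, usually at lower precision. As a minimal illustrative sketch (not IBM's code), such a curve can be computed from per-clue confidences and correctness flags; the function name and toy data below are hypothetical.

```python
# Minimal sketch (not IBM's code) of computing a precision vs. % answered
# curve. Each clue is a (confidence, is_correct) pair; both the function
# name and the toy data are hypothetical.

def precision_vs_answered(results):
    """Return (percent_answered, precision) points for a sweeping
    confidence threshold, answering the most-confident clues first."""
    ranked = sorted(results, key=lambda r: r[0], reverse=True)
    total = len(ranked)
    points, correct = [], 0
    for k, (_, is_correct) in enumerate(ranked, start=1):
        correct += int(is_correct)
        points.append((100.0 * k / total, 100.0 * correct / k))
    return points

if __name__ == "__main__":
    demo = [(0.95, True), (0.90, True), (0.70, False), (0.60, True), (0.20, False)]
    for answered, precision in precision_vs_answered(demo):
        print(f"{answered:5.1f}% answered -> {precision:5.1f}% precision")
```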
February 2011
Watson defeats Ken Jennings and Brad Rutter, two of the greatest Jeopardy! players, in a televised exhibition match.
January 2010
First sparring games held in newly
constructed sparring game studio at
T.J. Watson Research Center, Yorktown
Heights, NY. Also, first time games
played using game control system.
September 2009
First development game: This is the first
time the live system plays against humans.
Previously, Watson timing was simulated.
This reflects the success of scale-out
to achieve 3 second response times.
Note: All development games were against IBM
volunteer players from within IBM Research.
August 2009
SPEED: DeepQA averages target speed of 3 seconds/clue.

August 2009
Scale-out team begins the process of incorporating new algorithms and expanding Watson's knowledge corpus while maintaining the 3-second response time.
April 2009
SPEED: DeepQA averages 8 seconds/clue.

May 2009
SPEED: DeepQA averages 4.2 seconds/clue.
December 2008
IBM and Jeopardy! establish the intent to have the machine play live on the show.
[Chart: frequency distribution of DeepQA response times, in seconds]
June 2009
SPEED: DeepQA averages 3.6 seconds/clue.
December 2008
DeepQA begins using live search. Scale-out team begins work to achieve the goal of a 3-second response time.

May 2009
DeepQA v0.5
March 2007
Internal commitment to
pursue the Jeopardy!
challenge.
Spring 2008
First Jeopardy! game simulator built by Ferrucci, allowing humans to compete against the system.
August 2008
DeepQA v0.3

December 2007
DeepQA v0.1 achieves meaningful performance leap above baseline.
March 2007
Feasibility study results and baselines established. IBM Research projected that in 3 years the team could build an impressive system capable of competing favorably with good Jeopardy! players, but one unlikely to be good enough to win consistently against the best players; that might take closer to 5 years. An unbeatable system was unlikely in a three-to-five-year time frame.
May 2008
DeepQA v0.2 makes first big leap in performance.
October 2008
Ferrucci creates a "Confidence Thermometer" to provide
insight at the Board of Directors demo. This confidence
bar eventually leads to Watson's answer panel.
February 2008
First ever "weekly run" is conducted.
This evolves into an extensive run
procedure where each component is
tagged for the weekly run and the entire
version of the system is saved as a
PSF file so that the exact version can
be retrieved from SVN at any time.
November 2007
PERFORMANCE: PIQUANT system was tweaked to answer Jeopardy! questions and establish a baseline. (The team still had no idea if it would ever make the system smart enough or fast enough to play top human players.)
October 2008
Scale-out team decides to build a separate
"production" system environment. More
hardware gives the team better ability to run
and manage experiments on many machines
at once with DeepQA Architecture.
Early 2008
DeepQA architecture rolls out, allowing more independence to create and integrate algorithms. Eventually leads to a tenfold increase in the feature set.

Summer 2008
DeepQA team discovers a "climate change," noticing a 10% drop in system performance on clues written for season 20 (2003/2004) and after. Discussions with Jeopardy! producers confirmed that in that season Jeopardy! writers began to write more creatively (more slang, more plays on words).
March 2008
DeepQA v0.2 functional:
60 features now included (10x v0.1).
December 2007
DeepQA v0.1 functional: This system had only 6 features. Statistical combination and machine learning techniques at this point were primitive and poorly designed.
October 2007
Error Analysis Tool: The first
version of the tool allowing the
team to capture and store the
results of experiments. This aids
the team in understanding what
worked and what did not.
November 2006
First contact with the Jeopardy! team in Los Angeles.

November 2004
Idea conceived by Charles Lickel while observing Ken Jennings' historic Jeopardy! run.

September 2007
Team moves into first small lab space to facilitate communication and collaboration.

January 2007
The decision is made to do a feasibility study led by David Ferrucci.

December 2008
Watson named: Working with its branding partners, IBM came up with possible names for the Jeopardy! machine. These included THINQ, THINQER, Exaqt, Ace, Deep Logic, System/QA, Qwiz, nSight, Mined, and EureQA. "Watson" was chosen on December 12, 2008 to honor IBM founder Thomas J. Watson, who originally established the IBM Research division on the campus of Columbia University in 1945.
Demo to JPI: JPI visited Yorktown and the game simulator was used to demo Watson. The demo was still not good enough to play champions, and the system was still taking 2 hours to answer a question. The demo used Watson's real answers and confidence but with simulated response times.

December 2008
DeepQA v0.4

Winner's Cloud: The model for target performance moves from the winner's arc to the "winner's cloud." The winner's cloud is based on actual human performance, whereas the winner's arc was purely theoretical. The winner's cloud is considered a more accurate model of winning performance.
May 2009
Joint announcement with JPI of intent to compete on Jeopardy! with "Watson." Public web site for Watson and DeepQA unveiled, including first Watson video clip.
March 2009
SPEED: DeepQA achieves its first
fast response times, with an average
of 12 seconds/clue. Previously, it
took well over an hour.
March 2010
Watson competes in first Tournament of Champions, Finals-style competition against, among others, the two greatest IBMer former Jeopardy! players (David Sampugnaro and Kristian Zoerhoff). (Watson's record: 1-0-2)

April 2010
DeepQA v0.7

August 2010
First official sparring game (Round 1) against former Jeopardy! players with two wins or fewer on the show, in Hawthorne, NY.

September 2010
Sparring Round 1 against former Jeopardy! players with two wins or fewer on the show concludes. (Watson's record: 47-15-11, 64% wins) Games played in Hawthorne and Yorktown Heights.
November 2009
Multi-processing: The first version of the system
that allowed the team to run an experiment on
multiple machines at once. This dramatically
reduced experimental cycle time and allowed
the team to run really big experiments.
March 2008
The Watson team moves
to larger lab space on the
2nd floor of IBM Hawthorne.
Productivity increases
significantly.
Summer 2008
OAQA OCR is formed
and first 4 collaborators join
(USC, CMU, UT Austin, U. Mass).
Ferrucci and Eric Nyberg (CMU)
kick off the OAQA workshop
hosted by IBM in Yorktown.
Attendees:
UT - Bruce Porter, Ken Barker
USC - Ed Hovy
U. Mass - James Allan
MIT - Boris Katz
Stanford - Chris Manning
CMU - Eric Nyberg
October 2008
Scale-out commitment made,
engaging Eddie's team.
April 2009
David Ferrucci renames the
project/technology "DeepQA"
instead of "BlueJ!."