View a timeline of Watson's milestones
Transcription
TIMELINE

November 2004: Idea conceived by Charles Lickel while observing Ken Jennings' historic Jeopardy! run.

November 2006: First contact with the Jeopardy! team in Los Angeles.

January 2007: The decision is made to do a feasibility study led by David Ferrucci.

March 2007: Feasibility study results and baselines established. IBM projected that in three years the team could build an impressive system capable of competing favorably with good Jeopardy! players, but one unlikely to win consistently against the best players; that might take closer to five years. An unbeatable system was unlikely in a three-to-five-year time frame.

March 2007: Internal commitment to pursue the Jeopardy! challenge.

September 2007: Team moves into its first small lab space to facilitate communication and collaboration.

September 2007: Team comes up with possible names for the Jeopardy! machine, including THINQ, THINQER, Exaqt, Ace, Deep Logic, System/QA, Qwiz, nSight, Mined, and EureQA.

October 2007: Error Analysis Tool: the first version of the tool that lets the team capture and store the results of experiments, helping them understand what worked and what did not.

November 2007: PERFORMANCE: The PIQUANT system is tweaked to answer Jeopardy! questions and establish the baseline.

December 2007: DeepQA v0.1 functional: the system has only 6 features, and its statistical combination and machine learning techniques are still primitive and poorly designed. Even so, DeepQA v0.1 achieves a meaningful performance leap above the baseline. (The team still had no idea whether it would ever make the system smart enough or fast enough to play top human players.)

Early 2008: DeepQA architecture rolls out, allowing more independence to create and integrate algorithms. It eventually leads to a tenfold increase in the feature set.

February 2008: First ever "weekly run" is conducted. This evolves into an extensive run procedure in which each component is tagged for the weekly run and the entire version of the system is saved as a PSF file so that the exact version can be retrieved from SVN at any time.

March 2008: DeepQA v0.2 functional: 60 features now included (10x v0.1).

March 2008: The Watson team moves to larger lab space on the 2nd floor of IBM Hawthorne. Productivity increases significantly.

Spring 2008: First Jeopardy! game simulator built by Ferrucci, allowing humans to compete against the system.

May 2008: DeepQA v0.2 makes the first big leap in performance.

Summer 2008: DeepQA team discovers a "climate change," noticing a 10% drop in system performance on clues written for season 20 (2003/2004) and after. Discussions with Jeopardy! producers confirm that in that season Jeopardy! writers began to write more creatively (more slang, more plays on words).

Summer 2008: OAQA (Open Advancement of Question Answering) is formed and the first four collaborators join (USC, CMU, UT Austin, U. Mass). Ferrucci and Eric Nyberg (CMU) kick off the OAQA workshop hosted by IBM in Yorktown. Attendees: UT: Bruce Porter, Ken Barker; USC: Ed Hovy; U. Mass: James Allan; MIT: Boris Katz; Stanford: Chris Manning; CMU: Eric Nyberg.

August 2008: DeepQA v0.3.

October 2008: Ferrucci creates a "Confidence Thermometer" to provide insight at the Board of Directors demo. This confidence bar eventually leads to Watson's answer panel.

October 2008: Scale-out commitment made, engaging Eddie's team.

October 2008: Scale-out team decides to build a separate "production" system environment. More hardware gives the team better ability to run and manage experiments on many machines at once with the DeepQA architecture.

December 2008: DeepQA v0.4. Winner's Cloud: the model for target performance moves from the winner's arc to the "winner's cloud." The winner's cloud is based on actual human performance, whereas the winner's arc was purely theoretical, so it is considered a more accurate model of winning performance.

December 2008: Demo to JPI: JPI visits Yorktown and the game simulator is used to demo Watson. The demo is still not good enough to play champions, with Watson still taking two hours to answer a question; it uses Watson's real answers and confidence but simulated response times.

December 2008: IBM and Jeopardy! establish the intent to have the machine play live on the show.

December 2008: Watson named: working with its branding partners, IBM chooses "Watson" on December 12, 2008, to honor IBM founder Thomas J. Watson, who originally established the IBM Research division on the campus of Columbia University in 1945.

March 2009: SPEED: DeepQA achieves its first fast response times, averaging 12 seconds/clue. Previously, it took well over an hour.

March 2009: DeepQA begins using live search. The scale-out team begins work to achieve the goal of a 3-second response time.

April 2009: Joint announcement with JPI of the intent to compete on Jeopardy! with "Watson." Public web site for Watson and DeepQA unveiled, including the first Watson video clip.

April 2009: David Ferrucci renames the project/technology "DeepQA" instead of "BlueJ!"

April 2009: SPEED: DeepQA averages 8 seconds/clue.

May 2009: DeepQA v0.5. SPEED: DeepQA averages 4.2 seconds/clue.

June 2009: SPEED: DeepQA averages 3.6 seconds/clue.

August 2009: SPEED: DeepQA averages the target speed of 3 seconds/clue.

August 2009: Scale-out team begins the process of incorporating new algorithms and expanding Watson's knowledge corpus while maintaining the 3-second response time.

September 2009: First development game: the first time the live system plays against humans. Previously, Watson's timing was simulated; the live games reflect the success of scale-out in achieving 3-second response times. Note: all development games were against IBM volunteer players from within IBM Research.

October 2009: DeepQA v0.6.

November 2009: Multi-processing: the first version of the system that allows the team to run an experiment on multiple machines at once. This dramatically reduces experimental cycle time and lets the team run very large experiments.

January 2010: First sparring games held in the newly constructed sparring game studio at the T.J. Watson Research Center, Yorktown Heights, NY. Also the first time games are played using the game control system.

February 2010: First system featuring an expanded knowledge corpus. The knowledge corpus is the external information Watson explores to find answers to questions. The unstructured text corpus moves from 24 GB to 58 GB (8 million pages of text to 18 million, or 37,000 books to 90,000 books). SPEED: DeepQA averages less than 3 seconds/clue, the final speed for the competition.

March 2010: First official sparring game (Round 1) against former Jeopardy! players with two wins or fewer on the show, in Hawthorne, NY.

April 2010: DeepQA v0.7.

April 2010: Sparring Round 1, against former Jeopardy! players with two wins or fewer on the show, concludes. (Watson's record: 47-15-11, 64% wins.)

August 2010: Watson competes in its first Tournament of Champions, Finals-style competition against, among others, the two greatest IBMer former Jeopardy! players (David Sampugnaro and Kristian Zoerhoff). (Watson's record: 1-0-2.) Games played in Hawthorne and Yorktown Heights.

November 2010: First live test of the Watson Avatar.

November 2010: Sparring Round 2, against former Jeopardy! Tournament of Champions semifinalists and finalists, concludes. (Watson's record: 39-8-8, 71% wins.) Games played in Yorktown Heights. Round 2 sparring establishes that Watson can more than compete with champion-level Jeopardy! players.

February 2011: Watson defeats Ken Jennings and Brad Rutter, two of the greatest Jeopardy! players, in a televised exhibition match.

[The timeline graphic also includes a series of "Precision vs. % Answered" charts tracking DeepQA's progress from the baseline through v0.1 (12/07), v0.2 (05/08), v0.3 (08/08), v0.4 (12/08), v0.5 (05/09), v0.6 (10/09), and v0.7 (04/10), plus a histogram of response times (frequency vs. seconds); only the chart labels survive in this transcription.]