From Contextual Search to Automatic Content Generation
From Contextual Search to Automatic Content Generation: Scaling Human Editorial Judgment
Larry Birnbaum
Northwestern University Knight Lab and Narrative Science Inc.

The challenge

Watson: Context-driven search
AI + IR
Our point: Context can drive search
Our starting context: Text
Our techniques: Statistical and heuristic
Our substrate: Search
Our technology: Automatic query formation, management, result filtering, ranking
Our goal: Support content creation
Fast, automatic, better than human performance

Similarity vs. relevance
The most similar possible document to a document that you have in your hands is… another copy of that document
What makes a document relevant is that it is similar in certain respects and dissimilar in certain other respects
Watson relied on noise and the size of the internet to achieve the variation necessary to provide truly relevant documents
Can we do better?

Another example

Relevance revisited
Determining what information people see, and in what form and order, is an editorial judgment
This editorial judgment should be explicit and visible to both designers/engineers and users
This means a deliberate mechanism
Which can (only?) be realized by a semantic mechanism
Specific dimensions of similarity/dissimilarity provide useful information to the user based on his/her context of activity

Beyond Broadcast

What is editorial judgment?
What to look for: Context analysis and query formation
Where to look for it: Source selection
How to assess the results: Result filtering, tagging, and ranking
How to show the results: Presentation

Compare & Contrast

Two comparable stories

Oracle tried to buy open-source MySQL
SAN FRANCISCO--Oracle tried to acquire open-source database maker MySQL, an indication of the profound changes the software giant is willing to make as it adapts to the increasingly significant collaborative programming philosophy.
MySQL Chief Executive Marten Mickos confirmed the acquisition attempt in an interview at the Open Source Business Conference here but wouldn't provide details such as when the approach was made or how much money Oracle offered. …
Oracle didn't immediately comment on the acquisition offer. Though it is increasingly diversified, Oracle's primary business is selling its own proprietary database software. MySQL, in contrast, is a leader among several companies trying to commercialize rival open-source products. …
from CNET News.com

IBM Expands Paid Open Source Strategy
UPDATED: IBM is making a bid on professional open source with the acquisition of privately held Gluecode, officials announced Tuesday. Officials did not discuss financial and operational details of the merger, the first acquisition made by Big Blue of an open source company. Gluecode's operations will be assimilated into IBM's software group and expand the company's WebSphere application integration middleware product line. Officials plan to offer customers and business partners Gluecode's application server software and sell software and support services on top of the offering, as well as let customers upgrade to IBM WebSphere products. …
from internetnews.com

Another example

Compare & Contrast: The numbers
Precision: ~70%
"Recall": ~60%

LocalSavvy

LocalSavvy cont’d

Blog search: Spectrum
query keywords
"epistemic" dimension in which the user is interested

Spectrum results
From technologists and IT entrepreneurs
Focus on the impact on the IT industry

Spectrum results cont’d
From lawyers and law school professors
Focus on the legal issues of the case

Spectrum: The numbers

Tell Me More
Tell Me More: Actors
Tell Me More: Data
Tell Me More: Quotes
Tell Me More: Twitter

And you get back…
With contextualized search the text box is (usually) gone…
But the presentation model for the output is the same: a list of results!
Users don’t want a million documents, or a hundred, or even ten; they want one document that’s written just for them, right now
They want synthetic documents

News at Seven

Narratives from data
Narratives from data: Derived features
Narratives from data: The outline
Narratives from data: The story

Big Ten Softball & Baseball

Narrative Science: Human Insight at Machine Scale
• Technology initially developed at Northwestern University
• Company launched in 2010
• A partnership between AI and editorial
• Started in media, now 70% of our customers are in other industries

February 9, 2012, Hockey Recap: Rio Grande Valley rolls over Laredo, 6-3
The Rio Grande Valley Killer Bees were firing on all cylinders against the Laredo Bucks, and when the final buzzer sounded Killer Bees emerged with a 6-3 win.
Zac Pearson was all over the ice for Rio Grande Valley, as he tallied two goals and one assist in the win. Pearson scored the first of his two goals at 5:23 into the first period to make the score 1-0 Rio Grande Valley. Brandon Campos picked up the assist. Pearson's next tally made the score 2-0 Rio Grande Valley with 12:44 left in the first period. David Marshall assisted on the tally.
The Killer Bees' goal total was higher than their season average. Rio Grande Valley averages two goals per game.
The Killer Bees could not stay out of the penalty box, as the team accrued 17 minutes in penalties during the game. The leading offender was Jason Beeman, who totaled five minutes in penalty time with one major.
With 48 shots on target during the contest, Rio Grande Valley exceeded the 22 shots it averages per game this year.
Rio Grande Valley additionally got points from Aaron Lee, who had one goal and one assist, Marshall, who registered one goal and two assists, and Dan Gendur, who racked up one goal and one assist. Dan Nicholls also scored for Rio Grande Valley. Others to record assists for Rio Grande Valley were AJ Mikkelsen, who had two, and Adam Bartholomay and Marc-Andre Carre, who each chipped in one.
Special teams units factored heavily in the game's outcome, as there were 14 penalties called on the two teams. The busiest period in the sin bins was the first period, which saw 18 minutes of penalty time combined between the two teams.
Laredo was often in penalty trouble, as it ended with six minors and one major for 17 minutes in penalty time. The leading offender was Justin Styffe, who totaled nine minutes in penalty time with two minors and one major.

The punch line...
The Rio Grande Valley Killer Bees were firing on all cylinders against the Laredo Bucks, and when the final buzzer sounded Killer Bees emerged with a 6-3 win.

More than Sports: Business…

Medicine…

Medicine…
This 48-year-old White female subject in Canada with a medical history of diarrhea, ovarian cyst, asthma, depression, hypertension, GI upset, cholecystectomy and hysterectomy received CP-690,550 for the treatment of active rheumatoid arthritis. The subject was treated with CP-690,550 orally, 5 mg twice daily, at a total daily dose of 10 mg, from 07 Apr 2010 (Study Day 1) to 09 Sep 2010, for a total of 156 days.

…and Unstructured Data Too
NEWT GINGRICH GAINS ATTENTION WITH HOT-BUTTON TOPICS TAXES, CHARACTER ISSUES
Newt Gingrich received the largest increase in Tweets about him today. Twitter activity associated with the candidate has shot up since yesterday, with most users tweeting about taxes and character issues. Newt Gingrich has been consistently popular on Twitter, as he has been the top riser on the site for the last four days. Conversely, the number of tweets about Ron Paul has dropped in the past 24 hours. Another traffic loser was Rick Santorum, who has also seen tweets about him fall off a bit. While the overall tone of the Gingrich tweets is positive, public opinion regarding the candidate and character issues is trending negatively.
In particular, @MommaVickers says, "Someone needs to put The Blood Arm's 'Suspicious Character' to a photo montage of Newt Gingrich. #pimp". On the other hand, tweeters with a long reach are on the upside with regard to Newt Gingrich's take on taxes. Tweeting about this issue, @elvisroy000 says, "Newt Gingrich Cut Taxes Balanced Budget, 1n 80s and 90s, Newt experienced Conservative with values". Maine recently held its primary, but it isn't talking about Gingrich. Instead the focus is on Ron Paul and religious issues.

Why Stories?
Over the past eight quarters, AmgenCorp has seen a steady increase in both sales and gross profit. During this period, however, two trends have become apparent. While sales have increased, there has been a steady decrease in margins which should be cause for concern. In addition, while we have seen quarter-to-quarter increases in both sales and gross profit, each of these metrics is decelerating, indicating that the business is experiencing a slow down.

Stories as Human-Centered Data Analytics
• Stories embody high-level patterns or themes that are efficiently communicated and effectively grasped, e.g., comeback, fade, low-hanging fruit, etc.
• Stories pick out, connect, and summarize the critical aspects of a situation
• Stories explain: They convey trends, causes, and even recommendations
• Stories make data meaningful.

How Does It Work?
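The three stages named on the "Narratives from data" slides (derived features, the outline, the story) can be illustrated with a rough sketch. This is only an illustration under stated assumptions, not the system described in the talk: the GameStats schema, the function names, the three-goal "rout" threshold, and the sentence templates are all hypothetical. The input numbers are the ones quoted in the hockey recap.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class GameStats:
    """Raw box-score numbers for one team in one game (hypothetical schema)."""
    team: str
    opponent: str
    goals: int
    opponent_goals: int
    shots: int
    season_avg_goals: float
    season_avg_shots: float


def derive_features(g: GameStats) -> Dict[str, object]:
    """Stage 1: turn raw numbers into facts an editor would care about."""
    return {
        "won": g.goals > g.opponent_goals,
        "margin": g.goals - g.opponent_goals,
        "goals_above_avg": g.goals > g.season_avg_goals,
        "shots_above_avg": g.shots > g.season_avg_shots,
    }


def build_outline(features: Dict[str, object]) -> List[str]:
    """Stage 2: choose an angle, then order the supporting points."""
    # The three-goal threshold for a "rout" is an arbitrary illustrative choice.
    if features["won"] and features["margin"] >= 3:
        angle = "rout"
    elif features["won"]:
        angle = "win"
    else:
        angle = "loss"
    outline = [angle]
    for point in ("goals_above_avg", "shots_above_avg"):
        if features[point]:
            outline.append(point)
    return outline


def render_story(g: GameStats, outline: List[str]) -> str:
    """Stage 3: realize each outline item as one templated sentence."""
    score = f"{g.goals}-{g.opponent_goals}"
    templates = {
        "rout": f"{g.team} rolled over {g.opponent}, {score}.",
        "win": f"{g.team} beat {g.opponent}, {score}.",
        "loss": f"{g.team} fell to {g.opponent}, {score}.",
        "goals_above_avg": (
            f"The goal total was higher than the season average of "
            f"{g.season_avg_goals:g} per game."
        ),
        "shots_above_avg": (
            f"With {g.shots} shots, {g.team} exceeded the "
            f"{g.season_avg_shots:g} it averages per game."
        ),
    }
    return " ".join(templates[item] for item in outline)


if __name__ == "__main__":
    game = GameStats("Rio Grande Valley", "Laredo", 6, 3, 48, 2.0, 22.0)
    print(render_story(game, build_outline(derive_features(game))))
```

Keeping the outline as a separate step means the choice of angle and the ordering of points, which is the editorial judgment, stays explicit and inspectable rather than being buried in the sentence templates.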
Narrative Science: Big Data… Made Personal

Investment Research
[Table: quarterly financial data for tickers S and T, 12/31/2006 through 12/31/2011. Columns include DATE, TICKER, SHORT TERM DEBT, LONG TERM DEBT, CURRENT ASSETS, TOTAL ASSETS, ACCOUNTS PAYABLE, COST OF SALES, CASH, DEBT IN CURRENT LIABILITIES, DEBT IN LONG TERM LIABILITIES, DEPRECIATION EXPENSE, DEPRECIATION AND AMORTIZATION, SALES, CAPEX, NET INCOME, INVENTORIES. Individual values are not recoverable from the transcript.]
While the company appears to be investing in fixed assets, the company's total debt remained level at $20.27 billion to signal the company isn't taking on significantly more debt, but free cash flow fell 80.5% from a year earlier, signaling the company may not be generating sufficient cash flow to cover future capital expenditures.

Sector Reporting
Thursday, October 6, 2011 12:00 PM: HEALTHCARE MIDDAY COMMENTARY:
The Healthcare (XLV) sector underperformed the market in early trading on Thursday. Healthcare stocks trailed the market by 0.4%. So far, the Dow rose 0.2%, the NASDAQ saw growth of 0.8%, and the S&P500 was up 0.4%.
Here are a few Healthcare stocks that bucked the sector's downward trend.
MRK (Merck & Co Inc.) erased early losses and rose 0.6% to $31.26. The company recently announced its chairman is stepping down. MRK stock traded in the range of $31.21 - $31.56. MRK's volume was 86.1% lower than usual with 2.5 million shares trading hands. Today's gains still leave the stock about 11.1% lower than its price three months ago.
LUX (Luxottica Group) struggled in early trading but showed resilience later in the day. Shares rose 3.8% to $26.92. LUX traded in the range of $26.48 - $26.99. Luxottica Group’s early share volume was 34,155. Today's gains still leave the stock 21.8% below its 52-week high of $34.43.
The stock remains about 16.3% lower than its price three months ago.
Shares of UHS (Universal Health Services Inc.) are trading at $32.89, up 81 cents (2.5%) from the previous close of $32.08. UHS traded in the range of $32.06 - $33.01…

Business Reporting

Education

Client Portfolio Data

Quill™ for Google Analytics

Narrative Science: Human Insight at Machine Scale
In 2011: 370,000 Stories
In 2012: 2M+ Stories

Impact: Knight Lab
A joint laboratory of the McCormick School of Engineering and the Medill School of Journalism at Northwestern
Mission: To significantly increase the pace of technology innovation in media and journalism
Focus on technologies that leverage human editorial judgment at all stages of the content pipeline (intelligent information gathering, content creation, distribution, news experiences and user interaction) at scale
Supported by a 4-year, $4.2 million grant from the John S. and James L. Knight Foundation

The Local Angle

Twitter Profiling

Twitter Profiling cont’d

Twitter Profiling cont’d

Twitter Profiling cont’d

Book recommendations from Twitter

Thanks
Research partner: Kris Hammond
Students: Nick Allen, Jay Budzik, Andy Crossen, Lisa Gandy, Francisco Iacobelli, Jiahui Liu, Patrick McNally, Nate Nichols, Shawn O’Banion, John Templon, Earl Wagner
Developers: Scott Bradley, Jenny Wilson
Medill colleagues: Rich Gordon, Owen Youngman, Jeremy Gilbert, Miranda Mulligan, Joe Germuska
Support: Knight Foundation, NSF, McCormick Foundation
And of course the folks at Narrative Science!