Robot Analysts and Robot Reporters
The amount of data being generated today is overwhelming. This is a problem because, on its own, raw data tends to be complicated, confusing, and boring. Usually, it has to be converted into readable narratives or reports before it will gain an audience. This dynamic produces a ground-level problem of endless clouds of data and not enough report writers translating it all into consumable information. Two obvious solutions are to slow down the data collection or to hire armies of data analysts. Actually, though, there’s a third way: Teach computers to write the reports. Computers are, after all, as ubiquitous as data clouds.
For many, the headlines seem preposterous. In March, Poynter.org reported, “AP [Associated Press] will use software to write NCAA game stories.” In mid-July, The Wall Street Journal published an article titled “Can You Tell the Difference Between a Robot and a Stock Analyst? Wall Street tries out research reports written by artificial intelligence.”
It all sounds very improbable, but when you think about it, the reporter’s who, what, when, and where often have answers that are just names and numbers. The reporter’s why or how could be a little more difficult for computers, but not the other questions. Consider the average coverage for sporting events in a newspaper. It’s often little more than who scored, when they scored, and what the final stats were.
AP’s expansion of automatically generated sports stories began this spring with coverage of Division I collegiate baseball. Barry Bedlan, AP’s deputy director of sports products, said, “This will mean thousands of more stories on the AP wire, which will remain unmatched in the industry. Every college sports town will have some level of coverage.”
The software used by the AP is from Automated Insights, and the news-gathering organization has been using it to produce earnings reports as well. A press release from the software company explains:
“Automated Insights, (Ai), the world leader in producing personalized narrative content from Big Data, announced today that its Wordsmith platform is automatically producing 3,000 stories per quarter for The Associated Press—a tenfold increase over what AP reporters and editors created previously. In addition, the stories contain far fewer errors than their manual counterparts.”
Ross Miller of The Verge explains that one of those auto-generated stories covered Apple’s record-breaking quarterly earnings, reported in the last week of January. The AP coverage was published on CNBC, Yahoo!, and other outlets only minutes after the release. At the end of the article, the software signed: “This story was generated by Automated Insights.”
Automated Insights’s public relations manager, James Kotecki, told The Verge that its Natural Language Generation (NLG) platform, Wordsmith, “generates millions of articles per week” for numerous companies and outlets, and “the company’s system can produce 2,000 articles per second if need be.”
AP’s output has increased tenfold with Wordsmith. Previously, it produced earnings reports for 300 companies each quarter; now it automates 3,000 reports in the same period. The increase has also freed up the human reporters to write more insightful pieces about the companies.
HOW DOES IT WORK?
Automated Insights explains how Wordsmith produces narratives from financial data by using a four-step process.
- Retrieve data. Wordsmith gathers data from application programming interfaces (APIs), database connections, spreadsheets, public repositories, and third-party data providers.
- Analyze data and identify insights. At this point, Wordsmith begins to act like a human data analyst, using its machine learning and artificial intelligence to look for trends, patterns, and correlations. Its toolset includes a full range of descriptive statistics, significance testing, regression analysis, time-series models, predictive analytics, and more.
- Write the narrative. Using its configured style and lexicon, Wordsmith writes anything from long-form narratives to simple tweets in a tone appropriate for its designated readers.
- Publish the narrative. Wordsmith can publish in real time in virtually any format and for any screen. It even creates its own visualizations.
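To make the four steps concrete, here is a minimal sketch in Python of a template-based earnings blurb generator. It is not Wordsmith's code or API. The company names, figures, and function names (retrieve_data, analyze, write_narrative, publish) are all hypothetical, and the "analysis" is a single year-over-year comparison rather than the statistical toolset described above.

```python
from dataclasses import dataclass


@dataclass
class QuarterlyResult:
    """One quarter of revenue data for a made-up company."""
    company: str
    quarter: str
    revenue_musd: float        # revenue, in millions of US dollars
    prior_revenue_musd: float  # same quarter one year earlier


def retrieve_data() -> list[QuarterlyResult]:
    # Step 1 (retrieve): a stand-in for an API call, database query,
    # or spreadsheet read. The values are invented for illustration.
    return [
        QuarterlyResult("Acme Corp", "Q1", 120.0, 100.0),
        QuarterlyResult("Globex Inc", "Q1", 90.0, 100.0),
    ]


def analyze(result: QuarterlyResult) -> dict:
    # Step 2 (analyze): derive one simple insight, the year-over-year
    # percentage change, and label its direction.
    change_pct = (
        (result.revenue_musd - result.prior_revenue_musd)
        / result.prior_revenue_musd * 100
    )
    if change_pct > 0:
        direction = "rose"
    elif change_pct < 0:
        direction = "fell"
    else:
        direction = "flat"
    return {"change_pct": change_pct, "direction": direction}


def write_narrative(result: QuarterlyResult, insight: dict) -> str:
    # Step 3 (write): fill a sentence template from the data and insight.
    lead = (
        f"{result.company} reported {result.quarter} revenue of "
        f"${result.revenue_musd:.1f} million"
    )
    if insight["direction"] == "flat":
        return f"{lead}, unchanged from the same quarter a year earlier."
    return (
        f"{lead}, which {insight['direction']} "
        f"{abs(insight['change_pct']):.1f}% from the same quarter a year earlier."
    )


def publish(stories: list[str]) -> str:
    # Step 4 (publish): here, just join the stories into one text feed.
    return "\n".join(stories)


if __name__ == "__main__":
    results = retrieve_data()
    print(publish([write_narrative(r, analyze(r)) for r in results]))
```

Even this toy version shows why the who, what, when, and where are the easy part: each is a name or a number dropped into a template. Explaining why revenue moved would take far more than a percentage calculation.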
Wordsmith can even translate the raw data collected from your fitness apps on your smartwatch or wristband into short, readable reports on your progress.
As for the future of NLG, if what’s past is prologue, it looks like the sky’s the limit. It was estimated that Wordsmith produced more than one billion personalized reports for Yahoo!, The Associated Press, the NFL, and Edmunds.com in 2014.
Can you tell the difference between human and computer writing? Try The New York Times quick quiz here: http://nyti.ms/1PbuHe2.