Nearly half of companies recently surveyed said that automating content creation would save their content marketing teams the most time. We’ve already covered Natural Language Generation (NLG) algorithms and how they have made some forms of automated content generation a reality, such as sports recaps or financial data reporting. Let’s take a deeper look at how NLG can help your agency rapidly deploy new content and provide a more personalized content experience for users.
Can it Help Your Agency?
The two main benefits of NLG algorithms are speed and coverage: they can create new content quickly (especially for an urgent item, like an earthquake), and they allow more topics to be covered without additional labor. They also free humans to spend their time on more nuanced topics while NLG bots handle narratives based on data analysis.
Push Button Data Narratives
One of the first areas within the federal sphere where NLG algorithms could be leveraged is in developing narratives based on analytics.usa.gov trends. The quality of the data generated by that site makes it a perfect place to begin experimenting. In addition to the wonderful raw data displayed on the site, how about an NLG-written blog providing a narrative about the trends over the past week? I had an algorithm automatically analyze the usage data of the main site I support (oscar.uscourts.gov) to produce a quick report for the month of June. I could see something similar being done for analytics.usa.gov. Below is an example from my report:
Direct traffic dropped off to 6,958 sessions from 7,687 sessions, accounting for 41% of your site’s traffic overall. A week earlier, direct traffic made up 43% of all sessions to your site. Organic search was responsible for 52% of your site’s traffic last week with 8,771 sessions, which was in line with the 8,820 sessions from one week before.
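Sentences like the ones in that excerpt can be approximated with simple templates filled in from the numbers. The sketch below is only an illustration of that idea, not how the tool behind my report actually works; the function names, the 2% "in line with" tolerance, and the wording rules are all assumptions made for the example.

```python
def describe_change(current: int, previous: int, tolerance: float = 0.02) -> str:
    """Pick a phrase for a week-over-week change in sessions.

    The tolerance (2% by default) is an assumed cutoff below which
    the change is described as flat rather than as a rise or a drop.
    """
    if previous == 0:
        return f"rose to {current:,} sessions from none"
    change = (current - previous) / previous
    if abs(change) <= tolerance:
        return f"held steady at {current:,} sessions, in line with {previous:,} sessions"
    verb = "climbed" if change > 0 else "dropped off"
    return f"{verb} to {current:,} sessions from {previous:,} sessions"


def channel_sentence(channel: str, current: int, previous: int, share: float) -> str:
    """Fill a fixed sentence template with one traffic channel's numbers."""
    return (
        f"{channel} traffic {describe_change(current, previous)}, "
        f"accounting for {share:.0%} of your site's traffic overall."
    )


# Figures taken from the report excerpt above.
print(channel_sentence("Direct", 6958, 7687, 0.41))
print(channel_sentence("Organic search", 8771, 8820, 0.52))
```

Commercial NLG products layer far more variation and analysis on top of this, but the core move is the same: structured numbers in, a readable sentence out.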
Also, based on the amount of traffic that analytics.usa.gov reports for popular federal weather sites, perhaps data from the National Weather Service and other branches of NOAA could be included in the NLG’s sources to allow for more interpretation of traffic. For example, how about correlating hurricane or tropical storm activity in the Gulf with a spike in traffic to various NWS sites?
Using NLG for the creation of nearly effortless internal data analysis reports for agencies also seems like a solid starting point for this technology. Instead of sending your manager charts, graphs, and tables of page view data, how about a narrative? And instead of you needing to take hours to analyze the data yourself (and perhaps missing a trend or a connection) all you have to do is push a button.
Along with using NLG to produce standard data analysis reports, you can also use the initial auto-generated product as your starting point and then flesh it out accordingly. For example, you could have an NLG tool run a quick analysis of your agency Twitter feed (I ran a quick one for my personal feed), and then use that as the basis for a fuller writeup that you later share as part of your social media report.
Highly Personalized Content Creation
Another area where NLG seems to be in the early stages of its potential is the generation of highly personalized content. Despite living in an age where advertising especially has become downright creepy in its personalization levels, the actual content delivered is still not uniquely developed for you. Browsing patterns and history, combined with likes or comments within social media, allow pre-existing content to be served based on preferences. NLG will allow for the generation of content that is made specifically for you based on the various data available. One area where this has the potential to save lives is healthcare.
Through the Open Health Natural Language Processing (OHNLP) Consortium and at least two grants in 2009, NIH has been supporting efforts to use NLG to provide the best health information possible to patients in the most accessible format.
One of the provided grants was for cTAKES, which started at the Mayo Clinic and has since grown and become an Apache incubator project.
cTAKES leverages Apache’s UIMA (Unstructured Information Management Architecture) to scan large volumes of electronic medical records to discover patterns, trends, causal relationships, and temporal events. This data can then be provided to the physician in an understandable, narrative format. As shown in the example below, cTAKES gathers an array of examination information and in the end provides plans and recommendations in simple sentences.
Other projects such as Migraine, PIGLIT, OPADE, and HealthDoc are in various stages of trials to provide direct feedback to a patient, using NLG to draw from various sources such as user feedback, medical records, or existing sources such as drug databases. All four projects show promise in generating personalized content for migraine sufferers or diabetes patients, such as the PIGLIT example shown below.
One other area where NLG seems to be making a big difference (and which actually was an early adopter of the technology) is weather forecasting. As far back as at least 2001, NWS was leveraging natural language and speech generation for accessible weather forecasts, especially in an emergency. Similar efforts have been made in generating text forecasts to quickly provide critical weather information to oil rig operators, for instance. A specific effort for a UK system actually found that NLG text was easier to understand than human-generated text, due to the analysis done to develop word choice and a need for clarity and consistency.
I can also see how NLG could help create better content by programming in the optimal headline, sentence, and paragraph lengths and removing common cliches. Sorry, that last sentence made me feel the cold hand of a robot on my shoulder, motioning for me to get out of its way (shivers).
Importance of Structured Content
A common theme that runs through the use of NLG is its reliance on patterns and structured data. cTAKES succeeds only because it leverages UIMA, which provides structure to varied electronic medical records. The usage of an established vocabulary is critical as machines are being asked to “understand” health and medical information. One effort to foster this understanding is the work of the U.S. National Library of Medicine (part of NIH) on the Unified Medical Language System (UMLS). This system is a complex attempt to link vocabularies, categories, and lexical programs to allow for the interoperability that we strive for within our open and structured content efforts throughout this community.
NWS also provides similar resources for weather forecasting with its National Digital Forecast Database XML Web service. Providing data in a structured format allows users to leverage that data as needed.
When you need to pull data from a wide variety of sources in order to generate natural language narratives via computer, it is extremely helpful for datasets to be predictable. That means entering geographical data in the same field, being consistent in the terms used for states (spelled out or abbreviated, and if abbreviated, how?), spelling out pharmaceuticals, and entering weeks, days, or months in a consistent format and in the correct fields. Sound familiar? It’s essentially all metadata. And NLG depends heavily upon this structure of data (and the structure built around its word and grammar templates) to generate the most readable content possible. Without this attention to detail and structure in the beginning, the push button convenience starts to fall apart. Again, this is why open and structured content is so important: it allows for as yet undiscovered uses of your data to be possible if the structure is there.
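To make the consistency point concrete, here is a minimal sketch of the kind of cleanup a dataset might need before an NLG tool can rely on it. Everything in it is an assumption for illustration: the "state" and "date" field names, the three-state lookup table, and the list of accepted date layouts are hypothetical, not drawn from any particular agency dataset.

```python
from datetime import datetime

# Illustrative subset only; a real table would cover every state and territory.
STATE_ABBREVIATIONS = {"maryland": "MD", "virginia": "VA", "texas": "TX"}

# Date layouts this sketch is willing to accept (an assumption, not a standard).
DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y", "%B %d, %Y")


def normalize_state(value: str) -> str:
    """Return a two-letter postal abbreviation, or raise if unrecognized."""
    cleaned = value.strip().rstrip(".")
    if len(cleaned) == 2 and cleaned.upper() in STATE_ABBREVIATIONS.values():
        return cleaned.upper()
    try:
        return STATE_ABBREVIATIONS[cleaned.lower()]
    except KeyError:
        raise ValueError(f"Unrecognized state: {value!r}") from None


def normalize_date(value: str) -> str:
    """Return an ISO 8601 date (YYYY-MM-DD) from any accepted layout."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue  # try the next layout
    raise ValueError(f"Unrecognized date: {value!r}")


def normalize_record(record: dict) -> dict:
    """Normalize the hypothetical 'state' and 'date' fields of one record."""
    return {
        **record,
        "state": normalize_state(record["state"]),
        "date": normalize_date(record["date"]),
    }


# Two differently formatted inputs end up as the same structured record.
print(normalize_record({"state": "Maryland", "date": "June 5, 2015"}))
print(normalize_record({"state": "md", "date": "06/05/2015"}))
```

Raising an error on unrecognized values, rather than guessing, is a deliberate choice in this sketch: a narrative built on a silently misread field is worse than no narrative at all.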
Despite predictions that over 90% of all news will be computer-generated within 15 years, we should embrace the opportunities that NLG can provide us and remember that it is a tool or an aid, not another machine come to take away our jobs. As we all are faced with doing more with less, being able to auto-generate easy-to-read reports based upon large amounts of data can be one less thing that we have to expend resources on. To be honest, despite regularly reviewing the lines and lines of analytics data my site generates, I found the NLG-generated report for my site to be revelatory. More important, it was easy to read for anyone with an interest in the site’s data trends and performance. My effort to create a five-page report narrative with charts? I had to copy and paste a URL and log in to my analytics account (phew, I need a nap).
Even if the machine provides only a head start, that is still a time savings that allows us to devote our time elsewhere. The main focus should be on delivering the best product possible to our audience, and NLG can help us provide content faster, possibly cover more diverse or niche subject matter, and even improve our audience’s health and lives.
You’ve just finished reading the latest article from our Monday column, The Content Corner. This column focuses on helping solve the main content issues facing federal digital professionals, including producing enough content and making that content engaging.