By Dr. Andrew Duchon, Director of Data Science at Manzama, Inc.
Business development and competitive intelligence both require understanding what is happening with client companies, prospective clients, and even entire industries. One attorney (or any B2B salesperson) might have 10 clients and 90 prospective clients. Each of those might have, on average, 200 articles a month appearing in the news about events involving the company, e.g., deals with other companies, executives coming or going, sales or other financial reports, and interactions with regulators. That’s 20,000 articles per month an attorney might need to read to be fully up to speed, and that does not even account for general news and industry trends.
Instead of taking this bottom-up approach to understanding the big picture, Manzama Insights™ uses machine learning to process and organize all that data so that users can quickly get the big picture first, then dive into particular aspects relevant to them. The first question is: what are the colors of that big picture? Manzama Insights classifies news concerning a company into 25 subfactors, which are grouped into six factors as shown in the table below. Six other kinds of company mentions, such as conferences and marketing, are ignored: they may be relevant to understanding a company generally, but they are not relevant to its corporate health.
Empirically, we have found that all relevant news about a company can be classified into one of these 25 subfactors and we have 30 pages of guidelines to help us make these determinations.
Of course, we ourselves are not personally classifying the hundreds of thousands of articles coming into Manzama every day. That’s where machine learning comes in. We have trained a deep neural network that turns the words in a headline about a company into numbers, then processes those numbers to determine which of the 25 relevant subfactors are being discussed.
In addition, the network determines the “valence” of the news for the company: positive, negative, or neutral. In general, negative articles indicate “bad” news about the company or are indicative of the company shrinking, whereas positive articles indicate “good” news about the company or are indicative of growth. In other cases, valence reflects whether a relationship between the company and some other entity is good or bad. Neutral articles are generally of a few different types: the goodness or badness of the news is unclear; the news has no real impact on or indication of corporate health but should not be excluded; the news is both positive and negative, e.g., “this was good, but that was bad”; the company is neutralizing bad news; or the article discusses an event after it has already occurred. Of course, the news can concern more than one company, so Insights analyzes the news relative to each company mentioned. For example, “KPMG overtakes PwC in FTSE 100 Audit Market” is obviously good news for KPMG and bad news for PricewaterhouseCoopers in the subfactor of Competition.
On the whole, this balance of positive and negative news is used to calculate a corporate “Health Score.” We normalize this score to range from -10 (very unhealthy) to +10 (very healthy). Most companies hover around 0, which is considered normal. Life happens, and bad things happen, but can the company overcome those and keep growing? That’s a typical company. Sometimes everything goes a company’s way, but how likely is that to continue? On the other hand, with enough bad news, a company might go bankrupt, or have its remaining assets merged into another company and thereby disappear. Using Manzama’s historical data, we’re modeling how these possibilities play out.
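The article does not give the actual Health Score formula, so the following is only a minimal sketch of how a balance of positive and negative news could be normalized to the -10 to +10 range. The function name `health_score` and the net-valence ratio it uses are assumptions made for illustration, not Manzama's method.

```python
from collections import Counter


def health_score(valences):
    """Map valence labels to a score in [-10, 10].

    Assumption (not from the article): the score is the net share of
    positive over negative articles, with neutral articles counted in
    the total so that they pull the score toward 0.
    """
    counts = Counter(valences)
    total = sum(counts.values())
    if total == 0:
        return 0.0  # no news at all: treat as "normal"
    net = counts["positive"] - counts["negative"]
    return 10.0 * net / total
```

Under this assumed formula, a company with only positive news scores +10, one with only negative news scores -10, and an even mix lands at 0.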
Besides category and valence, each relevant article is also labeled True or False, independently, for three elements we call Aspects: Litigation, Rumor, and Opinion. By Litigation, we mean that courts or lawyers are involved. Rumor refers to actual rumors, leaks, forecasts, or other events that have not definitively occurred. With Opinion, we hope to identify articles that represent a personal opinion (vs. a fact), including speculation, commentary, or otherwise unverifiable information.
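To make the labeling scheme concrete, here is a small sketch of what the output for one article might look like as a data structure: one record per company mentioned, each with a subfactor, a valence, and three independent True/False aspects. The class name `CompanyLabel` and its field names are hypothetical, and the aspect values shown for the KPMG/PwC headline are assumed for illustration only.

```python
from dataclasses import dataclass
from typing import Literal

Valence = Literal["positive", "negative", "neutral"]


@dataclass(frozen=True)
class CompanyLabel:
    """Labels for one article, relative to one company (hypothetical schema)."""
    company: str
    subfactor: str          # one of the 25 subfactors, e.g. "Competition"
    valence: Valence
    litigation: bool = False  # courts or lawyers involved
    rumor: bool = False       # event has not definitively occurred
    opinion: bool = False     # personal opinion rather than fact


# The same headline yields a separate label for each company mentioned.
labels = [
    CompanyLabel("KPMG", "Competition", "positive"),
    CompanyLabel("PwC", "Competition", "negative"),
]
```

Keeping the three aspects as separate booleans reflects that they are labeled independently: an article can be both Litigation and Rumor, or neither.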
After all the news about a company has been classified by subfactor, valence, and aspects, the next step is to group the articles together for the user. Among the 20,000 articles in the hypothetical above, there is likely a lot of redundancy: there are not 20,000 different events taking place, just re-syndicated or re-worded articles about the same, say, 2,000 events. Manzama Insights helps here as well. On a given day, for a given company, within a subfactor, there are likely to be just one or two events occurring. So Insights clusters those articles into “stories,” each discussing just one event from one aspect. Over time, across days, stories are grouped into “storylines.”
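The clustering method itself is not described here, so the sketch below swaps in a deliberately simple stand-in: articles are bucketed by company, day, and subfactor, and within each bucket headlines are greedily grouped by word overlap (Jaccard similarity). The `Article` class, the `cluster_into_stories` function, and the 0.5 threshold are all assumptions for illustration; grouping stories into storylines across days is not covered.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Article:
    company: str
    day: date
    subfactor: str
    headline: str


def _tokens(headline):
    """Lowercased set of whitespace-separated words."""
    return set(headline.lower().split())


def _jaccard(a, b):
    """Share of words two token sets have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def cluster_into_stories(articles, threshold=0.5):
    """Group articles into stories (lists of similar articles).

    Articles are only compared within the same (company, day, subfactor)
    bucket; each joins the first story whose seed headline is similar
    enough, or starts a new story otherwise.
    """
    buckets = defaultdict(list)
    for art in articles:
        buckets[(art.company, art.day, art.subfactor)].append(art)

    stories = []
    for bucket in buckets.values():
        bucket_stories = []
        for art in bucket:
            toks = _tokens(art.headline)
            for story in bucket_stories:
                if _jaccard(toks, _tokens(story[0].headline)) >= threshold:
                    story.append(art)
                    break
            else:
                bucket_stories.append([art])
        stories.extend(bucket_stories)
    return stories
```

Bucketing first keeps the comparison cheap, since only a handful of events are expected per company, day, and subfactor; a production system would more plausibly compare learned text representations than raw word overlap.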
By starting at the storyline level, users can get the big picture and then quickly see how events have played out over time. To take a recent example, in early December 2018, Google decided to shut down its Google+ social network earlier than anticipated. The AI in Insights connects that news to stories from two months earlier to give the user a broader context for this news.
Using Insights, a lawyer working in cybersecurity could be alerted to just those types of events across hundreds of companies. The data they receive would still be quite sparse, and it would quickly point them to the context they need in order to reach out to that client or potential client.