HOW DOES THE TURBINE LABS VIZ WORK?
The Tableau visualization was created by the Turbine Labs team through an integrated process of machine learning (ML) and human validation to ensure high-quality results. It represents a sample of more than 500,000 English-language news articles related to COVID-19, which we synthesized into the trending topics and categories found in the news coverage.
Our in-house journalist team, which produces a daily COVID-19 Briefing free of charge to the public, first identified the top news categories to be used in our model, such as Public Health, Business, Way of Life Disruption, Politics, and Economy. We then used k-means clustering to select a diverse set of news articles from our dataset, and our team labeled each of those articles with one of the categories. This human-labeled data was used to train an ML category classifier, which then assigned a category to every news article in the dataset. As a final step, n-gram modeling grouped words into phrases, determined how frequently each phrase was used, and scored groups of relevant articles to surface the top phrases per category in the visualization.
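To make the final step more concrete, here is a minimal Python sketch of n-gram counting per category. It is an illustration under stated assumptions, not the Turbine Labs implementation: the function names, the sample headlines, and the use of raw frequency as the only score are all hypothetical, and the sketch assumes each article has already been assigned a category by the classifier.

```python
import re
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Return every run of n consecutive tokens, joined into one phrase."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def top_phrases(articles, n=2, k=3):
    """Count n-grams per category and return the k most frequent in each.

    `articles` is an iterable of (category, text) pairs. Raw frequency
    stands in for scoring here; that choice is an assumption of this sketch.
    """
    counts = defaultdict(Counter)
    for category, text in articles:
        # Lowercase and keep only word-like tokens.
        tokens = re.findall(r"[a-z0-9']+", text.lower())
        counts[category].update(ngrams(tokens, n))
    return {category: c.most_common(k) for category, c in counts.items()}


# Hypothetical, already-categorized headlines used only for illustration.
sample = [
    ("Public Health", "Officials urge social distancing as cases rise"),
    ("Public Health", "Social distancing rules extended in several states"),
    ("Economy", "Jobless claims surge as businesses close"),
    ("Economy", "Jobless claims hit record high"),
]

result = top_phrases(sample, n=2, k=1)
print(result)
```

On this toy sample, the most frequent two-word phrase is "social distancing" for Public Health and "jobless claims" for Economy. A production system would likely also filter stop words and weight phrases by more than raw counts.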
By combining human and artificial intelligence, the process produces a time sequence, mapping, and ranking of the themes most frequently discussed in news coverage. Users can determine when stories first appeared in the media, how coverage of various topics compares, and how often specific terminology appears in news coverage.
WHAT ARE THE USE CASES FOR THIS CLUSTERING TECHNOLOGY OUTSIDE OF COVID-19?
This dataset and the accompanying visualization serve as a real-world example of how machine learning and clustering can be used to understand key trends and patterns within large volumes of text data. However, the use cases extend far beyond the topic of COVID-19. For example, enterprises can use this technology to understand financial, competitive, and market topics to more quickly and accurately inform executive decision making. Political candidates and campaigns can quickly determine key messages that are resonating among their constituencies or among their opponents. And policy and lobbying firms can more quickly determine tide shifts on topics for which they are advocating.
In addition, other proprietary scoring attributes developed by Turbine Labs, such as content Relevancy, Impact, and Authority, can be woven into the analysis for an even more powerful perspective. When combining these attributes with sentiment analysis, our technology provides a holistic, deep understanding of public sentiment and discussion.
It is our hope that this visualization will provide the public with the vital data and information it needs to make the best possible decisions regarding health, safety, and wellness. Staying up to date and informed during the pandemic keeps us all safer and more connected.