HINTS: Fake news detection via passive network analysis
Fake news is a problem across many platforms, with detrimental impacts. HINTS (the Human Interaction News Trustworthiness System) is a patent-pending passive network indicator that detects fake news by analyzing how it is shared and by whom. It rests on two principles: (1) the type of news that people have liked in the past is a good predictor of the news they will like in the future; and (2) a post needs a large number of interactions and a lot of attention before it can be considered “news”. HINTS is seeded with labeled data (content labeled as untrustworthy, and accounts that have liked or shared untrustworthy content), and the algorithm then predicts which other content will later be labeled as fake news. Fake news is not a problem in a vacuum: it becomes a problem in proportion to its visibility. Focusing on the people most likely to propagate fake news and the URLs most likely to be fake therefore allows targeted remediation (e.g., limiting who sees those posts, or prioritizing them for human fact-checking).
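Because harm scales with visibility, one way to turn HINTS scores into remediation is a human-review queue ranked by fake score times reach. The sketch below is an illustration under assumptions, not part of HINTS as described: the `Post` fields, the use of interaction counts as the visibility measure, the `min_interactions` cutoff (our reading of principle (2)), and the multiplicative ranking are all hypothetical choices.

```python
from dataclasses import dataclass


@dataclass
class Post:
    url: str
    fake_score: float   # likelihood of being fake, e.g. from a HINTS-style propagation (0..1)
    interactions: int   # hypothetical visibility proxy: likes + shares + replies


def review_queue(posts, min_interactions=100, k=10):
    """Return up to k posts ordered by expected harm = fake_score * interactions.

    Posts below min_interactions are skipped: following principle (2), they have
    not yet drawn enough attention to count as news.
    """
    candidates = [p for p in posts if p.interactions >= min_interactions]
    candidates.sort(key=lambda p: p.fake_score * p.interactions, reverse=True)
    return candidates[:k]


if __name__ == "__main__":
    posts = [
        Post("A", 0.90, 50),     # likely fake but barely seen: filtered out
        Post("B", 0.60, 1000),
        Post("C", 0.10, 5000),
        Post("D", 0.95, 200),
    ]
    print([p.url for p in review_queue(posts)])  # ['B', 'C', 'D']
```

With a multiplicative score, a moderately suspicious but widely seen post (B) outranks a highly suspicious one with little reach (D), which matches the point that fake news becomes a problem in proportion to its visibility.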
Proof of concept
Below we describe two examples of the HINTS algorithm’s success on a small dataset. Using the free tier of the Twitter API, we collected posts from random samples of 1000 Twitter accounts that had posted using one of two seed hashtags. Within 3 rounds of updating the weights in our network, HINTS quickly isolated top fake news articles that were unrelated to the seed hashtags.
Pseudocode
Initialize: Create a graph of users and articles, with an edge whenever a user saw an article. Give each edge a positive weight for a positive interaction, a slight negative weight for no interaction, and a strong negative weight for a negative interaction.
Seed: Use labeled data (e.g., PolitiFact and other known fact-checkers, plus known botnets) and mark the corresponding nodes accordingly.
Iterate: Use HITS (Hyperlink-Induced Topic Search) to update the graph weights until they converge, holding labeled nodes at their fixed values.
Annotate: Mark articles whose fake score exceeds a threshold.
(Optional) ML/NLP: Feed the articles marked as fake into a content-based algorithm for earlier detection.
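The steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the HINTS implementation: the numeric edge weights (+1, -0.1, -1), the per-round max-absolute normalization, the convergence tolerance, the toy data, and the names `propagate` and `flag_articles` are all hypothetical. Each round is a HITS-style pair of updates (users scored from articles, then articles scored from users), with labeled nodes reset to their fixed values after every update.

```python
from collections import defaultdict

# Hypothetical edge weights: the text specifies only their signs and relative sizes.
POSITIVE = 1.0          # like or share
NO_INTERACTION = -0.1   # saw the article but did not engage
NEGATIVE = -1.0         # e.g. reported or hid the article


def _normalize(scores):
    """Scale scores so the largest absolute value is 1 (keeps iterations bounded)."""
    peak = max((abs(v) for v in scores.values()), default=0.0)
    return {k: v / peak for k, v in scores.items()} if peak else dict(scores)


def propagate(edges, seed_articles, seed_users, max_rounds=50, tol=1e-6):
    """HITS-style propagation on a signed user-article graph.

    edges: iterable of (user, article, weight) triples.
    seed_articles / seed_users: dict node -> fixed score (1.0 = known fake / known bot).
    Returns (article_scores, user_scores).
    """
    by_user, by_article = defaultdict(list), defaultdict(list)
    for user, article, weight in edges:
        by_user[user].append((article, weight))
        by_article[article].append((user, weight))

    articles = {a: 0.0 for a in by_article}
    articles.update(seed_articles)
    users = {u: 0.0 for u in by_user}
    users.update(seed_users)
    for _ in range(max_rounds):
        # User score: how strongly a user engages with (suspected) fake articles.
        new_users = _normalize({u: sum(w * articles[a] for a, w in nbrs)
                                for u, nbrs in by_user.items()})
        new_users.update(seed_users)        # labeled users keep fixed values
        # Article score: weighted vote of the users who saw it. A negative
        # interaction from a negatively scored (trustworthy) user raises the score.
        new_articles = _normalize({a: sum(w * new_users[u] for u, w in nbrs)
                                   for a, nbrs in by_article.items()})
        new_articles.update(seed_articles)  # labeled articles keep fixed values
        delta = max((abs(new_articles[a] - articles[a]) for a in articles), default=0.0)
        articles, users = new_articles, new_users
        if delta < tol:
            break
    return articles, users


def flag_articles(article_scores, seed_articles, threshold=0.5):
    """Annotate: unlabeled articles whose fake score exceeds the threshold."""
    return sorted(a for a, s in article_scores.items()
                  if s > threshold and a not in seed_articles)


# Toy data (hypothetical): "fake1" is the only labeled article.
EXAMPLE_EDGES = [
    ("alice", "fake1", POSITIVE), ("alice", "a2", POSITIVE),
    ("bob", "fake1", POSITIVE), ("bob", "a2", POSITIVE),
    ("carol", "a3", POSITIVE), ("carol", "a2", NO_INTERACTION),
    ("dave", "a3", POSITIVE), ("dave", "fake1", NEGATIVE),
]

if __name__ == "__main__":
    scores, _ = propagate(EXAMPLE_EDGES, {"fake1": 1.0}, {})
    print(flag_articles(scores, {"fake1": 1.0}))  # ['a2']
```

In the toy graph, a2 is liked by the same users who liked the seeded fake article, so its score ends up well above the 0.5 threshold; a3 is liked by users who avoid or report fake content and ends up negative. Multiplying signed edge weights by signed user scores is what lets a report from a trustworthy user count as evidence of fakeness.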
Application
Technical workflow and example
In this example, our goal is to find right-wing political hashtags. We start with a seed, #Khashoggi (which is correlated with sharing misinformation[1,2]), and find other content related to #Khashoggi through a shared user base. HINTS finds that the top link shared by people who talked about #Khashoggi during the study period was “Exposed: O'Rourke illegally funding caravan?”
Stage 1. Choose seeds (explicitly or implicitly labeled).
Comments: Seeds can be manually annotated or self-annotated, such as #QAnon or the users in a group.
Example 1: We choose #Khashoggi as our seed; the meaning of our label is “right-wing political hashtag”. This label is not completely error-free, since some people use the hashtag differently.
Example 2: We choose #QAnon as our seed.
Stage 2. Define the neighborhood: the set of nodes (people/pages/hashtags) linked to the seeds. For best results, the neighborhood should cover all nodes (or a sample of them) up to a distance of at least 2 from the seeds. Larger neighborhoods give the algorithm more power.
Comments: The edges are of the form (A posts hashtag B) and (B is posted by A).
Example 1: We choose a sample of 1000 users who had used the hashtag. There were over 25,000 such users in total, but we used fewer for faster computation.
Example 2: We choose 1000 users.
Stage 3. Run the propagation algorithm to propagate the labels with appropriate weights.
Example 1: We ran 3 rounds of updating both people and hashtags while keeping the weight of the labeled hashtag constant. Convergence is very fast.
Example 2: Same as Example 1.
Stage 4. Output the top weights, which are now labeled (with error probabilities).
Comments: Can output links/hashtags as well as people.
Example 1: As expected, the algorithm found other right-wing political hashtags, such as #AmericaFirst, #MAGA, and #RedWave. Top news article: “Beto illegally funds caravan with campaign donations.”
Example 2: The algorithm finds other conspiracy theories, such as #WWG1WGA, #GreatAwakening, #TheStorm, and #VoterFraud. Note that although #MAGA is associated with #QAnon, it is not disproportionately so.
Stage 5 (optional). Run an NLP/ML algorithm on the larger labeled dataset.
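The “define neighborhood” stage amounts to a breadth-first search over the bipartite user-hashtag graph. A minimal sketch, assuming posts arrive as (user, hashtag) pairs; the function name `neighborhood`, the `max_users` sampling knob (mirroring the 1000-user sample), the toy data, and reading “distance at least 2” as a default radius of 2 are our assumptions:

```python
import random
from collections import defaultdict


def neighborhood(posts, seed_tags, radius=2, max_users=None, rng_seed=0):
    """Return {node: hop distance} for nodes within `radius` hops of the seed hashtags.

    posts: iterable of (user, hashtag) pairs, each meaning "user posted hashtag".
    Nodes are ("user", name) or ("tag", name) tuples so the two sides cannot collide.
    max_users: if set, randomly sample at most this many seed users (hop 1).
    """
    adj = defaultdict(set)
    for user, tag in posts:
        # Undirected edges: (A posts hashtag B) and (B is posted by A).
        adj[("user", user)].add(("tag", tag))
        adj[("tag", tag)].add(("user", user))

    dist = {("tag", t): 0 for t in seed_tags}
    frontier = list(dist)
    for hop in range(1, radius + 1):
        # Breadth-first expansion; sorting makes the sampling reproducible.
        nxt = sorted({nb for node in frontier for nb in adj[node] if nb not in dist})
        if hop == 1 and max_users is not None and len(nxt) > max_users:
            nxt = random.Random(rng_seed).sample(nxt, max_users)
        for node in nxt:
            dist[node] = hop
        frontier = nxt
    return dist


# Toy data (hypothetical), not from the study.
EXAMPLE_POSTS = [
    ("u1", "#Khashoggi"), ("u1", "#MAGA"),
    ("u2", "#Khashoggi"), ("u2", "#RedWave"),
    ("u3", "#Cooking"),
]

if __name__ == "__main__":
    for node, hop in sorted(neighborhood(EXAMPLE_POSTS, ["#Khashoggi"]).items(),
                            key=lambda kv: kv[1]):
        print(hop, node)
```

Radius 2 from a hashtag seed reaches the users who posted it (hop 1) and the other hashtags those users posted (hop 2), which is where co-occurring hashtags such as #MAGA appear; u3 and #Cooking share no users with the seed and stay outside the neighborhood.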
I love the possibilities this opens up for click farms. We could have “social honey pots” that attract amplifiers of fake news in order to make it easier for HINTS to identify fake news.
But then again, somebody who wanted to bury real news could deploy a division of such social honey pots, with bad scores, to share and like real news.
Or right wing groups (and their opponents) could further refine their “muddying the water” strategies. Let’s face it, we’re entering cascading dynamics of arms races within the context of hotter and colder information wars.
But for the time being, HINTS would be better than many other solutions.
Samuel Klein:
Could be a propagandist, could be a bot, could be credulous. Better term here?