Identifying Critical Nodes on Twitter

Work based on research conducted under Dr. Samik Basu and Dr. Pavan K. Aduri at Iowa State University

This is part 3 of a 3-part series on the study of critical nodes in a network.
Links to: part 1 and part 2

Goal:

Given a social network graph, such as Twitter's, and an integer k, identify the k nodes (user accounts) that, when selected as the sources, generate the maximum spread of misinformation in the network.

Solution:

This is a direct implementation of the algorithm described in [1].
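For readers who want a feel for the approach, the greedy hill-climbing strategy from [1] can be sketched in a few lines of Python. This is a minimal illustration, not the C++ code used in the experiment. It assumes the independent cascade diffusion model with a single uniform activation probability p, which this post does not specify, and the function names (simulate_cascade, estimate_spread, greedy_seeds) and default parameters are hypothetical.

```python
import random


def simulate_cascade(graph, seeds, p, rng):
    """Run one independent-cascade simulation and return the number of activated nodes."""
    active = set(seeds)
    frontier = list(seeds)
    while frontier:
        next_frontier = []
        for u in frontier:
            for v in graph.get(u, ()):
                # Each newly active node gets one chance to activate each inactive neighbour.
                if v not in active and rng.random() < p:
                    active.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return len(active)


def estimate_spread(graph, seeds, p, runs, rng):
    """Monte Carlo estimate of the expected spread of a seed set."""
    total = sum(simulate_cascade(graph, seeds, p, rng) for _ in range(runs))
    return total / runs


def greedy_seeds(graph, k, p=0.1, runs=200, seed=0):
    """Pick k seeds, each time adding the node with the largest estimated marginal gain.

    graph maps a node to the list of nodes it can influence (edge v -> w).
    p, runs and seed are illustrative assumptions, not values from the experiment.
    """
    rng = random.Random(seed)
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)

    chosen = []
    for _ in range(k):
        base = estimate_spread(graph, chosen, p, runs, rng) if chosen else 0.0
        best, best_gain = None, -1.0
        for v in sorted(nodes - set(chosen)):
            gain = estimate_spread(graph, chosen + [v], p, runs, rng) - base
            if gain > best_gain:
                best, best_gain = v, gain
        chosen.append(best)
    return chosen


if __name__ == "__main__":
    toy = {"a": ["b", "c", "d"], "e": ["f"]}
    print(greedy_seeds(toy, k=2, p=1.0, runs=1))
```

Each greedy step re-estimates the spread for every remaining candidate, so this sketch is only practical on small graphs. On a graph with hundreds of thousands of nodes, faster methods such as the one in [2] are the usual choice.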

Experimental Setup:
We use the same graph as in the previous two parts. For this experiment, we set k to 20; that is, we want to compute the 20 nodes that, when they tweet out misinformation, maximise the number of nodes influenced by it. The C++ code for the version of the algorithm in which the seed set is known is provided here.
This is the same link as the code for the part 2 experiment. To run this algorithm, set the useSeedProvidedAsInput variable to false and set the budget argument to the number of nodes that you want to find.
(The part below is the same as in the earlier two parts.)
For the purpose of this experiment, tweets mentioning Hong Kong were filtered using the Tweepy library written in Python and the Twitter API. The Python code for the streaming process can be found here. Tweets were collected from Tue Aug 13 21:30:14 +0000 2019 to Thu Aug 15 13:08:30 +0000 2019. In all, around 1.2 million tweets were collected.
A graph was created from the tweets in the following manner. A user can take three actions on a tweet: retweet, retweet with comment, and reply. All three actions are treated as engaging with the tweet. An edge was created from a user v to a user w if w took any of these three actions on v's tweet. The resulting graph contained a total of 607,525 nodes and 994,220 edges. The Java code for parsing the Tweet JSON objects and creating the graph can be found here.
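The edge rule above can be illustrated with a short Python sketch. It is not the Java parser linked above. It assumes the input is one Twitter API v1.1 tweet object per line, using the standard fields retweeted_status, quoted_status and in_reply_to_user_id_str. The function names, and the choice to record only the retweet edge when a tweet is itself a retweet, are assumptions made for illustration.

```python
import json


def engagement_edges(tweet):
    """Yield (v, w) pairs where user w engaged with a tweet written by user v."""
    w = tweet["user"]["id_str"]

    # Plain retweet: the original tweet is embedded under "retweeted_status".
    retweeted = tweet.get("retweeted_status")
    if retweeted:
        yield (retweeted["user"]["id_str"], w)
        return  # assumption: a retweet contributes only the retweet edge

    # Retweet with comment (quote tweet): the original is under "quoted_status".
    quoted = tweet.get("quoted_status")
    if quoted:
        yield (quoted["user"]["id_str"], w)

    # Reply: only the id of the user being replied to is available.
    reply_to = tweet.get("in_reply_to_user_id_str")
    if reply_to:
        yield (reply_to, w)


def build_edges(lines):
    """Parse JSON lines and return the set of directed edges (v, w)."""
    edges = set()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tweet = json.loads(line)
        if "user" not in tweet:
            continue  # skip stream messages that are not tweets
        edges.update(engagement_edges(tweet))
    return edges


if __name__ == "__main__":
    sample = [
        '{"user": {"id_str": "2"}, "retweeted_status": {"user": {"id_str": "1"}}}',
        '{"user": {"id_str": "3"}, "quoted_status": {"user": {"id_str": "1"}}}',
        '{"user": {"id_str": "4"}, "in_reply_to_user_id_str": "2"}',
    ]
    for v, w in sorted(build_edges(sample)):
        print(v, "->", w)
```

Storing edges in a set collapses repeated engagements between the same pair of users into a single edge. Whether the original graph did the same is not stated here.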

Results and Observations
Below are the top 20 user accounts as selected by the algorithm.

[Image: Example2]


References
[1] Maximizing the Spread of Influence through a Social Network
[2] Influence Maximization: Near-Optimal Time Complexity Meets Practical Efficiency
[3] Disrupting diffusion: Critical nodes in network