
Quantifying Political Leaning from Tweets, Retweets, and Retweeters
Abstract
The widespread use of online social networks (OSNs) to disseminate information and exchange opinions by the general public, news media, and political actors alike has enabled new avenues of research in computational political science. In this paper, we study the problem of quantifying and inferring the political leaning of Twitter users. We formulate political leaning inference as a convex optimization problem that incorporates two ideas:
- users are consistent in their actions of tweeting and retweeting about political issues, and
- similar users tend to be retweeted by similar audiences.
We then apply our inference technique to 119 million election-related tweets collected in seven months during the 2012 U.S. presidential election campaign. On a set of frequently retweeted sources, our technique achieves 94% accuracy and high rank correlation as compared with manually created labels. By studying the political leaning of 1,000 frequently retweeted sources, 232,000 ordinary users who retweeted them, and the hashtags used by these sources, our quantitative study sheds light on the political demographics of the Twitter population, and the temporal dynamics of political polarization as events unfold.
Introduction
One of the most challenging problems at the intersection of politics and online social media is using Twitter to predict election outcomes. Although some success has been claimed, it has also been argued that the election prediction problem is difficult because of sampling bias among the voter population. In order to correct for bias, it would be helpful to have some prior understanding of the population of study. For example, the opinion of a politically biased person should be discounted, but a swing in opinions among unaligned voters is alarming. This motivates the usefulness of estimating the political leaning of the Twitter population. Estimating political leaning is no easy task. In particular, there are two key challenges:
- Quantification: Is it possible to assign meaningful numerical scores to tweeters about their position in the political spectrum?
- Scalability: Given Twitter’s large scale and server limitations, how can we devise a method that is efficient and scalable?
Most existing approaches rely on tweet text and/or the Twitter follower graph, and fail to meet at least one of these challenges. We take a new approach by incorporating retweet information. Analogous to using link analysis techniques for ranking webpages, we propose a consistency condition between tweeting and retweeting behavior, and use it to devise an inference technique that is:
- Simple: it does not require explicit knowledge of the network topology, and works within rate limits imposed by the Twitter API;
- Efficient: computationally efficient because it is formulated as a convex optimization problem, and data efficient because the time required to collect sufficient data to obtain good results is short; and
- Intuitive: the computed scores have a simple interpretation of “averaging.”
To evaluate our inference technique, we collected a set of 119 million tweets on the U.S. presidential election of 2012 over a timespan of seven months. Using the data, we quantify the political leaning of:
- major media outlets that have a Twitter account,
- the most prominent tweeters in terms of the number of retweets received, and
- media outlets studied in the existing works that quantify media bias.
Our results agree with both conventional wisdom and results from similar but smaller-scale studies, which demonstrates the efficacy of our inference technique. Our study has a number of implications.
(a) From a modeling perspective, we see evidence that tweeting and retweeting are indeed consistent, and this observation can be applied to develop new models and algorithms.
(b) From an application perspective, besides election prediction, our method can be applied for other purposes, such as building an automated tweet aggregator that samples tweets from opposite sides of the political spectrum to provide users with a balanced view of controversial issues in the Twittersphere. Our methodology can also be applied to other fields marked by partisan viewpoints, such as market segmentation (e.g., iPhone vs Galaxy).
(c) Regarding politics, our collected dataset and analysis shed light on the political landscape of the Twittersphere.
Related Work
Our work is related to three lines of work: ideal point estimation, media bias quantification, and politics in online social media. In political science, the ideal point estimation problem and its extensions aim to estimate the political leaning of legislators from roll call data. This line of work assumes legislators to vote probabilistically according to their positions (“ideal points”) in a latent space, and the latent positions are statistically inferred from observed data, i.e., how they vote. The main difference between our work and this line of work is in the data: while legislators are characterized by their voting history, which can be considered as their explicit stances on various issues, we do not have access to comparably detailed data for most Twitter users.
A variety of methods have been proposed to quantify the extent of bias in traditional news media. Indirect methods involve linking media outlets to reference points with known political positions. For example, one study linked the sentiment of newspaper headlines to economic indicators. Another linked media outlets to Congress members by co-citation of think tanks, and then assigned political bias scores to media outlets based on the Americans for Democratic Action (ADA) scores of Congress members. A third performed an automated analysis of text content in newspaper articles, and quantified media slant as the tendency of a newspaper to use phrases more commonly used by Republican or Democrat members of Congress.
In contrast, direct methods quantify media bias by analyzing news content for explicit (dis)approval of political parties and issues. One study analyzed newspaper editorials on Supreme Court cases to infer the political positions of major newspapers. Another used 60 years of editorial election endorsements to identify a gradual shift in newspapers' political preferences with time.
There has been much interest in characterizing political polarization of online social media. Outside of Twitter, link structure has been analyzed to uncover polarization of the political blogosphere. User voting data has been incorporated into random walk-based algorithms to classify users and news articles in a social news aggregator. The political orientation of news stories has been inferred from the sentiment of user comments in an online news portal. Political leanings have been assigned to search engine queries by linking them with political blogs. Regarding Twitter, political polarization has also been studied. Machine learning techniques have been proposed to classify Twitter users using, e.g., linguistic content, mention/retweet behavior, and social network structure. One study applied label propagation to a retweet graph for user classification, and found the approach to outperform tweet content-based machine learning methods.
System Configuration:
H/W System Configuration:-
Processor : Pentium IV
Speed : 1 GHz
RAM : 512 MB (min)
Hard Disk : 20GB
Keyboard : Standard Keyboard
Mouse : Two or Three Button Mouse
Monitor : LCD/LED Monitor
S/W System Configuration:-
Operating System : Windows XP/7
Programming Language : Java/J2EE
Software Version : JDK 1.7 or above
Database : MySQL
Proposed Approach
A Motivating Example
To motivate our approach based on retweets, we consider a small example based on some data extracted from our dataset on the presidential election. Consider a pro-Republican media source A and a pro-Democrat media source B. We observe the number of retweets they received during two consecutive events. During the “Romney 47 percent comment” event, source A received 791 retweets, while source B received a significantly higher number of 2,311 retweets. It is not difficult to imagine what happened: source B published tweets bashing the Republican candidate, and Democrat supporters enthusiastically retweeted them. Then consider the first presidential debate. It is generally viewed as an event where Romney outperformed Obama. This time source A received 3,393 retweets, while source B received only 660 retweets. The situation reversed, with Republicans enthusiastically retweeting. This example provides two hints:
- The number of retweets received by a tweeter (the two media sources) during an event can be a signal of its political leaning. In particular, one would expect a politically inclined tweeter to receive more retweets during an event favorable to the candidate it supports.
- The action of retweeting carries implicit sentiment of the retweeter. This is true even if the original tweet does not carry any sentiment itself.
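As a rough illustration of the first hint, the sketch below turns the four retweet counts quoted in the example into per-event retweet shares. The function name `retweet_share` and the dictionary layout are assumptions made for this illustration, not part of the original method.

```python
# Retweet counts quoted in the motivating example, keyed by event and source.
retweets = {
    "Romney 47 percent comment": {"A": 791, "B": 2311},
    "First presidential debate": {"A": 3393, "B": 660},
}


def retweet_share(counts):
    """Return each source's fraction of the retweets within one event."""
    total = sum(counts.values())
    return {source: n / total for source, n in counts.items()}


# Hypothetical helper structure: event name -> {source: share of retweets}.
shares = {event: retweet_share(counts) for event, counts in retweets.items()}

for event, share in shares.items():
    print(event, {source: round(value, 3) for source, value in share.items()})
```

Source B takes the larger share in the event unfavorable to the Republican candidate, and source A takes the larger share in the event favorable to him, which is the reversal the example describes.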
Summary of Our Approach
Our inference technique is built upon the assumption that the two forms of expressing political opinions, tweeting and retweeting, are consistent. Given a large set of tweets, we group them into sets of relevance: in this paper, we group tweets by events for simplicity (it can be done just by looking at a time series in our case study), but other forms of grouping are also possible, such as by issues (economic, diplomatic, religious). This grouping of tweets allows for a more fine-grained analysis, e.g., tracking change of political leaning over time, and provides more datapoints for our estimation problem. The next step is to estimate, for every event, a numerical score that quantifies the approval of the candidates by the aggregate Twitter population.
This can be done using off-the-shelf sentiment analysis tools. It may seem that the performance of our technique will crucially rely on the performance of sentiment analysis, but we will show in our case study that just getting the right trend in sentiment is sufficient. It has also been shown that Twitter sentiment trends computed with standard techniques correlate with poll results and socio-economic phenomena (O’Connor et al. 2010; Bollen, Pepe, and Mao 2011). Recall that the action of retweeting carries information on the political opinions of the retweeter. We can thus define the political leaning of a retweeted tweeter as the average approval score a person wishes to express when retweeting any of its messages. This political leaning score is on the same scale as the average score (per tweet) from the previous step. Then for every event, we can average over the political leaning scores of all retweets in that event.
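The averaging idea can be sketched in a few lines. Let y hold one sentiment score per event, let row i of A hold the fraction of event i's retweets that each source received, and let x hold the leaning scores, so that consistency means A x is approximately y. The sketch below is a minimal illustration under stated assumptions: it uses a plain ridge penalty as a stand-in for the regularization mentioned elsewhere in this report, the sentiment values in `y` are hypothetical (positive meaning favorable to the Republican candidate), and all function names are invented for this example.

```python
def row_normalize(counts):
    """Turn per-event retweet counts into fractions that sum to 1 per event."""
    return [[c / sum(row) for c in row] for row in counts]


def solve_linear(M, b):
    """Solve M z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(M[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (aug[r][n] - tail) / aug[r][r]
    return z


def estimate_leaning(A, y, lam=1e-3):
    """Ridge estimate (assumed stand-in): minimize ||A x - y||^2 + lam ||x||^2."""
    n = len(A[0])
    # Normal equations: (A^T A + lam I) x = A^T y
    AtA = [[sum(row[i] * row[j] for row in A) + (lam if i == j else 0.0)
            for j in range(n)] for i in range(n)]
    Aty = [sum(row[i] * yi for row, yi in zip(A, y)) for i in range(n)]
    return solve_linear(AtA, Aty)


# Rows are events, columns are sources A and B (counts from the example above).
counts = [[791, 2311], [3393, 660]]
# Hypothetical per-event sentiment scores; positive favors the Republican side.
y = [-0.3, 0.4]

A = row_normalize(counts)
x = estimate_leaning(A, y)
print("leaning scores (A, B):", [round(v, 3) for v in x])
```

With these made-up sentiment values, source A receives a positive score and source B a negative one, matching their stated leanings. With only two events and two sources the system is barely determined; in practice many more events than this would be needed.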
We defined the dates of an event as follows: the start date was identified based on our knowledge of the event, e.g., the start time of a presidential debate, and the end date was defined as the day when the number of tweets reached a local minimum or dropped below that of the start date. After the events were identified, we extracted all tweets in the specified time interval without additional filtering, assuming that all tweets within the interval are relevant to the event and those outside are irrelevant.
Noise Considerations
There are three factors that can introduce noise to our computed event sentiment scores y and retweet matrix A:
- the dataset may contain irrelevant tweets, e.g., those about the Egyptian or French presidential election,
- not all tweets created during an event time interval are necessarily talking about the event, and
- political tweets are difficult to classify (Maynard and Funk 2011; Mejova, Srinivasan, and Boynton 2013), and off-the-shelf sentiment analysis tools may not perform sufficiently well (Gayo-Avello, Metaxas, and Mustafaraj 2011).
Although more careful data processing is possible, we opted for the current simple approach because we believe our inference technique is robust to noise given sufficient data. To understand the intuition, we can consider our formulation of quantifying political leaning as a system taking in noisy input signals y and A to output estimates x of the political leaning scores. As the system accumulates more events, the size of the input (the sizes of y and A, which scale with the number of events E) increases, but the size of the output (the size of x, which is the number of sources N) remains the same. Effectively, we are increasing the amount of information available to improve estimation accuracy.
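The intuition can be checked with a small simulation. The sketch below is a toy model under stated assumptions, not the procedure used in this study: it fixes N = 2 sources with made-up true scores, draws random retweet shares for each event, adds Gaussian noise to y only (noise in A is not modeled), and fits x by ordinary least squares. All names and parameter values are hypothetical.

```python
import random


def least_squares_2(A, y, lam=1e-9):
    """Least-squares fit for two sources via the 2x2 normal equations.

    The tiny lam keeps the system solvable when rows are nearly collinear.
    """
    a = sum(r[0] * r[0] for r in A) + lam
    b = sum(r[0] * r[1] for r in A)
    c = sum(r[1] * r[1] for r in A) + lam
    p = sum(r[0] * yi for r, yi in zip(A, y))
    q = sum(r[1] * yi for r, yi in zip(A, y))
    det = a * c - b * b
    return [(c * p - b * q) / det, (a * q - b * p) / det]


def mean_error(num_events, true_x=(0.6, -0.6), noise_sd=0.2,
               trials=200, seed=0):
    """Average absolute estimation error of x over repeated simulated datasets."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        A, y = [], []
        for _ in range(num_events):
            share = rng.random()          # fraction of retweets going to source 1
            row = [share, 1.0 - share]
            clean = row[0] * true_x[0] + row[1] * true_x[1]
            A.append(row)
            y.append(clean + rng.gauss(0.0, noise_sd))  # noisy event sentiment
        est = least_squares_2(A, y)
        total += sum(abs(e - t) for e, t in zip(est, true_x)) / len(true_x)
    return total / trials


for E in (4, 40, 400):
    print(E, "events -> mean absolute error", round(mean_error(E), 4))
```

The number of unknowns stays at two while the number of noisy equations grows with E, so the average error shrinks as events accumulate.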
Conclusions and Future Work
Motivated by the election prediction problem, we study in this paper the problem of quantifying the political leaning of prominent members in the Twittersphere. By taking a new point of view on the consistency relationship between tweeting and retweeting behavior, we formulate political leaning quantification as an ill-posed linear inverse problem solved with regularization techniques. The result is an automated method that is simple, efficient and has an intuitive interpretation of the computed scores. Compared to existing manual and Twitter network-based approaches, our approach is able to operate at much faster timescales, and does not require explicit knowledge of the Twitter network, which is difficult to obtain in practice.
To evaluate our inference technique, we collected a large dataset of 119 million U.S. election-related tweets over a span of seven months. We applied our inference technique to quantify the political leaning of media outlets and prominent Twitter users. We also showed our results are in good agreement with existing work quantifying media bias, and analyzed the time dynamics of the computed political leaning scores. This work is a step toward systematic approaches in quantifying behavior on social and political issues. The retweet matrix and retweet average scores can be used to develop new models and algorithms to analyze more complex tweet-and-retweet features.
It is interesting to see that our simple model of tweet and retweet dynamics can be applied to achieve useful results, but our approach of using solely retweet information has its limitations. In particular, our approach does not quantify less popular sources that are not retweeted often, or parody accounts, which show less regularity in their tweeting behavior. Many other extensions are possible, especially by obtaining and incorporating more information, such as the sentiment of retweets, network structure and user history. Our methodology may also be applicable to other OSNs with retweet-like endorsement mechanisms, such as Facebook and YouTube with “like” functionality.







