
NATERGM: A Model for Examining the Role of Nodal Attributes in Dynamic Social Media Networks
Abstract
Social media networks are dynamic. As such, the order in which network ties develop is an important aspect of the network dynamics. This study proposes a novel dynamic network model, the Nodal Attribute-based Temporal Exponential Random Graph Model (NATERGM), for dynamic network analysis. The proposed model focuses on how the nodal attributes of a network affect the order in which the network ties develop. Temporal patterns in social media networks are modeled based on the nodal attributes of individuals and the time information of network ties. Using social media data collected from a knowledge sharing community, empirical tests were conducted to evaluate the performance of NATERGM in identifying the temporal patterns and predicting the characteristics of future networks. Results showed that NATERGM demonstrated an enhanced pattern testing capability and an increased prediction accuracy of network characteristics compared to benchmark models. The proposed model helps explain the roles of nodal attributes in the formation process of dynamic networks.
INTRODUCTION
Social media networks are emerging online networks that virtually connect individuals. These networks consist of nodes that represent individual social media users and ties that represent various relationships between the users. Examples of social media networks include online friendship networks, following-follower networks, and content sharing networks. The relationships between the online users are often public information, which provides opportunities for using social network analysis (SNA) to better understand how and why individuals establish social connections online. As a result, a growing number of studies have used SNA to examine social media networks. Social media networks have two important characteristics. First, they are dynamic in nature. Network ties develop in sequence rather than simultaneously. As such, relationships between individuals may change over time.
Second, social media users differ in various attributes, such as gender, functional role in online communities, and reputation. As a result, social media networks are multimode networks in which different node types exist. A consequence of these two characteristics is that seemingly identical network patterns can result from different network formation processes, depending on the order in which the network ties develop. As an illustration, consider two patterns, A and B, in which black nodes represent highly active individuals (e.g., individuals who frequently come online and leave messages) in online communities and the numbers next to network ties indicate the order in which the relationships develop. Pattern A illustrates a process where highly active individuals are prioritized over others when developing relationships, while Pattern B illustrates the opposite tendency. If the order in which the network ties develop is ignored, we are unable to differentiate between these two patterns or to understand how highly active individuals participate in the dynamic process of network formation.
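The contrast between the two patterns can be made concrete with a small sketch. The node names, the single highly active member, and the tie orders below are invented for illustration and are not taken from the study's data; the helper `mean_order_to_active` is likewise a hypothetical summary, not a NATERGM statistic.

```python
# Hypothetical example: node names, activity labels, and tie orders are invented.
ACTIVE = {"a"}  # highly active members (the black nodes in the description)

# Each tie is (source, target, order), where order is the step at which the tie developed.
PATTERN_A = [("x", "a", 1), ("x", "b", 2), ("x", "c", 3)]  # active member contacted first
PATTERN_B = [("x", "b", 1), ("x", "c", 2), ("x", "a", 3)]  # active member contacted last


def static_view(ties):
    """Drop the order information, as a static analysis would."""
    return {(src, dst) for src, dst, _ in ties}


def mean_order_to_active(ties, active):
    """Average order of the ties that point to a highly active member."""
    orders = [order for _, dst, order in ties if dst in active]
    return sum(orders) / len(orders)


if __name__ == "__main__":
    # The static views coincide, while the order-aware summary differs.
    print(static_view(PATTERN_A) == static_view(PATTERN_B))
    print(mean_order_to_active(PATTERN_A, ACTIVE), mean_order_to_active(PATTERN_B, ACTIVE))
```

Once the order is dropped, the two tie sets are indistinguishable; only the order-aware summary separates them.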
Differentiating between various temporal patterns is thus critical to understanding the formation mechanisms of social media networks. However, current social network research usually adopts a static view of networks, based on the assumption that all network ties have developed concurrently upon observation. This assumption, while contributing to simplicity and being useful for identifying static patterns of networks, leads to a reduced representation of real social media networks. As a result, the ability of social network analysis to identify network patterns may be negatively affected. The problem can further reduce the practical value of social network analysis for understanding various network phenomena in social media contexts. In this study, we propose a novel dynamic network model, the Nodal Attribute-based Temporal Exponential Random Graph Model (NATERGM), for dynamic network analysis.
NATERGM is an extension of the Temporal Exponential Random Graph Model (TERGM) and focuses on how nodal attributes of networks affect the order in which network ties develop. The proposed model extracts nodal attributes of individuals and time information of network ties from social media networks, based on which various temporal patterns are modeled and their likelihoods of occurrence are estimated. Extending prior work [13], we use empirical data to demonstrate that NATERGM provides an enhanced pattern testing capability compared to TERGM. Moreover, NATERGM is able to predict the characteristics of future social media networks, and we show that our approach outperforms TERGM-based prediction models. The major objective of this study is to provide a framework to explore, analyze, and explain the formation mechanisms of social media networks.
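As background, exponential random graph models share a common exponential-family form, sketched below. This is the general expression from the wider ERGM literature and not the NATERGM specification itself, which this section does not state; the symbols y, θ, g, and κ are notation assumed here for illustration.

```latex
P(Y = y \mid \theta) = \frac{\exp\{\theta^{\top} g(y)\}}{\kappa(\theta)},
\qquad
\kappa(\theta) = \sum_{y' \in \mathcal{Y}} \exp\{\theta^{\top} g(y')\}
```

Here g(y) is a vector of counts of network patterns in the network y, θ holds one coefficient per pattern, and κ(θ) normalizes over the set of possible networks. In a temporal model with nodal attributes, one would expect g to include counts that depend on tie order and node type; a positive coefficient then indicates that the corresponding pattern occurs more often than expected by chance, given the other terms in the model.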
RELATED WORK
In this section we first review recent studies examining social media networks. We then review emerging network models for dynamic network analysis.
Social Media Networks
Based on a theoretical conceptualization of network ties, four types of social media network ties have been described in prior research. Proximity ties indicate that two individuals belong to the same sub-group (e.g., a Facebook Group) or location. Social relation ties represent social connections between individuals, such as virtual friendships and subscription relationships on micro-blogging sites. Interaction ties represent interactive behaviors between individuals, such as information exchanges through message replies. Flow ties represent the movement of goods or information between network nodes, such as retweets. Some researchers have argued that these types of ties are not necessarily decoupled, but rather represent a continuum. For instance, proximity may further lead to social relations, and interactions and flows of knowledge may occur at the same time.
Social media networks have been studied for various purposes. In general, the research objectives of these studies can be classified into three categories. The first stream of research focuses on explaining network mechanisms. This type of research aims at understanding under what conditions individuals are likely to establish social connections online. For example, demographic homophily was found to exist in online friendship networks: students of the same gender, major, and residence area are more likely to establish social connections in Facebook friendship networks. Prior research has also found that direct reciprocity, indirect reciprocity, and preferential attachment occur frequently in online web forums.
The second stream of research examines how the structure of a social media network affects the outcomes of individuals in the network. This type of research is referred to as structural capital studies. For example, a study of friendship networks on an online micro-lending platform found that the chances of successful funding were significantly affected by the number of friendship ties and by the types of friendship. Research has also found that individuals in a connected network can predict the outcomes of a given problem more accurately than when they are isolated.
Another popular research area is to partition the network into sub-graphs and identify subgroups. These studies generally aim at identifying key groups or players in the network and understanding the characteristics of these subgroups. For example, based on centrality and coreness measures, core groups and the most active key members of the core group were identified in a medical discussion forum. Another study identified Twitter user groups from following-follower networks on Twitter.com and examined the effect of intra-group ties, inter-group ties, and intermediary ties on retweeting behaviors.
System Configuration:
Hardware Configuration:
Processor : Pentium IV
Speed : 1 GHz
RAM : 512 MB (min)
Hard Disk : 20 GB
Keyboard : Standard Keyboard
Mouse : Two or Three Button Mouse
Monitor : LCD/LED Monitor
Software Configuration:
Operating System : Windows XP/7
Programming Language : Java/J2EE
Software Version : JDK 1.7 or above
Database : MySQL
RESEARCH DESIGN
To evaluate the performance of NATERGM, we conducted two empirical tests. The first test focused on the pattern testing capability of NATERGM. The second test focused on how accurately our model can predict the characteristics of future networks. This section describes the research test-bed used for the empirical study and outlines the two experiments.
Research Test-bed
Social media data were collected from WikiAnswer.com, which is a large online knowledge sharing community. Community members can ask questions about any topic and answer others' questions as well. Open questions pass through the hands of many contributors, some of whom directly provide answers, while others edit the posted answers in terms of content, language, or format. Finally, the questions and answers are organized into Q&A entries that can be accessed by all community members, including the questioners. It is a good test-bed for NATERGM because its wiki-based "answer history" system allows us to see how members in this community develop social connections when seeking help and answering questions. We established a directed tie from member A to member B if A answered a question from B.
Network ties therefore represent knowledge flows in this community, and the networks extracted from the social media data are knowledge diffusion networks. The test-bed also allows for identifying the timestamps associated with these network ties. Since many questions require contributors to have relatively deep knowledge in a field to answer, members in WikiAnswer.com form specialized sub-communities to handle questions that belong to similar topics. In this study, we focused on three sub-communities: diabetes, online shopping, and real estate, based on the popularity of these topics and the number of relevant Q&A entries in the community.
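The tie construction rule described above (a directed tie from A to B when A answers a question from B, with a timestamp attached) can be sketched as follows. The record layout and field names are assumptions made for illustration and do not reflect the site's actual data format. Keeping only the earliest timestamp per pair and skipping self-answers are also assumptions of this sketch, not choices stated in the study.

```python
from collections import namedtuple

# Hypothetical record layout; the field names are assumptions.
Answer = namedtuple("Answer", ["answerer", "asker", "timestamp"])


def build_ties(answers):
    """Return directed ties (answerer, asker, timestamp) sorted by time.

    Assumption: when the same member answers the same asker several times,
    only the earliest timestamp is kept, and self-answers are skipped.
    """
    first_seen = {}
    for rec in answers:
        if rec.answerer == rec.asker:
            continue  # a member answering their own question creates no tie
        key = (rec.answerer, rec.asker)
        if key not in first_seen or rec.timestamp < first_seen[key]:
            first_seen[key] = rec.timestamp
    ties = [(src, dst, ts) for (src, dst), ts in first_seen.items()]
    return sorted(ties, key=lambda tie: tie[2])


if __name__ == "__main__":
    records = [
        Answer("u2", "u1", 30),
        Answer("u3", "u1", 10),
        Answer("u2", "u1", 20),
        Answer("u1", "u1", 5),
    ]
    for tie in build_ties(records):
        print(tie)
```

Sorting by timestamp yields the order of tie development that the temporal patterns rely on.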
We also observed that a great number of members were inactive and participated in knowledge sharing activities only minimally (e.g., made a single change to capitalization). Since we were interested in the most representative members in the community, we restricted the analysis to members who asked questions, provided the first answers, or made significant content changes to answers (this type of contribution is separately classified in WikiAnswer.com) more than once. In the resulting networks from WikiAnswer.com, each user made connections with others (by providing or receiving answers) 1.8 times on average. The network density ranged from 0.06% to 0.2% for the three sub-community networks, indicating that these networks are quite sparse. Overall, the degree (number of connections) of individual users followed a power-law distribution, with a maximum degree of 14.
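The descriptive quantities reported above (density, average number of connections, maximum degree) can be computed as in the following sketch. The toy network is invented, and the directed definition of density used here is an assumption, since the text does not state which definition was applied.

```python
from collections import Counter


def density(nodes, ties):
    """Directed density: observed ties over the n * (n - 1) possible ordered pairs."""
    n = len(nodes)
    return len(ties) / (n * (n - 1)) if n > 1 else 0.0


def degrees(ties):
    """Total degree per node, counting both ties sent and ties received."""
    deg = Counter()
    for src, dst in ties:
        deg[src] += 1
        deg[dst] += 1
    return deg


if __name__ == "__main__":
    # Invented toy network: (answerer, asker) pairs.
    nodes = {"u1", "u2", "u3", "u4"}
    ties = {("u2", "u1"), ("u3", "u1"), ("u1", "u4")}
    deg = degrees(ties)
    print("density:", density(nodes, ties))
    print("mean degree:", sum(deg.values()) / len(nodes))
    print("max degree:", max(deg.values()))
```

For sparse networks such as those described here, the density is far below the toy value, because the number of possible ordered pairs grows quadratically with the number of members.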
Pattern Testing
To show the enhanced pattern testing capability of NATERGM, we compared it with TERGM and used both models to explain network formation. TERGM was chosen as the baseline model because it is also capable of modeling the order in which network ties develop. However, it does not explain what roles nodal attributes play in determining the order of network ties. The NATERGM and TERGM specifications used for pattern testing included different sets of model terms. Baseline model terms included arc, reciprocity, 2-out-star, 2-in-star, transitivity, and cyclicity. For terms that were not significant in the baseline model, their extensions need not be tested, because the extended temporal patterns, which additionally consider nodal attributes, are subsets of their corresponding root patterns.
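The baseline terms named above correspond to counts of small subgraphs in a directed network. The sketch below counts them using definitions that are common in the exponential random graph literature; the exact definitions used in the study are not given in this section, so these are assumptions. The function name and the toy ties are hypothetical.

```python
from itertools import permutations


def baseline_counts(ties):
    """Count common directed configurations in a set of (source, target) ties.

    Assumes no self-loops. The triple loop is cubic in the number of nodes,
    which is acceptable only for small illustrative networks.
    """
    ties = set(ties)
    nodes = {v for tie in ties for v in tie}
    out_deg = dict.fromkeys(nodes, 0)
    in_deg = dict.fromkeys(nodes, 0)
    for src, dst in ties:
        out_deg[src] += 1
        in_deg[dst] += 1

    def pairs(d):
        return d * (d - 1) // 2  # number of 2-stars centered on a node of degree d

    transitive = cyclic = 0
    for i, j, k in permutations(nodes, 3):
        if (i, j) in ties and (j, k) in ties:
            if (i, k) in ties:
                transitive += 1  # i -> j -> k closed by i -> k
            if (k, i) in ties:
                cyclic += 1  # i -> j -> k -> i, seen once per rotation

    return {
        "arc": len(ties),
        # each mutual dyad is seen from both directions, hence the halving
        "reciprocity": sum(1 for s, t in ties if (t, s) in ties) // 2,
        "2-out-star": sum(pairs(d) for d in out_deg.values()),
        "2-in-star": sum(pairs(d) for d in in_deg.values()),
        "transitivity": transitive,
        "cyclicity": cyclic // 3,  # each 3-cycle has three rotations
    }


if __name__ == "__main__":
    print(baseline_counts({("a", "b"), ("b", "a"), ("a", "c"), ("b", "c")}))
```

An attribute-based extension of a term would count only those instances that also satisfy a condition on node type or tie order, which is why its count can never exceed that of its root pattern.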
If a root pattern does not frequently appear in the network, its extension would therefore not become significant either. We evaluated three types of nodal attributes for pattern testing with NATERGM. For platform-based features, we identified registered members and unregistered visitors (attr=reg). Although WikiAnswer.com does not require a new user to register before posting or answering questions, registered members can accumulate "trust points" and obtain honor badges based on their contributions over time. We expected that registered members might have more commitment to the community and more willingness to contribute than unregistered anonymous visitors, and would thereby behave differently from unregistered visitors when developing network ties. The attribute was measured as a binary variable, where reg=1 if the member was registered and reg=0 otherwise. For textual features, we evaluated the writing proficiency (attr=pro) of community members.
Writing proficiency reflects an individual's level of literacy, expertise, and educational background. We expected that members with high levels of writing proficiency would contribute significantly to the online knowledge sharing communities. Hence, we were interested in understanding what roles these members would play in the formation process of the dynamic knowledge diffusion networks. Prior linguistic studies have suggested that writing proficiency can be assessed based on various factors. We employed text mining techniques to evaluate the following five metrics for each member in WikiAnswer.com. First, the average length of a member's answers was evaluated because it reflects the depth of the member's knowledge about the problem.
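The first metric, average answer length, can be sketched as below. Measuring length in whitespace-separated words is an assumption of this sketch, since the unit of length is not specified here, and the member identifiers and answer texts are invented.

```python
from collections import defaultdict


def average_answer_length(answers):
    """Map each member to the mean word count of that member's answers.

    `answers` is an iterable of (member, text) pairs. Length is measured in
    whitespace-separated words, which is an assumption of this sketch.
    """
    totals = defaultdict(lambda: [0, 0])  # member -> [word total, answer count]
    for member, text in answers:
        totals[member][0] += len(text.split())
        totals[member][1] += 1
    return {member: words / count for member, (words, count) in totals.items()}


if __name__ == "__main__":
    sample = [
        ("u1", "Insulin lowers blood glucose"),
        ("u1", "Yes"),
        ("u2", "Check the seller rating first"),
    ]
    print(average_answer_length(sample))
```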
Conclusion
Dynamic interaction between various types of individuals in social media is a complex process, and the order of network ties is an important aspect of social media network dynamics. We represented various temporal patterns of network formation based on nodal attributes and the order in which network ties develop, and we developed the NATERGM model for dynamic network analysis. We conducted empirical tests to evaluate the performance of NATERGM, and the results showed that NATERGM has an enhanced pattern testing capability and potentially better prediction accuracy of network characteristics compared to previous dynamic network models.
Compared to existing TERGM-based models, our proposed model can test more complex dynamic patterns resulting from the interaction between network tie formation and nodal attributes, thereby revealing how various nodal attributes affect the formation process of a dynamic network. In practice, the proposed model can be used to evaluate the impact of individuals' attributes on the formation process of dynamic social media networks. By examining these attributes, social media designers can understand which factors are critical to social network evolution and determine which functionalities to add or promote on their platforms.







