Epidemiological Modeling Of News And Rumors On Twitter


Fang Jin, Edward Dougherty†, Parang Saraf, Yang Cao†, Naren Ramakrishnan
Department of Computer Science
†Genetics, Bioinformatics, and Computational Biology Department
Virginia Tech, Blacksburg, VA 24061
{jfang8, parang, ycao, naren}@cs.vt.edu, †edougherty@vt.edu

ABSTRACT
Characterizing information diffusion on social platforms like Twitter enables us to understand the properties of underlying media and model communication patterns. As Twitter gains in popularity, it has also become a venue to broadcast rumors and misinformation. We use epidemiological models to characterize information cascades in Twitter resulting from both news and rumors. Specifically, we use the SEIZ enhanced epidemic model, which explicitly recognizes skeptics, to characterize eight events across the world spanning a range of event types. We demonstrate that our approach is accurate at capturing diffusion in these events. Our approach can be fruitfully combined with other strategies that use content modeling and graph theoretic features to detect (and possibly disrupt) rumors.

Categories and Subject Descriptors
H.2.8 [Database Management]: Database Applications—Data Mining; I.2.6 [Artificial Intelligence]: Learning—Knowledge acquisition; Parameter learning

General Terms
Experimentation, Performance

Keywords
SIS, SEIZ, Epidemiological modeling, Rumor detection

1. INTRODUCTION
Online social networks have become a staging ground for modern movements, with the Arab Spring being the most prominent example. Nine out of ten Egyptians and Tunisians responded to a poll indicating that they used Facebook to organize protests and spread awareness.
As a precautionary measure, governments have taken to blocking social networking websites, which underscores the importance of understanding this phenomenon.

The 7th SNA-KDD Workshop ’13 (SNA-KDD’13), August 11, 2013, Chicago, United States.
Copyright 2013 ACM 978-1-4503-2330-7.

Interestingly, the role of social networks is not limited to helping organize the activities of disruptive elements. Many key government and news agencies have also begun to embrace Twitter and other social platforms to disseminate information. After the tragic 2013 explosions at the Boston Marathon, the FBI turned to online social networks to broadcast crucial information about the suspects, and the viral diffusion of that information in turn yielded vital leads about them. At the same time, it is well known that online activity on sites such as Reddit led to the mistaken identification of some individuals and the spread of several rumors.

We were motivated to apply the latest in epidemiological modeling to understand information diffusion on Twitter, in relation to the spread of both news and rumors. Epidemiological models provide a classical approach to studying how information diffuses. These models typically divide the total population into several compartments that reflect the status of an individual. For instance, common compartments denote susceptible (S), exposed (E), infected (I), and recovered (R) individuals. Individuals transition from one compartment to another with certain probabilities that have to be estimated from data.
The simplest model, SI, has two states: susceptible (S) individuals get infected (I) by one of their neighbors and stay infected thereafter. While conceptually easy to understand, it is unrealistic for practical situations. The SIS model is popular in infectious disease modeling: individuals can transition back and forth between the susceptible (S) and infected (I) states (think of allergies or the common cold), and it is often used as the baseline for more sophisticated approaches. The SIR model allows individuals to recover (R), but it is not suited to modeling news cascades on Twitter, since there is no intuitive mapping for what ‘recovering’ means. The SEIZ model (susceptible, exposed, infected, skeptic) proposed by Bettencourt et al. [1] takes the interesting approach of introducing an exposed state (E): individuals in this state take some time before they begin to believe (I) in a story (i.e., get infected). While the authors of [1] used this approach to model the adoption of Feynman diagrams by communities of physicists, our work explores its use in modeling news and rumors on Twitter.

The key contributions of this paper are:
- Our work is the first to employ the SEIZ model to model real Twitter datasets. We employ non-linear least squares optimization of the underlying systems of ODEs over tweet data, and demonstrate that this model is better at modeling rumor and news diffusion than the traditional SIS model.
- We analyze eight representative stories (four true events and four rumors) across a range of topics (politics, terrorism, entertainment, and crime) and over several geographic regions (USA, Mexico, Venezuela, Cuba, Vatican). While not an exhaustive list, this demonstrates the wide applicability of the proposed model.
- We demonstrate the capability of the SEIZ model to quantify compartment transition dynamics. We showcase how such information could facilitate the development of screening criteria for distinguishing rumors from real news happenings on Twitter.

2. RELATED WORK
Rumor modeling.
As far as we know, Daley [5] was the first to point out, through mathematical analysis, the similarity between epidemics and rumors. Some researchers have studied rumor propagation modeling in different network topologies [13, 22]; however, they do not discuss propagation differences between news and rumors. Shah et al. [17] detect rumor sources in networks using maximum likelihood modeling. In [2], Budak et al. prove that minimizing the spread of misinformation (i.e., rumors) in social networks is an NP-hard problem, and also provide a greedy approximate solution. Castillo et al. [3] delve into Twitter content modeling, such as sentiment analysis and hashtags, to identify rumors, while Qazvinian et al. [15] address this issue using broader linguistic methods to learn possible features of rumors and determine whether a Twitter user believes a rumor or not. More related work appears in [7, 19]. Our goal is to develop an understanding of these processes using diffusion models.

Information Diffusion.
Significant work has gone into research on information diffusion on social media, e.g., see [4, 9, 16, 21]. Recently, Matsubara et al. [10] studied the rise and fall patterns of information diffusion, and managed to capture the power-law fall pattern and periodicities inherent in such data.
Gomez-Rodriguez et al. [6] built a cascade transmission model to track the cascading processes taking place over a network; tracing blogs and news media over a one-year period, they found that the top 1000 media sites and blogs tend to have a core-periphery structure.

Epidemiological models.
Mathematical modeling of disease spread not only provides vital information about the propagation of a disease in a human network, but also offers insight into the strategies that can be used to control it. The classification of the human population into different groups forms the basic premise of using epidemiological models for modeling information diffusion. The two most widely used such models are the SIR (Susceptible, Infected, Recovered) and SIS (Susceptible, Infected, Susceptible) models. Newman et al. [14] showed that a large class of standard epidemiological models, viz. the SIR models, can be solved exactly on a wide variety of networks, and confirmed the correctness of the solutions with numerical simulations of SIR epidemics on networks. Kimura et al. [8] proposed applying the SIS model to study information diffusion in which nodes can be activated multiple times. Zhao et al. [23] proposed an SIHR (Spreaders, Ignorants, Hibernators, Removed) rumor spreading model, with forgetting and remembering mechanisms, to simulate rumor spreading in inhomogeneous networks. Xiong et al. [20] proposed a diffusion model with four states, susceptible, contacted, infected, and refractory (SCIR), and found that the threshold value of the spreading rate is almost zero. Bettencourt et al. [1] proposed the SEIZ (susceptible, exposed, infected, skeptic) model to capture the adoption of Feynman diagrams using publication counts after World War II. They extract general features of idea spreading and estimate the idea adoption process.
Their results showed that the SEIZ model can fit the long-term idea adoption process with reasonable error, but they did not demonstrate whether the model can be applied to large-scale datasets, or to Twitter, where stories unfold in real time.

3. DATASETS
We focus on Twitter datasets that have reliable coverage of the events being studied; the volume of tweets ranges from as low as 791 to nearly three orders of magnitude greater. As described in Table 1, the news and rumors studied were drawn from a variety of regions and across a diversity of topics. Data collection was aimed at gathering tweets highly related to the events under study. We employed customized sets of keywords and hashtags pertaining to each incident. Finally, date range restrictions were used to define relevant tweets for each event. It is also pertinent to note that the tweets analyzed spanned a variety of languages: English, Spanish, Italian, and Portuguese.

3.1 News topics
Boston Marathon Bombings. Two pressure cooker bombs exploded near the finish line of the 2013 Boston Marathon on April 15 at 14:49:12 local time, killing three people and injuring more than 264 others. The FBI released photographs and surveillance videos on online social networks, which spread like wildfire and provided crucial leads for identifying the suspects.

Pope Resignation. Pope Benedict XVI announced his resignation on the morning of February 11, 2013. It was the first time in nearly six centuries that a pope had stepped down from his office. The news drew reactions from all across the world.

Amuay Refinery Explosion. A propane and butane gas leak caused an explosion at the Amuay refinery in Venezuela on August 25, 2012 at 1:11 am local time. The blast killed 48 people, injured 151 others, and damaged 1600 homes.

Michelle Obama at the 2013 Oscars. At the 2013 Oscar awards ceremony, a big surprise was the appearance of US First Lady Michelle Obama to present the ‘Best Picture’ award.
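The collection procedure described at the start of Section 3 (a customized keyword and hashtag set per event, plus a date-range restriction) can be illustrated with a small filter. This is a minimal sketch under stated assumptions: the `Tweet` record, its fields, the helper names, the case-insensitive substring matching, and the one-week window are all hypothetical choices for illustration, not the authors' actual pipeline. Only the Amuay keywords come from Table 1.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Tweet:
    """Hypothetical minimal tweet record; real Twitter data carries many more fields."""
    text: str
    created_at: datetime


def matches_event(tweet, keywords, start, end):
    """Return True if the tweet falls in [start, end) and mentions any keyword.

    Matching is a case-insensitive substring test, so the keyword
    'bostonmarathon' also matches the hashtag '#BostonMarathon'.
    """
    if not (start <= tweet.created_at < end):
        return False
    text = tweet.text.lower()
    return any(k.lower() in text for k in keywords)


def collect(tweets, keywords, start, end):
    """Keep only the tweets relevant to one event."""
    return [t for t in tweets if matches_event(t, keywords, start, end)]


# Usage: Amuay keywords from Table 1 and an assumed (hypothetical) one-week window.
amuay_keywords = ["Amuay", "refinery", "explosion"]
window = (datetime(2012, 8, 25), datetime(2012, 9, 1))
sample = [
    Tweet("Explosion at Amuay refinery in Falcón", datetime(2012, 8, 25, 2, 0)),
    Tweet("Nothing to do with the event", datetime(2012, 8, 25, 3, 0)),
    Tweet("Amuay refinery fire still burning", datetime(2012, 9, 5, 12, 0)),
]
relevant = collect(sample, amuay_keywords, *window)
print(len(relevant))  # 1
```

Only the first sample tweet survives: the second mentions no keyword and the third falls outside the date window. Substring matching is deliberately loose; a real collection would likely need tokenization and language-aware matching, given the four languages involved.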

Table 1: Twitter datasets studied in this paper.

| Story | Type | Topic | Country | #Tweets | Response ratio | Keywords & hashtags |
|---|---|---|---|---|---|---|
| Boston Marathon bombings | news | … | USA | 501259 | 68.3% | …Marathon, (#)bostonmarathon |
| Pope resignation | news | … | Vatican | 31365 | 56.75% | Pope, (#)Benedict |
| Amuay refinery explosion | news | … | Venezuela | 49015 | 62.89% | Amuay, refinery, explosion |
| Michelle Obama at the Oscars | news | entertainment | USA | 3762 | 54.45% | Michelle Obama, Oscars |
| Obama injured | rumor | politics | USA | 791 | 46.14% | White House, … |
| Doomsday | rumor | … | … | … | … | Doomsday, Mayan, … |
| Fidel Castro’s death | rumor | … | … | … | … | …Castro, Dr. … |
| Riots and shooting in Mexico | rumor | … | … | … | … | …torcha Campesina, … |

Figure 1: Tweet volume. (a) Amuay explosion; (b) Castro rumor.

3.2 Rumors
Obama injured. A fake Associated Press (AP) tweet posted on April 23, 2013 claimed that President Obama had been hurt in White House explosions, causing a brief period of instability in financial markets. The information was false, and it was determined that the Twitter account had been hacked.

Doomsday. December 21, 2012 was rumored to be Doomsday, as it marked the end date of a 5126-year-long cycle in the Mesoamerican Long Count calendar. This rumor spread like wildfire, and social networks were flooded with panicked and anxious posts. Considering that we are still alive, Doomsday turned out to be nothing more than a rumor on a massive scale.

Fidel Castro’s death. On October 16, 2012, a Naples doctor claimed that former Cuban leader Fidel Castro had suffered a cerebral hemorrhage and was near a neurovegetative state. However, on October 21, 2012, these rumors were denied by Elias Jaua, former Venezuelan vice president, who released pictures of himself meeting Castro a few days earlier.

Riots and shooting in Mexico. A very interesting example that highlights the perils of rumor spreading on social networks pertains to the false reports of violence and an impending attack in Nezahualcoyotl, Mexico. (False) rumors spreading on Twitter and Facebook about shootouts caused (real) panic and chaos in Mexico City on September 5, 2012. Interestingly, the authorities themselves turned to Twitter to deny these rumors.

3.3 Preliminary
Figure 2: Followers/followees distributions. (a) Amuay explosion; (b) Castro rumor.
Followers: people who follow the person; Followees: people who are followed by the person.

We compare the basic properties of news and rumor propagation by characterizing tweet volume over time, follower/followee distributions, the ‘response ratio’ of a story, and the retweet cascades. For brevity, we show results from only two stories in this section: one from our news collection (the Amuay explosion) and one from our rumor collection (Fidel Castro’s purported death).

Tweet Volume. For both examples, we plot the tweet volume over time from the beginning of the story. Figure 1(a) shows the activity for the 2012 Amuay refinery explosion. An activity burst formed immediately after the news was made public, and the number of tweets dropped progressively as the days went by. This activity trend resembles the breaking-news propagation described by Mendoza et al. [11]. In contrast, Figure 1(b) depicts the volume of tweets about a rumor regarding the health of former Cuban leader Fidel Castro. Here we see occasional spikes in tweet volume; note the increase around October 21st, when the rumors were officially denied.

Followers and Followees Distributions. Figure 2(a) is a log-log scatter plot of the follower/followee distribution for the Amuay explosion news, and Figure 2(b) is the corresponding plot for the Fidel Castro death rumor. There is no significant qualitative or quantitative difference in this case; in particular, both plots show that the number of followees is less than the number of followers.

Response Ratio. A tweet can either be a post made on the user’s own initiative, or a post responding to some other user’s post (e.g., retweets and replies). As Starbird et al. [18] discuss, retweets reveal how information propagates through a social network: the ‘deeper’ a retweet, the more relevant the tweet is for the community. Based on this idea, we define the response ratio of a story as the fraction of responsive tweets among all tweets in the story. Table 1 lists the response ratio for all eight stories. As we can see, response ratios for news are higher than those for rumors.

Retweet Cascades. A retweet cascade reflects how the social network propagates information. Figure 3 depicts the evolution of the retweet graphs for the Amuay news and Castro rumor datasets. For the Amuay news, we plot four graphs at 6-hour intervals, showing that a burst formed between 6 am and 12 pm, only 5 hours after the accident. Figure 3(b) shows the retweet graphs of the rumor over several days; even after one day, there is no burst of tweets related to this rumor. Comparing the news and rumor networks, we observe two features of the rumor: 1) the network for the news instance is more complex, and users can obtain news from many sources, while users obtain the rumor information only from a limited number of information centers; 2) there is an immediate burst after news is made public, while there is no obvious burst for the rumor.

Figure 3: Retweet cascade for the Amuay explosion news and Castro rumor. (a) Amuay cascade; (b) Castro cascade. Each node is a user id, and each edge connects the retweeting user to the original user.

4. OUR APPROACH
As stated earlier, we use compartmental population models to quantify the propagation of news and rumors on Twitter, focusing primarily on the SIS and SEIZ models.

Figure 4: SIS model framework
Figure 5: SEIZ model framework

4.1 SIS
As described earlier, this model divides the population into two compartments, or classes: susceptible and infected. Note that in this model, infected individuals return to the susceptible class on recovery because the disease confers no immunity against reinfection.

In order to adapt this model for Twitter, we have given new meaning to these terms. An individual is identified as infected (I) if he posts a tweet about the topic of interest, and susceptible (S) if he has not. A consequence of this interpretation is that an individual posting a tweet is retained in the infected compartment indefinitely; hence, he cannot propagate back to the susceptible class as is possible in an epidemiological application. At any given time t, we use N(t) to denote the total population size, S(t) the susceptible population size, and I(t) the infected population size, such that N(t) = I(t) + S(t). As shown in Figure 4, the SIS spreading rule can be summarized as follows:
- An individual that tweets about a topic is regarded as infected.
- A susceptible person has not tweeted about the topic.
- A susceptible person coming into contact with an infected individual (via a tweet) becomes infected himself, thus immediately posting a tweet.
- Susceptible individuals remain so until coming into contact with an infected person.

The SIS model is mathematically represented by the following system of ordinary differential equations (ODEs) [12]:

$$
\begin{aligned}
\frac{d[S]}{dt} &= -\beta S I + \alpha I \qquad (1a)\\
\frac{d[I]}{dt} &= \beta S I - \alpha I \qquad (1b)
\end{aligned}
$$

Table 2: Parameter definitions in SEIZ model [1].

Figure 6: Numerical implementation work-flow.

4.2 SEIZ
One drawback of the SIS model is that once a susceptible individual is exposed to the disease, he can only transition directly to the infected status. On Twitter in particular, this assumption does not hold well. People’s ideologies are complex, and when they are exposed to news or rumors they may hold different views, take time to adopt an idea, or even be skeptical of some facts. In this situation, they might be persuaded to propagate a story, or begin doing so only after careful consideration. Additionally, it is quite conceivable that an individual can be exposed to a story (i.e., receive a tweet), yet never post a tweet themselves.

Based on this reasoning, we considered a more applicable, robust model: the SEIZ model, which was first used to study the adoption of Feynman diagrams [1]. In the context of Twitter, the compartments of the SEIZ model can be viewed as follows. Susceptible (S) represents a user who has not heard about the news yet; infected (I) denotes a user who has tweeted about the news; skeptic (Z) is a user who has heard about the news but chooses not to tweet about it; and exposed (E) represents a user who has received the news via a tweet but takes some time, an exposure delay, before posting. We note that referring to the Z compartment as skeptics in no way implies belief in, or skepticism of, a news story or rumor. We adopt this terminology as this was the nomenclature used by the original …
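The SEIZ compartments described above can also be simulated numerically. The SEIZ equations themselves are not reproduced in this excerpt, so the minimal sketch below assumes the form commonly associated with [1]. With N = S + E + I + Z, the assumed system is:

- dS/dt = −βSI/N − bSZ/N
- dE/dt = (1−p)βSI/N + (1−l)bSZ/N − ρEI/N − εE
- dI/dt = pβSI/N + ρEI/N + εE
- dZ/dt = lbSZ/N

Here β, b, and ρ are read as S-I, S-Z, and E-I contact rates, ε as the E-to-I incubation rate, and p and l as the probabilities that a contact sends a susceptible directly to I or to Z, respectively. Check these against Table 2 and [1] before relying on them. The function names and all numeric values are hypothetical and illustrative, not fitted.

```python
def seiz_rhs(state, beta, b, rho, eps, p, l):
    """Assumed SEIZ right-hand side (see lead-in); state = (S, E, I, Z)."""
    s, e, i, z = state
    n = s + e + i + z             # total population N, conserved by these equations
    s_meets_i = beta * s * i / n  # contacts between susceptibles and tweeters
    s_meets_z = b * s * z / n     # contacts between susceptibles and skeptics
    e_meets_i = rho * e * i / n   # exposed users pushed to tweet by further contact
    ds = -s_meets_i - s_meets_z
    de = (1 - p) * s_meets_i + (1 - l) * s_meets_z - e_meets_i - eps * e
    di = p * s_meets_i + e_meets_i + eps * e
    dz = l * s_meets_z
    return (ds, de, di, dz)


def rk4_step(f, y, dt, *params):
    """One classical Runge-Kutta step for a tuple-valued ODE y' = f(y, *params)."""
    k1 = f(y, *params)
    k2 = f(tuple(a + dt / 2 * k for a, k in zip(y, k1)), *params)
    k3 = f(tuple(a + dt / 2 * k for a, k in zip(y, k2)), *params)
    k4 = f(tuple(a + dt * k for a, k in zip(y, k3)), *params)
    return tuple(a + dt / 6 * (q1 + 2 * q2 + 2 * q3 + q4)
                 for a, q1, q2, q3, q4 in zip(y, k1, k2, k3, k4))


def simulate_seiz(y0, params, dt, steps):
    """Return the list of states (S, E, I, Z) at t = 0, dt, ..., steps*dt."""
    ys = [y0]
    for _ in range(steps):
        ys.append(rk4_step(seiz_rhs, ys[-1], dt, *params))
    return ys


# Illustrative parameters (beta, b, rho, eps, p, l); not fitted to any dataset.
params = (0.6, 0.3, 0.4, 0.2, 0.3, 0.5)
y0 = (990.0, 0.0, 5.0, 5.0)  # S, E, I, Z with N = 1000
ys = simulate_seiz(y0, params, dt=0.1, steps=500)
s, e, i, z = ys[-1]
print(round(s + e + i + z, 6))  # 1000.0: the four rates sum to zero
```

Fitting such a model to tweet data, as the introduction describes, would mean wrapping `simulate_seiz` in a residual function that compares simulated I(t) with observed counts. That residual function would then be passed to a non-linear least-squares solver such as SciPy's `scipy.optimize.least_squares`. That step is omitted here to keep the sketch dependency-free.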
