On Joint Diagonalisation For Dynamic Network Analysis

2y ago
12 Views
2 Downloads
820.90 KB
12 Pages
Last View : 1m ago
Last Download : 3m ago
Upload by : Kian Swinton
Transcription

Technical ReportUCAM-CL-TR-806ISSN 1476-2986Number 806Computer LaboratoryOn joint diagonalisationfor dynamic network analysisDamien Fay, Jérôme Kunegis, Eiko YonekiOctober 201115 JJ Thomson AvenueCambridge CB3 0FDUnited Kingdomphone 44 1223 763500http://www.cl.cam.ac.uk/

c 2011 Damien Fay, Jérôme Kunegis, Eiko YonekiTechnical reports published by the University of CambridgeComputer Laboratory are freely available via the Internet:http://www.cl.cam.ac.uk/techreports/ISSN 1476-2986

On joint diagonalisationfor dynamic network analysis#1Damien Fay, Jérôme Kunegis &2 , Eiko Yoneki 3#University of Cork, IrelandUniversity of Koblenz–Landau, Germany University of Cambridge, United z.de3eiko.yoneki@cl.cam.ac.uk2Abstract— Joint diagonalisation (JD) is a techniqueused to estimate an average eigenspace of a set ofmatrices. Whilst it has been used successfully inmany areas to track the evolution of systems via theireigenvectors; its application in network analysis is novel.The key focus in this paper is the use of JD on matricesof spanning trees of a network. This is especially usefulin the case of real-world contact networks in which asingle underlying static graph does not exist. The averageeigenspace may be used to construct a graph whichrepresents the ‘average spanning tree’ of the network ora representation of the most common propagation paths.We then examine the distribution of deviations fromthe average and find that this distribution in real-worldcontact networks is multi-modal; thus indicating severalmodes in the underlying network. These modes areidentified and are found to correspond to particulartimes. Thus JD may be used to decompose the behaviour,in time, of contact networks and produce average staticgraphs for each time. This may be viewed as a mixturebetween a dynamic and static graph approach to contactnetwork analysis.Keywords. Social networks, joint diagonalisation,graph analysis, spanning tree, human contact networksI. I NTRODUCTIONUnderstanding the dynamic structure of contact networks is critical for designing dynamic routing algorithms [11], epidemic spreading [16] and messagepassing algorithms [10].Time dependent networks are characterised by timedependant paths which are characterised by the orderin which the paths occur. For example, a path between3 nodes A ! B ! C does not imply a reverse pathexists; A ! B ! C provides no information abouthow C may communicate with A. However, in manyapplications a static graph is constructed which represents typically the proportion of time a link was seenbetween two nodes. These static graphs often lose thetime information which is critical in contact networks.However, at a specific time and from a specific nodethere is a single static network representing the pathsbetween the root node and the rest of the network.JD is used in this paper to look for commonalitiesin these specific static networks and to provide anaverage static network where appropriate. That is, weare looking for modes of operation in contact networks.Joint Diagonalisation (JD) is a technique that is usedto track the changes in eigenspace (i.e. eigenvectorsand eigenvalues) of a system (see Section III for examples). Eigenvectors and eigenvalues play an importantrole in static network/graph analysis as they can beused to determine the centrality of nodes; communitiesand settling times among other things [1]. However, tothe best of our knowledge tracking eigenspace evolution has not been applied to contact/time dependentnetworks previously. This paper examines the use ofJD in network analysis.II. R ELATED WORKJoint diagonalisation has been used in many applications where the evolution of a system can be trackedsmoothly via its eigenspace. For example, Macagnanoet al. [12] present an algorithm for localisation ofmultiple objects given partial location information.As time evolves the location of the objects changessmoothly which may be seen through the evolution ofthe eigenvectors of a distance matrix. Other examplesinclude blind beam forming [4] and blind source separation [20].Sun et al. [19] use tensor analysis to examine timedependent networks. A tensor is multi-dimensionalmatrix (for example a set of adjacency matrices) whichare essentially reduced using PCA to a core tensor.This technique is similar in spirit to that presentedhere, the difference is that we are looking to reduce aset of spanning trees representing propagation througha network; propagation information being preserved.Scellato et al. [18] examine the different characteristics of contact networks as they evolve over time.However, the analysis there is based on forming staticgraphs by amalgamating all links seen in an intervalof time. This may introduce connections which in

Fig. 1. A simple graph and its 6 spanning trees. (The numbers represent the root nodes and probability of observing the tree ex: 1,6,2,3are the root nodes for the third tree and this tree is observed with probability 4/14)where Ai,j is element i, j of the (possibly weighted)adjacency matrix and λ is the largest eigenvalue ofAi,j as can be seen by rewriting Equation 1 in matrixnotation as:1X AX(2)λfact are unordered. Graph measures (e.g. the clusteringcoefficient) are then measured from these graphs and atime series analysis of these follows. In contrast, herethe amalgamation is dependent of the contact networkitself.Riolo et al. [16] investigate time dependent epidemicnetworks with a view to constructing transmissiongraphs, directed graphs which indicate the directionof transmission of a disease through a network. Whilethe aim in this paper is similar, the methodology usedis significantly different as they examine one timeinfections in real networks.The eigenvector corresponding to λmax gives theeigenvector centrality of node i.Given M samples of a network, H1 . . . HM , thequestion now arises; how can these be combined togive a matrix that reflects the sampling bias. Wepropose using a method known as joint diagonalisationwhich produces an average eigenspace of the samples.Specifically, we seek an orthogonal matrix such that:III. T HEORETICAL BACKGROUNDWe begin by defining snowball sampling whichconsists of selecting a root node randomly in the network with uniform probability and performing a BreathFirst Search (BFS) from this node (i.e. determining aset of shortest paths from the source node to everyother node in the network). This produces a spanningtree, H, where H is a subset of the original graphG(V, E), where V and E denote the vertex and edgesets respectively, and jV j N denotes the number ofnodes. We call the starting node the observer or rootand H, the sample. Figure 1 shows a simple graphwhich will be used for demonstration purposes. In thefirst sample node 5 is selected at random and a shortestpath first search results in the first tree in Figure 1. Inthis simple graph there are 6 spanning trees shown inFigure 1. Note that the distribution of spanning treesin this network are not uniform but biased. That is,traffic generated uniformly from each node results innon-uniform percolation across the network.Next we develop a centrality measure which isbased on standard eigenvector centrality. Eigenvectorcentrality [14] is defined by letting the centrality ofnode i equal the average of the centrality of all nodesconnected to it:N1XAi,j xj(1)xi λ j 1Hi U Ci U T 8i(3)If U corresponds to the eigenvectors of Hi then Ciis diagonal however no matrix U exists in which allCi are diagonal (except for the trivial case in whichall Hi are equal). Joint diagonalisation seeks averageeigenvectors Ū such that the sum of squares of the offdiagonal elements of Ci are minimised. Specifically:Ū argmin off 2 (UMXCi )(4)j 1where off 2 is the sum of the off diagonal elementssquared, called the deviation of Hi from H̄, δi :X k,jδi off 2 (Ci ) jCi j2(5)k6 jCik,jwhereis the k th row and j th column of Ci . Asshown in [21] and [3] Equation 4 may be minimisedefficiently by a sequence of Givens rotations; convergence and stability properties are proven in [2].Given the average eigenstructure of the sample matrices an average sampling matrix may be constructedfrom the eigenvector decomposition as:H̄ U C̄U T4(6)

0.010.01 0.03 0.051.000.790.010.01 0.010.790.010.010.030.011.000.010.01Fig. 2.The average sampling graph corresponding to H̄.Fig. 4. H̄ for example 2 (Note: the red box contains the nodenumber and the sampling centrality).Where H̄ is a matrix in which the entries representthe average weight of the links as observed by thesamples in the network (in a least squares sense) andC̄ is the average of diagonals of Hi projected onto Ū ;i.e. the average eigenvalues. A sample based centralitymay then be constructed from H̄ using the standardeigenvector centrality; i.e. by using the eigenvector ofH̄ corresponding to the maximum eigenvalue.It is instructive to view the eigenvector reconstruction of H̄ which is formed from the eigenvectors inTable I via Equation 6. H̄ is drawn in Figure 2. Notethat H̄ is a complete weighted graph; weights assignedto non-existent links are low and are a consequence oftaking an average of many graphs. The edge betweennodes 3 and 4 has a weight of 0.7857 1114 , i.e. theproportion of trees that use that link (see Figure 1).H̄ represents the sample biased weight of this link; anice result. There are two reasons why these numbersare not exactly the same; the first is that, as always,there is a slight error introduced when using empiricalsampling. The second is that the H̄ is based on theaverage eigenspace of a set of sampling trees; this isnot the same as simply taking the proportion of timesa link has been observed.A. Simple examples.Using the graph shown in Figure 1, 100 samplesare taken by randomly choosing a root node andconstructing a spanning tree from each. These arethen jointly diagonalised producing the eigenvectorsshown in Table I. The sampling centrality is the firsteigenvector (column 1).TABLE IT HE AVERAGE EIGENVECTOR , U FROM THE GRAPH IN F IGURE 0.58764289531Fig. 5.10Fig. 3.H̄ for example 2 with a preference for routes 6 4.The second example deals with a preferred route.Figure 3 shows a graph with a bridge formed fromThe simple graph used in example 2.5

by doing Bluetooth device discovery every fiveminutes. 1 month of data is used here to maintainthe consistency of users.The three experiments are summarised in Table IIand the trace data can be downloaded at CRAWDADdatabase [5].nodes 6 4 and 5 3 ; these nodes are critical injoining the two parts of the graph.Figure 4 shows the weighted graph created by sampling as above1 . The second stage involves creatinga preferred route by removing 80% of trees that usethe link between 5 3. The resultant average graph isshown in Figure 5. As can be seen the weight attachedto link 5 3 is greatly reduced (from 0.9 to 0.2).Thus far we have only dealt with samples takenon static graphs. However, joint diagonalisation isparticularly well suited to contact networks. In thesenetworks there is no underlying static graph as such,but rather a set of contacts that are time dependent. Byflooding these networks spanning trees may be formedand combined by the use of JD. The next section detailsthese real-world data sets.V. R ESULTSThe results below examine the 3 data sets separatelyhighlighting the features of each. Finally a syntheticcontact network with known characteristics is constructed and JD used to extract these characteristics.31221513253273628123427IV. DATA SET DETAILS1129213816In this paper, we use four experimental datasetsgathered by the Haggle Project [7], referred to as Cambridge, Infocom06; one dataset from the MIT RealityMining Project [6], referred to as MIT. Previously, thecharacteristics of these datasets such as inter-contactand contact distribution have been explored in severalstudies [9], to which we refer the reader for furtherbackground information. These three datasets cover arich diversity of environments, ranging from a quietuniversity town (Cambridge), with an experimentalperiod from a few days (Infocom06) to one month(MIT).Experimental data setDeviceNetwork typeDuration (days)Granularity (seconds)Number of DevicesNumber of contactsAverage # 5 143024Fig. 6.A typical flooding tree; as seen from node 20.Next the distribution of the sample start times isexamined Figure 9. As can be seen the 5 modescorrespond to different times in the data set. Modes1 and 5 cover the first half of the data while modes 2then 3 and then 4 become dominant in that succession.MITPhoneBluetooth2463009754,6670.024A. Cambridge dataFigure 6 shows a typical sampling tree for theCambridge data set. Node 20 initiates a message andit is passed around the contact network; first to nodesIn Cambridge, the iMotes were distributed mainlyto two groups of students from University ofCambridge Computer Laboratory, specifically undergraduate year1 and year2 students, and alsosome PhD and Masters students. This datasetcovers 11 days.In Infocom06, the trace contains 78 participants.Among 78 participants, 34 form 4 subgroups byacademic affiliations.In MIT, 100 smart phones were deployed to students and staff at MIT over a period of 9 months.These phones were running software that loggedcontacts with other Bluetooth enabled devices1 Links23926C HARACTERISTICS OF EXPERIMENTAL DATA SETS 235101961817TABLE II 33465.554.543.532.521.524 2614 30 1 4 10 5 19 15 6 20 171823 9 33 2 35 3 1636 2521 7 8 11 2927 1234 2832 1322 31Fig. 7.set).with low weights are removed for clarity.6Community based on Fiedler clustering (Cambridge data

0.0840AllMode 1Mode 2Mode 3Mode 4Mode 3150.02100.015Node 3Node 20Other nodes010203040Σ off xi 250607080 310001500Fig. 11. Mean number of nodes susceptible to disease after time t(Root infection starts at time 250. Inset focuses on the start of theinfection.).AllMode 1Mode 2Mode 3Mode 4Mode 51.81.61.41.2f(t)500Group5x 1010.80.60.40.20 1000tFig. 8. Distribution of δi (Cambridge data set; kernel smoothingis employed for the overall average.).200100Fig. 9.200300400Time500600700800900Distribution of times by mode.16 and 6 and from there to the rest of the network. Forthis experiment ten thousand such trees are generated2with the messages starting at a random times and froma random node (uniformly distributed). These are thencombined using Joint Diagonalization to form H̄.The average graph, H̄, for this data set is representedin Figure 10(a). This representation shows all links inthe weighted shortest paths of H̄ 3 . As can be seen thenodes split into two groups as expected. These groupsmay be represented by a standard dendrogram based onFiedler vector clustering [8] as shown in Figure 7. Thegroups seen here correspond closely to those found onthe same data set in [23].The results so far have examined the average behaviour of the contact network which is interesting initself. However, by examining the distribution of deviations, δi , from the average a more interesting behaviourmay be observed. Figure 8 shows the distribution ofδi , i 1 . . . 10, 000. As can be seen the distribution ismulti-modal; i.e. the underlying process/contact network has different modes of operation. A Gaussianmixture model [13] is used to determine the differentmodes as shown in Figure 8. 5 different modes areidentified.This is particularly useful as it allowsthe network to be characterized by different modes ofbehaviour at different times. H̄ for the overall data setand for each mode are shown in Figure 10. Mode oneshows a highly structured network corresponding to theday when the groups are well defined by class year (i.e.year 1 and year 2). This structure then becomes lesswell defined as time moves on. Mode 5 is particularlyinteresting as there is an obvious bridge formed bynodes 3 and 20. This mode covers the night time and830287135345848512577027738975 531056725514 37 96415766405024 7623917861683 8890802059484113287 31 822577696126446349542 A large sample size is used here to negate random effects.However, similar results are found for much smaller sample sizes.3 We found this to be the clearest means of representing a completeweighted graphs.5Fig. 12.762388158653214652798647673311 4368427421514341819362229603917Graph of shortest paths in H̄; overall. (MIT).

Fig. 10. Graph of shortest paths in H̄ for overall and 5 modes. (Cambridge data set; the size of a node is proportional to the sum ofweights incident on that node)0.06AllMode 1Mode 120Σ off xi 21251301351401451.5Distribution of δi (MIT).Fig. 501686286638640689 4Group2x 10Fig. 13.Community based on Fiedler clustering (MIT data set).AllMode 1Mode 23.53nodes 3 and 20 are possibly staff who interact with thestudents in the morning4 . Using H̄ as an indicator, thisimplies that a disease spread at this time from nodes3 and 20 should have the fastest infection rate. Notethat mode 1 is still dominant in this period; this modeis essentially being suspended overnight (due to fewcontacts) with the spanning trees being completed inthe morning.To test the infection rate, an SIR model is constructed5 and a disease is spread through the contactnetwork starting at time index 250. The simulationis repeated 30 times for each node and the resultsbootstrapped to give estimates of the mean number ofpeople susceptible (i.e. those that have not received thedisease) at time, t, S(t). Figure 11 shows the resultsof these simulations and as can be seen the TimeFig. 15.Distribution of times by mode. (MIT)of susceptible people falls most rapidly for infectionsstarted at nodes 3 and 20, as expected.B. MIT dataThe results from the MIT data set show a differenttype of behaviour. There are two main groups (Figure 12) the largest of which can be further subdivided4 Wecannot be sure as the data has been anonymised.of infection 0.5; infection time Poisson distributedwith mean 80 time steps, 800 mins.5 Probability8

Group40.040.012AllMode 1Mode 2Mode 3Mode 40.035AllMode 1Mode 2Mode 3Mode 020.00502040Fig. 16.6080100Σ off xi 21201401600180050100150200250TimeInfocom06 results. (a) Distribution of δi . (b) Distribution of times by mode. (c-f) H̄ for each mode.into three smaller groups (Figure 13).The distribution of δi is shown in Figure 14 andtwo main modes are identified from this. The MITdata set spans a month of data and recurring patternsemerge from the data as shown in Figure 15. Thisis particularly interesting as it introduces the conceptof being able to forecast the behaviour of a group atregular intervals and design strategies for those specificmodes.ence. Four modes are identified (Figure 16(a)) whichcorrespond to four periods in time (Figure 16(b)). Thefirst mode to occur is mode 3 showing much mixingbetween the delegates (Figure 16(e)). This is probablythe delegates meeting for coffee before the conferencebegins. This is then followed by two periods of structured graphs (i.e. presentations; Figures 16(c,d)) endingwith a period of mixing (Figures 16(f)).D. Synthetic contact networkC. INFOCOM ’06 dataThe first network created in this section is a purelyrandom contact network in which 5% of 50 nodesare connected at random in each time step. Figure 18The Infocom data is summarised in Figure 16 andfollows the behaviour typically expected at a confer9

05Underlying graphH310443715162

Technical Report Number 806 Computer Laboratory UCAM-CL-TR-806 ISSN 1476-2986 On joint diagonalisation for dynamic network ana

Related Documents:

Bruksanvisning för bilstereo . Bruksanvisning for bilstereo . Instrukcja obsługi samochodowego odtwarzacza stereo . Operating Instructions for Car Stereo . 610-104 . SV . Bruksanvisning i original

10 tips och tricks för att lyckas med ert sap-projekt 20 SAPSANYTT 2/2015 De flesta projektledare känner säkert till Cobb’s paradox. Martin Cobb verkade som CIO för sekretariatet för Treasury Board of Canada 1995 då han ställde frågan

service i Norge och Finland drivs inom ramen för ett enskilt företag (NRK. 1 och Yleisradio), fin ns det i Sverige tre: Ett för tv (Sveriges Television , SVT ), ett för radio (Sveriges Radio , SR ) och ett för utbildnings program (Sveriges Utbildningsradio, UR, vilket till följd av sin begränsade storlek inte återfinns bland de 25 största

Hotell För hotell anges de tre klasserna A/B, C och D. Det betyder att den "normala" standarden C är acceptabel men att motiven för en högre standard är starka. Ljudklass C motsvarar de tidigare normkraven för hotell, ljudklass A/B motsvarar kraven för moderna hotell med hög standard och ljudklass D kan användas vid

LÄS NOGGRANT FÖLJANDE VILLKOR FÖR APPLE DEVELOPER PROGRAM LICENCE . Apple Developer Program License Agreement Syfte Du vill använda Apple-mjukvara (enligt definitionen nedan) för att utveckla en eller flera Applikationer (enligt definitionen nedan) för Apple-märkta produkter. . Applikationer som utvecklas för iOS-produkter, Apple .

Eigenvectors and diagonalisation Inner products and orthogonality Wrapping up Radboud University Nijmegen Last

Xu : A Diagonalisation Algorithm and Its Applications in Ambiguity Search 37 1 1 1 11 1 1 1 11 P E J P PA M A P P A M A P P P E J E J P T T T T T i.e., matrice

3 Predicate Logic 4 Theorem Proving, Description Logics and Logic Programming 5 Search Methods 6 CommonKADS 7 Problem Solving Methods 8 Planning 9 Agents 10 Rule Learning 11 Inductive Logic Programming 12 Formal Concept Analysis 13 Neural Networks 14 Semantic Web and Exam Preparation . www.sti-innsbruck.at Agenda Motivation Technical Solution – Introduction to Theorem Proving .