当前位置：文档库 › Circle based community detection

Circle based community detection

Circle Based Community Detection

Hemank Lamba

IBM Research,India helamba1@https://www.wendangku.net/doc/1f13203136.html, Ramasuri Narayanam IBM Research,India ramasurn@https://www.wendangku.net/doc/1f13203136.html,

ABSTRACT

The connection patterns among individuals or objects in complex(social)networks possess rich information that can be useful for conducting e?ecient network analysis.In par-ticular we consider the task of community detection in social networks.Nowadays social networking sites allow users to categorize their friends into di?erent lists.Some of the ex-amples being’circles’on Google+,’lists’on Twitter and Facebook.This information captures the reason of friend-ship between two users.In this paper we explore how this information can lead to detecting better community struc-tures.We pose this task as a community detection problem where the algorithm does not have the information about the underlying edges.Experiments are conducted on3real world social networks-Google+,Twitter and Facebook. The experiment shows that results obtained are better than the results obtained by community detection over the origi-nal graph.

Keywords

Social Networks,Community Detection

1.INTRODUCTION

The problem of community detection in social networks is not a recent one and has been studied in the past many number of https://www.wendangku.net/doc/1f13203136.html,munities,also called as clusters are a collection of nodes which inherently share some properties or have similar roles in the entire graph.For instance,a famous example and application of community detection is shown by Zachary’s karate club dataset,which is also used as baseline for evaluating community detection algorithms. The dataset consist of34vertices,each representing mem-ber of a karate club.Edges exist between individuals who interacted with each other outside the club.At some point in time,a con?ict arose between the club president and the instructor and that led to separation of the club into two groups.The aim is to be able to predict the two groups based on the original network structure.Broadly,the goal Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for pro?t or commercial advantage and that copies bear this notice and the full citation on the?rst page.Copyrights for components of this work owned by others than ACM must be honored.Abstracting with credit is permitted.To copy otherwise,or republish,to post on servers or to redistribute to lists,requires prior speci?c permission and/or a fee.Request permissions from Permissions@https://www.wendangku.net/doc/1f13203136.html,

ICARE2013New Delhi,India

Copyright2013ACM978-1-4503-2320-8...$15.00.of community detection is to maximize the intra-community similarity and minimize the inter-community similarity.A good community,thus,will be one where the number of edges going out of the community are much lesser than the number of edges staying within the community.A number of quality metrics have been proposed which can e?ciently try to capture this information,the most popular being mod-ularity.

Nowadays online social networks have allowed users to classify their friends into lists,thus allowing people to orga-nize their network.Examples being’Circles’on Google+ and’lists’on Facebook and Twitter.We will refer to all of them as’social circles’from now on.The social circles not only organize the network but also help users in customize content sharing.For instance,a user might want to share his holiday pictures with his’Friends’and’Family’but not with o?ce colleagues.A model example is shown in Figure 1.In the?gure we can see that a node u has4social cir-cles in it’s ego network,which are labeled as Family,High School Friends,Undergraduate Friends and O?ce Friends. Though in this?gure,no overlapping circle has been shown, but it can be the case that there are overlapping social cir-cles.The network shown in the?gure is an ego network, where each node in the network has an edge to node u.The ego network is also divided into clusters which are labeled by the user himself.The labels generally are the reasons why a particular user is his friend or unique attributes which di?erentiate it from other friends.

Our hypothesis is that even in the absence of edges,the social circle information and it’s labels can be used to?nd community structures which are better than the community structures found by using just the network information.Our approach takes social circle of a particular user and its labels as input and produces a community structure over the en-tire network.We also show in the paper how this approach can be generalized to situations where there is no network information.To the best of our knowledge,this is the?rst attempt in the literature to use social circles for community detection over the entire network.

2.RELATED WORK

Community detection has for long been the source of at-tention of network science researchers.Ever since the semi-nal paper by Girvan Newmann[9],lot of work and interest has been generated in this?eld.The set of algorithms that can be used to?nd communities can be divided on the basis of approach into?ve categories[8]-(i)Graph Partitioning (ii)Hierarchical Clustering(iii)Partitional Clustering(iv)

Figure1:An example of social circle. Spectral Clustering(v)Divisive Algorithms and(vi)Mod-ularity based methods.For this particular piece of work, we will concentrate only on modularity based methods as we have used modularity for evaluation of the mentioned approaches.Modularity[15]is one of the best popular qual-ity functions for community detection nowadays.Optimiz-ing modularity is proved to be a NP-Complete problem[4]. Much of the work in the domain of modularity maximizing community detection has been in trying to create algorithms for optimizing modularity and to decrease the run time of such algorithms.The?rst algorithm proposed to optimize modularity was given by[9],and was based on removing edges which give the maximum increase of modularity with respect to previous con?guration.[5]optimized the algo-rithm by maintaining e?cient data structures.A lot of work followed increasing modularity by local search optimization [2],stochastic perturbations of partitions[13].A completely di?erent approach was proposed by Blondel et al.[3],it was based on vertex picking greedy,a particular node was put into that community of its neighbors which gave the maxi-mum modularity gain.This approach not only gave higher modularity value than[5]but also decreased the run time of the algorithm.On the top of this various other optimiza-tions-Simulated Annealing[10],Extremal Optimization[7], Spectral Optimization[15][14],and mathematical program-ming[1]have been proposed.Each of the approach has its own sets of pros and cons.However,none of the above mentioned approaches1,use only social circles to?nd com-munities.

Probably the closest work is done by Pan et Al.[16],who presented an agglomerative community detection method based on node similarity.The method takes into account structural similarity and clusters nodes based on the sim-ilarity value.However,we infer the edges based on the social circle labels and generate edge weights on that ba-sis and optimize modularity by using these edge weights. The edge weights represent the probability that the pair of nodes should be in the same cluster or not.Another rele-vant work is by Jaho et Al.[11]where the authors present ISCoDe,which uses text/activity of nodes in the network to generate edge weights based on their similarity and then does a weighted community detection.Both of these works require edge information of the entire network and also sup-plementary text data or activity log.Whereas our approach requires just the labeled social circle information of the ego 1See Fortunato[8]for comprehensive survey network of a particular user.

Previously,social circles have been used only once in the literature[12].The authors using the features of the social circles and the members in it,tackled the novel problem of automatically identifying users’social circles.Besides that,

to the best of our knowledge,the functionality of social net-working sites has not been used to solve any research prob-lem.

3.OUR APPROACH

Our approach is divided into two parts.The?rst part in-volves inducing edge and edge weights over the pair of nodes

in the set.The second part is obtaining community structure using the induced subgraph so created.In the?rst part of

our approach,we look at all the nodes for which we have ego networks and the social circle labels related with it.Let the

U be the entire set of users,and let U E be the set of nodes for which we have ego network and corresponding social circle labels.Consider node u∈U E and let the set of social cir-

cles it has be denoted by C u where C u={C1u,C2u,...,C ku} such that?i∈{1,2,..,k},C iu?V u where V u is the set of nodes in the ego network of node u.For the task of commu-

nity detection we consider the entire set of nodes V where

u∈U E

V u.

For each pair of nodes,we want to induce edge,and an edge weight which indicates with what probability should

they lie in the same community.Our approach assumes

that given a pair of nodes are going to lie in the same com-munity only if they have been assigned in the same social circle by di?erent nodes high number of times.For example, Ross labelled Chandler as his high school friend and also la-belled Monica as his high school friend.Rachel also labelled Chandler and Monica as her apartment friends.Thus,it is

with high probability that in the entire network Monica and Chandler should be a part of the same community.

For each node v∈V\U E,we have a set M v={M1v,M2v...M kv} where M iv is the unique circle id to which it was assigned

by a node u∈U E.Set M v thus just denotes the set of circle

ids to which the node v was a part of.We model the proba-

bility that a pair of nodes(x,y)∈V×V form an edge with weight as

p(x,y)=

M x∩M y

M x∪M y

which is Jaccard coe?cient over the set of social circles to which each of the node x and y has been assigned.Only edges which have an edge weight(probability)greater than some thresholdθare considered for community detection. Over this induced graph,we run community detection algo-rithm which aims to maximize modularity.For our exper-iments,we have considered the BGLL algorithm[3],which

is a hierarchical agglomerative approach to maximize mod-ularity for large networks.To evaluate the quality of the community structure,we use modularity as the quality met-

ric.Modularity is given as follows:

(A ij?

k i k j

)s i s j+1

where m is total number of edges in the network,A is the adjacency metric and A ij=1if there is an edge between i

and j else it is0,k i and k j are the degrees of node i and node j respectively.For a division into2communities,s i is

Table

1:Datasets

Name Nodes Edges U E

Facebook 4,03988,23410Google+107,61413,673,453132Twitter 81,3061,768,149

973

Figure 2:Probability Value Distribution set to 1if node i belongs to community 1,else if it belongs

to community 2it is -1.

For baseline evaluation,we run the community detection algorithm on the original network that is obtained by joining all the edges in all the ego networks available in the dataset.

4.DATASETS AND RESULTS

We test our approach on set of 3real world social network datasets -Google+,Facebook and Twitter.These datasets have earlier been used by Leskovec et.Al [12].The data for Facebook was collected using a Facebook app,where people logged in and classi?ed each of their friends into a list.For Google+and Twitter,the data was collected by using their API.A brief description of each of the dataset is given in Table 1.In each of the dataset,social circle information is available only for limited number of users.For our analysis,the network is created by merging the ego-networks that are available in the dataset.While merging the ego-networks,the ego node and the edges incident on it are removed.Similarly for the network created from the so-cial circle information,the ego nodes and the edges incident on them are removed.For social circle based community detection,the theta is set to 0.So,any edge with a posi-tive non-zero weight is taken into the induced graph when running community detection algorithm.

Figure 2displays the probability value distribution.The distribution in both the cases of Facebook and Twitter in-dicate many edges having a strong probability.This can be because of the sparseness of the graph that each node may be coming only in 1circle,thus its probability value with every other node in its own circle who is also member of just 1circle will be 1.For facebook,very few edge probability values lie beside the value 1whereas in the case of Twit-ter dataset,the

number of edges with non-1value is much

Table 2:Results

Name Original Circle Based Facebook 0.7855320.848908Google+0.4139560.850151Twitter 0.824990.964483

Figure 3:E?ect of Varying Theta on Modularity -Facebook network.

higher.

Table 2shows the results of the experiment conducted.For each of the three datasets,our approach gives a higher modularity value than the original network.In Google Plus network,the circle based approach gives a modularity score which is nearly double of the original score.For Twitter and Facebook as well,the modularity score is higher than the original graph.

Figure 3shows the e?ect of varying threshold theta on modularity.As the θincreases,the number of edge decreases and only the edges with higher probability are retained in the induced graph,thus reducing the total amount of infor-mation contained by the edges in the network.The baseline that is the modularity score of the original community struc-ture is shown in the same ?gure by pink line.Though with high value of θ,the number of edges reduces and the infor-mation of the overall network reduces but still there is a gain in the modularity.This might be due to the reason that the edges that remain in such network after ?ltering out are the edges which have high importance as they are the ones with higher probability values.

5.CONCLUSIONS

In this paper,we presented a way to learn e?ecient com-munity structures using just the social circle information which a limited number of users have https://www.wendangku.net/doc/1f13203136.html,ing mod-ularity as a quality function for evaluating communities,we showed our approach to be performing better than the base-line approach.For social networks like Facebook,Twitter or Google Plus,e?cient community structure can be obtained by just using the social circle information,which captures the

6.FUTURE WORK

This was the ?rst work in the direction of using social

circles(local)to?nd network level community structures (global).The algorithm proposed did not require the edge information as it tried to learn the probability2nodes should be in the same community using social circle information. However,it is not necessary to learn the probability values from the social circle information.The probability values can also be learnt from other sources.One particular direc-tion is the spread of information cascades.This lies at the intersection of community detection algorithms and informa-tion di?usion.It is very vital as it has also been mentioned by Easley and Kleinberg in their book[6]page577:

”cascades and clusters truly are natural opposites:clusters block the spread of cascades and whenever a cascade comes to stop,there’s a cluster that can be used to explain why”. Information di?usion traces are usually given in tuples iden-tifying the cascade c,node u and the time t at which the node got infected with the cascade.The time di?erence be-tween2consecutive can be used to generate the probability over the edges.

Another possible extension is to study the e?ectiveness of such an approach over other community detection quality metrics like conductance,triangle participation ratio(TPR), Cut Ratio,Flake out degree fraction etc.This work also has applications in distributed social networks where each node has information only about its neighbors.The data when aggregated can be used to?nd e?ecient community struc-tures using only the local information that is the information that is readily available with each node.

7.REFERENCES

[1]G.Aggarwal and D.Kempe.Modularity-maximizing

graph communities via mathematical programming.

European Physics Journal B,66(3),2008.

[2]A.Noack and R.Rotta.Multi-level algorithms for

modularity clustering.In In Proceedings of SEA,2009, 2009.

[3]V.Blondel,J.Guillaume,https://www.wendangku.net/doc/1f13203136.html,mbiotte,and

E.Lefebvre.Fast unfolding of communities in large

networks.Journal of Statistical Mechanics:Theory

and Experiment,10,2008.

[4]U.Brandes,D.Delling,D.Gaertler,M.Gorke,et al.

On?nding graph clusterings with maximum

modularity.In International Workshop on

Graph-theoretic concepts in Computer Science,LNCS, pages121–132,2007.

[5]A.Clauset,M.Newmann,and C.Moore.Finding

community structure in very large scale networks.

Phys.Rev.E.,70(066111),2004.

[6]D.Easley and https://www.wendangku.net/doc/1f13203136.html,works,crowds and

markets:Reasoning about a highly connected world.

2008.

[7]J.Duch and https://www.wendangku.net/doc/1f13203136.html,munity identi?cation

using extremal optimization.Phys.Rev.,72(027104),

2005.

[8]https://www.wendangku.net/doc/1f13203136.html,munity detection in graphs.Physics

Reports,486(3-5),2010.

[9]M.Girvan and https://www.wendangku.net/doc/1f13203136.html,munity structure in

social and biological networks.Proceedings of the

National Academy of Sciences,99(12),2002.

[10]R.Guimera,M.Sales-Pardo,and L.Amaral.

Modularity from?uctuations in random graphs and

complex networks.Phys.Rev.,70(0215101),2004.[11]E.Jaho,M.Karaliopoulos,and I.Stavrakakis.Iscode:

a framework for interest similarity-based community

detection in social networks.In Workshop on Network Science for Communication Networks(NetSciCom),

2011.

[12]J.McAuley and J.Leskovec.Learning to discover

social circles in ego networks.In Neural Information

Processing Systems(NIPS),2012.

[13]J.Mei,S.He,G.Shi,Z.Whang,and W.Li.New

Journal of Physics,14(043025),2009.

[14]M.Newmann.Finding community structure in

networks using the eigenvectors of matrices.Phys

Rev.,74,2006.

[15]M.Newmann.Modularity and community structure in

networks.PNAS,103(23),2006.

[16]Y.Pan,D.-H.Li,J.-G.Liu,and J.-Z.Liang.Detecting

community structure in complex networks via node

similarity.Physica A:Statistical Mechanics and its

Applications,389(14):2849–2857,2010.