文档库 最新最全的文档下载
当前位置:文档库 › journal.pone.0120735

journal.pone.0120735

RESEARCH ARTICLE

Temporal Effects in Trend Prediction:

Identifying the Most Popular Nodes in the

Future

Yanbo Zhou 1,An Zeng 2,Wei-Hong Wang 1*

1College of Computer Science and Technology,Zhejiang University of Technology,Hangzhou,P.R.China,

2School of Systems Science,Beijing Normal University,Beijing,P.R.China

*wwh@https://www.wendangku.net/doc/1c5970279.html,

Abstract

Prediction is an important problem in different science domains.In this paper,we focus on

trend prediction in complex networks,i.e.to identify the most popular nodes in the future.

Due to the preferential attachment mechanism in real systems,nodes ’recent degree and

cumulative degree have been successfully applied to design trend prediction methods.

Here we took into account more detailed information about the network evolution and pro-

posed a temporal-based predictor (TBP).The TBP predicts the future trend by the node

strength in the weighted network with the link weight equal to its exponential aging.Three

data sets with time information are used to test the performance of the new method.We find

that TBP have high general accuracy in predicting the future most popular nodes.More im-

portantly,it can identify many potential objects with low popularity in the past but high popu-

larity in the future.The effect of the decay speed in the exponential aging on the results is

discussed in detail.Introduction The emergence of online social media and rich user-generated content bring the information overload problem.The online content has become increasingly abundant and immediately available,and users cannot go through every piece of information to find the high quality ones [1].These high quality source,in general,have formidable power to impact opinions,culture,and policy,as well as advertising profit.Thus they usually attract a lot of attention and become eventually popular.The rapid development of the Internet results in the availability of a huge amount of data with time information,making it possible to study the popularity dynamic of the online content.It was found that the popularity of various pieces of content on the Web,like news [2],Twitter [3],blog posts [4],videos [5,6],posts in online discussion forums [7]and product reviews [8],vary significantly on temporal scales.In this context,the early identifi-cation of the eventual popular content becomes an important problem [9].It cannot only im-prove the user experience as the prediction save their searching time for high quality

contents

OPEN ACCESS Citation:Zhou Y,Zeng A,Wang W-H (2015)Temporal Effects in Trend Prediction:Identifying the Most Popular Nodes in the Future.PLoS ONE 10(3):e0120735.doi:10.1371/journal.pone.0120735Academic Editor:Xia Li,Harbin Medical University,

CHINA

Received:October 18,2014

Accepted:February 6,2015

Published:March 25,2015

Copyright:?2015Zhou et al.This is an open

access article distributed under the terms of the

Creative Commons Attribution License ,which permits

unrestricted use,distribution,and reproduction in any

medium,provided the original author and source are

credited.

Data Availability Statement:All relevant data are

within the paper and its Supporting Information files.

Funding:This paper is supported by the National

Natural Science Foundation of China (61340058)and

the Natural Science Foundation of Zhejiang Province

(LZ14F020001).

Competing Interests:The authors have declared

that no competing interests exist.

among unpopular ones,but also bring commercial profit for the online vendor as it helps them better manage their inventory.

Although the preferential attachment(PA)is a success in explaining the power-law distribu-tion that widely found in real systems[10],it performs not satisfying enough when applied to predict the future popularity(or degree)of nodes.For example,it was found in the citation net-works that some papers can attract significantly more citations than the prediction from PA [11,12].The ability of a node to attract new links is found to decay exponentially with time, both in citation networks[13]and information access[14].Moreover,temporal analysis of the popularity dynamics of online content in Wikipedia[15]and micro blog[16]shows the burst pattern.An extensive study of how the content’s popularity grows and fades over time in online media has been presented in ref.[17].Besides the experimental study of the temporal dynamic, some possible mechanisms that may contribute to the experimental finding are proposed,such as relevance and time decay[13],random popularity shifts[15]and human dynamics[5].All these studies have shown that the high cumulative degree of nodes is not a guarantee for large degree increase in the future.

In this paper,we focus on the trend prediction in complex networks,i.e.to identify the most popular nodes in the future.In the literature,there are some existing methods for this problem especially for a certain application field[18–21].For instance,in the popular online service https://www.wendangku.net/doc/1c5970279.html,,the initial growth popularity has been used to predict its later popularity[19].Popu-larity-based predictor has been designed for trend prediction and its performance is shown to be further enhanced if the user social network is incorporated[20].In this paper,we introduce a temporal-based predictor(TBP)which takes advantage of the time decay effect found in many empirical works.The validation of the method is conducted in three time-stamped data sets.The results show that the prediction precision is remarkably higher than that of PA.In ad-dition,the new method is especially effective in identifying the potential nodes with low popu-larity in the past but high popularity in the future.

Materials and Methods

The system we considered in this paper can be modeled by the bipartite network which consists of a set of users U and a set of objects O.We use Latin letters for users and Greek letters for ob-jects to distinguish them.A bipartite network can be represented by an adjacency matrix A, where elements A iαare equal to1if user i has collected objectαand0otherwise.We consider snapshots of these networks at different time stamps by taking into account only the links es-tablished before a given time t,and we use A(t)to denote the adjacency matrix at time t.The number of objects collected by user i and the number of users who collected objectαat time t (i.e.,user degree and object degree)are computed as k i(t)=∑αA iα(t)and kα(t)=∑i

A iα(t),respectively.

The popularity increase of objectαin future T F time steps(i.e.the future time window)is then

D k aet;T

F T?k aettT

F

Tàk aetT:e1T

For a suitably chosen value of T F,this quantity can measure the temporal interest in objectα. The main goal of trend prediction in this paper is to identify the most popular objects in the fu-ture.To this end,we de?ne a testing time t and a future time window of length T F,and rank all objects according to their popularity increaseΔkα(t,T F).This ranking is considered as the true ranking of popularity in the future.A generic predictor will make use of the information before t and assign prediction scores sαto all objects.These scores will be mapped into a predicted

ranking.In general,the higher overlap of the predicted ranking and the true ranking,the better the predictor is.

Popularity-based Predictor

Preferential attachment is a well-known mechanism of network evolution which assumes that the probability a node to attract a new link is proportional to its cumulative degree.In trend prediction,this means that objects which are popular at time t are expected to have better chances to attract new links from users.This implies that the cumulative degree of an object

kα(t)is a good predictor of its future popularity increase.Considering the decaying interest in objects,the prediction scores can be set as the recent popularity of objects.The prediction score of an object at time t can be calculated byΔkα(t,T P)where T P is the length of the consid-ered history.Recently,a popularity-based predictor(PBP)[20]has been proposed to combine the predictor kα(t)andΔkα(t,T P).PBP has a tunable parameterλ2[0,1]to make the new pre-dictor change smoothly from kα(t)toΔkα(t,T P).Mathematically,the prediction score of PBP is computed as

s aet;T PT?e1àlTk aetTtl D k aet;T PT?k aetTàl k aetàT PT:e2TThis predictor simpli?es to the total popularity method whenλ=0and to the recent popularity method whenλ=1.

Temporal-based predictor

The popularity-based predictor in fact considers all the recent popularity of objects,but weak-ens the influence of links an object received before t?T P.In the literature,it has been found that the interest toward individual objects vanishes exponentially with time in some case[13, 22].Therefore,it is too arbitrary to simply divide the time into two segments in the popularity-based predictor,as it may lose the detailed temporal information of real networks.

To make better use of the temporal information for trend prediction,we proposed a tempo-ral-based predictor(TBP)in this paper.In TBP,we consider the mechanism that the influence of a link exponentially decays with time.An aging function is accordingly introduced to calcu-late the prediction scores:

X

A i aetTe geT i aàtT;e3T

s aetT?

i

where T iαdenotes the time at which user i select objectα.γis a positive parameter which con-trols the decay speed.A largerγindicates a faster decay,andγ=0corresponds to the cumula-tive popularity without any decay.TBP preserves all the detailed temporal information in the network.By adjustingγ,we can study the temporal effects in the prediction of the

future popularity.

Data Description

To test the performance of TBP,we use three distinct real data sets:MovieLens,Netflix and Facebook in this paper.Movielens and Netflix data sets contain movie ratings,and Facebook data set contains users’wall post relationships.MovieLens is provided by GroupLens project at University of Minnesota(https://www.wendangku.net/doc/1c5970279.html,).We use their10million ratings data set.Each user in MovieLens data set has at https://www.wendangku.net/doc/1c5970279.html,flix is a huge data set released by the DVD rental company Netflix for its Netflix Prize(https://www.wendangku.net/doc/1c5970279.html,).The original data has480189users,17770objects and100480507ratings.Since the original Movielens and Net-flix data sets are large,we extracted a small subset from each of them by randomly choosing

some users who have rated at least 20movies and took all movies they had rated.For both

Movielens and Netflix,the ratings are given on the integer scale from 1to 5(from worst to

best).We here only consider the ratings higher than 2as a link.The final data consists of 5000

users,7533movies,and 864581links in Movielens and 4960users,16599movies,and 1249058

links in Netflix.Facebook data set contains a list of all the wall posts from the Facebook New

Orleans networks [23–25].A link from one user to another corresponds that the user post on

another user ’s wall.As the Facebook network is a unipartite directed network,here we mapped

it to a bipartite network with a set of users and a set of users ’walls (objects).If a user has posted

on a wall,there will be a link between the user and the wall.The original data has 42390users,

39986objects and 876993links.Since user may written on his own wall,we remove these links

to eliminate self-influence.The final data consists of 40981users,38143objects and 855542

links.For all of these three data sets,the time is counted by days.The characteristics of these

data sets are summarized in table 1.All these three data sets are available from the Koblenz

Network Collection [26],and they are all free to use even for commercial purposes (the data

sets we used in this paper are free to download as S1Dataset ).

Evaluation Metrics

We apply three metrics to give quantitative measurements of the predictors ’performance:

AUC,precision and novelty.

As the main point of this paper is to predict which objects will be popular in the future,only

the top part of the ranking list should be considered when evaluating the performance of the

predictors.We thus use a standard measure in information filtering literature named AUC

[27],which evaluates a ranking list by calculating the relative position of its top n objects.We

select the top n objects in the real future as a group of benchmark objects,and denoted it as set

B .The other objects are in the complement group of B ,which denoted as B 0.Then the AU

C is

calculated as

AUC ?

1X a 2B X b 2B 0I es a ;s b T;e4T

where

I es a ;s b T?0;if s a s b :e5T8>>><>>>:AUC equals to one when all benchmark objects are ranked higher than the other objects,

while AUC =0.5corresponds to a completely random object ranking

list.

Table 1.Basic statistical features of the data sets.

Data set

Users Objects Links Period

doi:10.1371/journal.pone.0120735.t001

Another evaluation matric is called precision.It is defined as the fraction of objects in the top n places of the estimated ranking that appear also in the top n places of the true ranking [28].The precision of the predictor is defined as P n=D n/n,where D n indicates the number of common objects in the top n places of the predicted ranking and the true ranking.It lies in the range[0,1],the higher the better.

It is often the case that objects popular in the future time window(t,t+T F]were already popular in the past.Successful prediction of those objects can contribute to precision P n.How-ever,prediction of these objects brings much less benefit to users than the prediction of genu-inely“new entries”,i.e.objects that were missing in top n in the past but they appear there in the future time window.We label the true number of those objects as E n and the number of those successfully identified by the predicted ranking as C n,respectively.The rate of correct prediction of these new entries is Q n=C n/E n.This allows us to measure how well a predictor is able to identify the potential objects.Here,we name Q n as novelty.

Results

To obtain the final evaluation of the predictors’performance,we average results over10ran-domly selected t for each data sets.To make sure that there is enough history information,t is set as at least one year later than the first record in each data set.As all the predictors we con-sidered in this paper are based on objects’history,we only consider the objects with at lest one link before the testing date t.

Fig.1shows the performance of the TBP under differentγin Movielens,Netflix and Face-book data sets,respectively.T F is set as30days for all there data sets.Different n values are given for all metrics.The results show that the influence ofγdoesn’t change by n.Whenγ=0 (equivalent to cumulative degree predictor),the AUC and P n are relatively small and Q n=0for all data sets.This indicates that PA have little efficacy in predicting the future popularity.A smallγ(a relatively slow time decay)can significantly increase the prediction performance,es-pecially for Q n.The high value of Q n indicates that the temporal-based predictor has a great power to identify“new objects”which are not yet popular.A too largeγwill decrease the per-formance,and we can get the best performance of TBP for all data sets by changingγ.We de-noteγ?as the parameter resulting in the highest P n value.It is clear that the performance of TBP with parameterγ?is remarkably higher than that of PA(i.e.γ=0in TBP).

Table2shows the performance of TBP and PBP for the three data sets.The parameter for each predictor is selected as the one corresponding to the highest P n value.From the results,we could find that TBP has a better performance for most evaluation metrics.The bestλvalue for PBP is0.98for both Movielens and Netflix data set and0.93for Facebook data set,which indi-cates that the links an object received long time ago has a small influence on its future populari-ty.Unlike the arbitrarily dividing the time into two segments in PBP,TBP uses an exponential decay function which can future improve the prediction performance.

We set a rank change value for each object as dr=r f?r p,where r f is the real rank in the near future and r p is the rank by a predictor.dr=0indicates the prediction rank is echo to the real rank in the near future,and the predictor has perfect performance;dr<0indicates the predic-tor underestimate the object’s popularity.dr>0indicates the predictor overrate the object’s popularity.To test how different predictors influence the rank of the top objects in the future, we plot Fig.2to show the correlation of dr with these objects’degree rank r k in the testing time under different predictors and parameters.This figure only considers the top100objects in the T F window.The parametersλfor PBP are set as the one corresponding to the highest P n value. The parametersγ?andγ=1are also selected for TBP in Fig.2.These objects are ranked in the top100positions in the near future,but their history popularity has a board distribution.Both

TBP and PBP with the best parameter can reduce the absolute value of dr ,which makes these

predictors have better performance in predicting future https://www.wendangku.net/doc/1c5970279.html,pered with PBP,TBP

is better at improving the rank of objects that have high r k but ranked in top place in the future.

When γ=1in TBP,it is clear that the absolute value of rank difference d r of the objects that

have high r k becomes smaller than the case with γ?,but the absolute value of rank difference d r

of objects with lower r k becomes larger.That may be due to the fact that parameter γcan

give

Fig 1.The prediction result of TBP for Movielens,Netflix and Facebook data sets under different γ.The performance of different n values are given.n =50is presented by black lines with squares,n =100is presented by red lines with circles,and n =200is presented by blue lines with triangles.

doi:10.1371/journal.pone.0120735.g001

less rank score to the objects that are popular in the past,and then improve the rank of the ob-

jects that are not popular in the past.The higher the parameter γis,the more obvious this

influence is.

A small T F aims to predict objects ’popularity in the short term while a large T F requires to

predict the trend in long term.Therefore,we test the performance of the predictors under dif-

ferent T F .Fig.3shows the performance of the TBP and PBP as a function of the future time

window T F .The parameters corresponding to the highest P n value for each predictors at each

T F point are selected.For PBP,T P is set as the same length of T F .Compared with PBP,it is

clear that TBP has a better prediction performance for all T F value.For all data sets,the preci-

sion P n ,novelty Q n and AUC of the predictors increase substantially with T F when T F is very

small.This is because there is a lot of noise when T F is too small.However,the precision de-

creases with T F while T F becomes larger.This may because the predicted popularity becomes

outdated for larger T F time.

As we know,γcan control the decay speed of the influence of the old links.For different

prediction time interval T F ,the best parameter γ?may be different.Fig.4shows the relation-

ship of γ?with T F .One could find that,the larger the T F length,the smaller the γ?value.That

means for a shorter T F prediction,the objects ’recent popularity matters more.While for a larg-

er T F interval prediction,longer historical popularity should be

considered.

Fig 2.The correlation of dr with r k for top 100objects in the real future.Black lines with circles present the result of TBP with the best parameter γ*,red lines with triangles present the result of PBP with the best parameter λ*,and blue lines with diamonds present the result of TBP with γ=1.T F is set as 30days for all data sets.

doi:10.1371/journal.pone.0120735.g002

Table 2.The prediction performance of temporal-based predictor (TBP)and popularity-based predictor (PBP)for Movielens,Netflix and Facebook data sets.The parameter for each predictor is set as the one corresponding to the highest P n value.n is set as 100.

Data set

Predictor Parameter AUC P Q

doi:10.1371/journal.pone.0120735.t002

Discussion

To summarize,we proposed a temporal-based predictor (TBP)in this paper,and studied the performance of TBP in trend prediction.The basic idea of TBP is to introduce an exponentially time decay to predict objects ’future popularity.We make use of three metrics to evaluate the predictor ’s performance:P n ,Q n and AUC.We found that the parameter γ,which controls the speed of the time decay,can give less rank score to the objects that are popular in the past,

and

Fig 3.The performanc of TBP and PBP as a function of the future time window T F .TBP is presented by black lines,and PBP is presented by red line.n is set as 100.Time is measured in days.

doi:10.1371/journal.pone.0120735.g003

accordingly improve the rank of the objects that are not popular in the past.The higher γis,the more obvious of this influence is.Thus,TBP has a higher ability to detect “new entries ”that have a lower cumulative popularity but a higher future popularity,and promote these ob-jects to the front of the predicted ranking https://www.wendangku.net/doc/1c5970279.html,pared with PBP,TBP has a higher ability to detect the objects that will be popular in the future with different future length T F .

Ranking is one of the most important and fundamental method to solve information over load problem.The study of the popularity dynamic of online information gives us some inspi-ration to solve the trend prediction problem.This is a very practical issue.In this paper,we studied the links ’temporal effects on objects ’future popularity.This study is based on the ex-perimental finding that the ability of a node to attract new links vanishes exponentially with time.Besides time,there are a lot of other elements that can influence the dynamic of populari-ty,such as human dynamics,links heterogeneity,and external influence.Introducing the influ-ence of these elements may future improve the performance of trend predictor.In our future studies,we will focus on improving the performance of the trend predictors with the help of both empirical observations and theoretical analysis.

Supporting Information

S1Dataset.The data sets we used in this paper.

(RAR)

Author Contributions

Conceived and designed the experiments:YZ AZ.Performed the experiments:YZ.Analyzed the data:YZ AZ WW.Contributed reagents/materials/analysis tools:YZ WW.Wrote the paper:YZ AZ.

References

1.

Simon HA (1971)Designing organizations for an information-rich https://www.wendangku.net/doc/1c5970279.html,puters,communications,and the public interest 72:37.2.Szabo G,Huberman BA (2010)Predicting the popularity of online https://www.wendangku.net/doc/1c5970279.html,munications of the

ACM 53:80.doi:

10.1145/1787234.1787254

Fig 4.The relationship of γ*with T F for Movielens,Netflix and Facebook data sets.

doi:10.1371/journal.pone.0120735.g004

3.Asur S,Huberman BA,Szabo G,Wang C(2011)Trends in social media:persistence and decay.In:

ICWSM.

4.Kumar R,Novak J,Raghavan P,Tomkins A(2005)On the bursty evolution of blogspace.World Wide

Web8:159.doi:10.1007/s11280-004-4872-4

5.Crane R,Sornette D(2008)Robust dynamic classes revealed by measuring the response function of a

social system.Proceedings of the National Academy of Sciences105:15649.doi:10.1073/pnas.

0803685105

6.Figueiredo F,Fabrício B,Jussara AM(2011)The tube over time:characterizing popularity growth of

youtube videos.In:In Proceedings of the fourth ACM international conference on Web search and data mining.ACM,pp.745–754.

7.Aperjis C,Huberman BA,Wu F(2010)Harvesting collective intelligence:Temporal behavior in yahoo

answers.Ar Xiv:1001.2320.

8.Gruhl D,Guha R,Kumar R,Novak J,Tomkins A(2005)The predictive power of online chatter.In:Pro-

ceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data min-ing.ACM,pp.78–87.

9.Lerman K,Tad H(2010)Using a model of social dynamics to predict popularity of news.In:In Proceed-

ings of the19th international conference on World wide web.ACM,pp.621–630.

10.Barabási AL,Albert R(1999)Emergence of scaling in random networks.Science286:509.doi:10.

1126/science.286.5439.509PMID:10521342

11.Newman MEJ(2009)The first-mover advantage in scientific publication.Europhysics Letters86:

68001.doi:10.1209/0295-5075/86/68001

12.Newman MEJ(2014)Prediction of highly cited papers.Europhysics Letters105:28002.doi:10.1209/

0295-5075/105/28002

13.Medo M,Cimini C,Gualdi S(2011)Temporal effects in the growth of networks.Physical review letters

101:238701.doi:10.1103/PhysRevLett.107.238701

14.Dezs?Z,Almaas E,Lukács A,Rácz B,Szakadát I,Barabási AL(2006)Dynamics of information access

on the web.Physical Review E73:066132.doi:10.1103/PhysRevE.73.066132

15.Ratkiewicz J,Fortunato S,Flammini A,Menczer F,Vespignani A(2010)Characterizing and modeling

the dynamics of online popularity.Physical review letters105:158701.doi:10.1103/PhysRevLett.105.

158701PMID:21230945

16.Yan Q,Wu L(2012)Impact of bursty human activity patterns on the popularity of online content.Dis-

crete Dynamics in Nature and Society.

17.Yang J,Leskovec J(2011)Patterns of temporal variation in online media.In:Proceedings of the fourth

ACM international conference on Web search and data mining.ACM,pp.177–186.

18.Roja B,Asur S,Huberman BA(2012)The pulse of news in social media:Forecasting popularity.In

ICWSM.

19.Jamali S,Rangwala H(2009)Digging digg:Comment mining,popularity prediction,and social network

analysis.In:In Web Information Systems and Mining.International Conference on.IEEE,pp.32–38. 20.Zeng A,Gualdi S,Medo M,Zhang YC(2013)Trend prediction in temporal bipartite networks:the case

of movielens,netflix,and digg.Advances in Complex Systems16:04n05.doi:10.1142/

S0219525913500240

21.Wang D,Song C,Barabási AL(2013)Quantifying long-term scientific impact.Science342:127–132.

doi:10.1126/science.1237825PMID:24092745

22.Zhu H,Wang X,Zhu JY(2003)Effect of aging on network structure.Physical Review E68:056121.

doi:10.1103/PhysRevE.68.056121

23.Facebook wall posts network dataset—KONECT.Available:http://konect.uni-koblenz.de/networks/

facebook-wosn-wall.Accessed20September2014.

24.Viswanath B,Mislove A,Cha M,Gummadi KP(2009)On the Evolution of User Interaction in Facebook.

In:Proc.Workshop on Online Social Networks.pp.37–42.

25.Kunegis J(2013)KONECT-The Koblenz Network Collection.In:Proc.Int.Web Observatory

Workshop.pp.1343–1350.

26.The koblenz network collection.Available:http://konect.uni-koblenz.de/.Accessed30November2014.

27.Hanley JA,McNeil BJ(1982)The meaning and use of the area under a receiver operating characteristic

(roc)curve.Radiology143:29.doi:10.1148/radiology.143.1.7063747PMID:7063747

28.Herlocker JL,Konstan JA,Terveen LG,Riedl JT(2004)Evaluating collaborative filtering recommender

systems.ACM Transactions on Information Systems(TOIS)22:5–53.doi:10.1145/963770.963772

相关文档