A System for Visualizing and Analyzing the Evolution of the Web with a Time Series of Graphs

Masashi Toyoda toyoda@tkl.iis.u-tokyo.ac.jp

Masaru Kitsuregawa kitsure@tkl.iis.u-tokyo.ac.jp

Institute of Industrial Science, University of Tokyo, 4-6-1 Komaba, Meguro-ku, Tokyo, JAPAN

ABSTRACT

We propose WebRelievo, a system for visualizing and analyzing the evolution of the web structure based on a large Web archive with a series of snapshots. It visualizes the evolution with a time series of graphs, in which nodes are web pages and edges are relationships between pages. Graphs can be clustered to show an overview of the changes in the graphs. WebRelievo aligns these graphs according to their time and automatically determines their layout, keeping the positions of nodes synchronized over time so that the user can keep track of pages and clusters. This visualization enables us to understand when pages appeared, how their relationships have evolved, and how clusters are merged and split over time. The current implementation of WebRelievo is based on six Japanese web archives crawled from 1999 to 2003. The user can interactively browse those graphs by changing the focused page and by changing the layouts of graphs. Using WebRelievo, we can answer historical questions and investigate changes in trends on the Web. We show the feasibility of WebRelievo by applying it to tracking trends in P2P systems and search engines for mobile phones, and to investigating link spamming.

Categories and Subject Descriptors

H.5.4 [Information Interfaces and Presentation]: Hypertext/Hypermedia - Navigation; H.3 [Information Storage and Retrieval]: Information Search and Retrieval

General Terms

Design, Experimentation

Keywords

Visualization, Web graph, evolution, link analysis, link spamming

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.

HT'05, September 6–9, 2005, Salzburg, Austria.

Copyright 2005 ACM 1-59593-168-6/05/0009...$5.00.

1. INTRODUCTION

The Web has been growing dramatically, and its hyperlink structure keeps changing to reflect real and virtual activities. When major events occur, various web pages about these events are created, and pages with important information then become pointed to by many pages. Such events could be war and terrorism in the real world, or the appearance of a new type of software, such as P2P file sharing systems, in the virtual world.

Moreover, the structure of the Web is now intentionally changed to control the ranking of electronic commerce sites in search engines, mainly because higher ranked sites can pull in more customers. Such commerce sites mainly target link-based ranking methods such as PageRank [2], which gives high scores to pages pointed to by many other pages with high scores. One way to manipulate such scores is to concentrate links on their sites, so that those sites seem popular and collect high scores. Such manipulation is called link spamming.
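To make the effect of concentrated links concrete, the following is a minimal sketch of a PageRank-style score computed by power iteration. It is not the ranking used by any particular search engine or by WebRelievo. The function name `pagerank`, the damping factor of 0.85, the iteration count, and the toy pages (`home`, `news`, `shop`, `farm0` to `farm4`) are all assumptions made for illustration.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}.

    Illustrative sketch only; damping and iterations are assumed values.
    """
    pages = set(links) | {q for targets in links.values() for q in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page receives a small base score regardless of in-links.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                # A page splits its damped score evenly among its out-links.
                share = damping * rank[p] / len(targets)
                for q in targets:
                    new_rank[q] += share
            else:
                # Dangling page: spread its damped score over all pages.
                share = damping * rank[p] / n
                for q in pages:
                    new_rank[q] += share
        rank = new_rank
    return rank


# Hypothetical toy web: "shop" and "news" each have one in-link from "home".
honest = {"home": ["news", "shop"], "news": ["home"], "shop": ["home"]}

# The same web after adding five hypothetical farm pages that all link to "shop".
spammed = dict(honest)
for i in range(5):
    spammed[f"farm{i}"] = ["shop"]

before = pagerank(honest)
after = pagerank(spammed)
print("shop before:", round(before["shop"], 4))
print("shop after: ", round(after["shop"], 4))
```

In the honest graph, `shop` and `news` are symmetric and score the same. After the farm pages are added, `shop` scores higher than `news` and higher than it did before, even though none of the farm pages has any in-links of its own.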

Since hyperlinks represent the attention that page authors pay to the destination pages, for better or worse, we can see various changes in trends on the Web from the evolution of the hyperlink structure. Tracking structural changes in the Web is important in the following situations:

• Answering historical questions about web pages, such as when pages appeared and disappeared, and how their relationships changed over time.

• Investigating link spamming structure to eliminate its effect and to correct ranking in search engines.

• Observing and tracking social and cultural trends over time for sociological research.
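The first kind of question above can be sketched on a toy time series of graphs. The sketch below assumes each snapshot is a pair of a crawl time and a set of `(source, destination)` hyperlinks. The function names `page_lifetimes` and `link_changes`, the page names, and the years are hypothetical and are not taken from WebRelievo or its archives. A page counts as present only if it appears in at least one hyperlink of a snapshot.

```python
def page_lifetimes(snapshots):
    """Map each page to (first_seen, last_seen) over time-ordered snapshots."""
    lifetimes = {}
    for time, edges in snapshots:
        # Pages present in this snapshot are the endpoints of its edges.
        pages = {page for edge in edges for page in edge}
        for page in pages:
            first, _ = lifetimes.get(page, (time, time))
            lifetimes[page] = (first, time)
    return lifetimes


def link_changes(snapshots):
    """List (time, added_edges, removed_edges) between consecutive snapshots."""
    changes = []
    for (_, prev), (time, curr) in zip(snapshots, snapshots[1:]):
        changes.append((time, curr - prev, prev - curr))
    return changes


# Hypothetical snapshots, ordered by crawl time.
snapshots = [
    (1999, {("a", "b")}),
    (2000, {("a", "b"), ("c", "b")}),
    (2001, {("c", "b")}),
]

lifetimes = page_lifetimes(snapshots)
changes = link_changes(snapshots)
print(lifetimes)
print(changes)
```

Here page `c` first appears in the second snapshot and page `a` is last seen in it, while the link from `a` to `b` is reported as removed in the third snapshot.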

We propose WebRelievo (WEB RELatIonship EVOlution), a system for visualizing and analyzing the evolution of the web structure based on a large web archive that includes snapshots of web pages periodically collected by crawlers. Currently, we use six web archives of Japanese web snapshots crawled from 1999 to 2003.

WebRelievo visualizes a time series of web graphs, in which nodes are web pages and edges are relationships between pages, such as hyperlinks and results of link analysis. To show the structure of the web from various aspects, WebRelievo provides several variations of the web graph. The most basic graph consists of web pages and hyperlinks. However, this graph is too complicated to understand and to visualize. Since famous web pages are linked to