The Impact of Address Allocation and Routing on the Structure and Implementation of Routing Tables

Harsha Narayan
University of California, San Diego

Ramesh Govindan
University of Southern California

George Varghese
University of California, San Diego

ABSTRACT

The recent growth in the size of the routing table has led to an interest in quantitatively understanding both the causes (e.g., multihoming) as well as the effects (e.g., impact on router lookup implementations) of such routing table growth. In this paper, we describe a new model called ARAM that defines the structure of routing tables of any given size. Unlike simpler empirical models that work backwards from effects (e.g., current prefix length distributions), ARAM approximately models the causes of table growth (allocation by registries, assignment by ISPs, multihoming and load balancing). We show that ARAM models with high fidelity three abstract measures (prefix distribution, prefix depth, and number of nodes in the tree) of the shape of the prefix tree, as validated against 20 snapshots of backbone routing tables from 1997 to the present. We then use ARAM for evaluating the scalability of IP lookup schemes, and studying the effects of multihoming and load balancing on their scaling behavior. Our results indicate that algorithmic solutions based on multibit tries will provide more prefixes per chip than TCAMs (as table sizes scale toward a million) unless TCAMs can be engineered to use 8 transistors per cell. By contrast, many of today's SRAM-based TCAMs use 14-16 transistors per cell.

Categories and Subject Descriptors

C.2.3 [Computer-Communication Networks]: Network Operations - routing

General Terms

Measurement, Performance

Keywords

Routing tables, modeling, IP lookups

1. INTRODUCTION

In recent years, Internet measurement and modeling studies have focused on Internet topologies [5], paths [6], and routing behavior [7, 8]. Only recently has there been an exploration of the structure and growth of the global routing table. This exploration has been sparked by dramatic growth in the table size (from 30,000 to 120,000 prefixes) in the last few years. Growth in table size has become a popular topic on mailing lists and in the trade press [9] because of its alarming implications for router vendors. However, the growth in the table also leads to two natural research questions. Q1: How is this growth caused, and how will changes in the relative prevalence of various causes affect table size? Q2: How does growth impact router implementations?

It is fairly well known that some of the causes of growth are the failure to aggregate properly, load balancing, and multihoming. In a recent measurement study, Bu et al. [10] provide valuable initial insight into some of these causes by showing, for example, that multihoming contributes 20-30% of the prefixes in current routing tables and that this factor is on the rise. However, they stop short of exploring the sensitivity of table growth to changes in these causes. Furthermore, while they provide some link between causes and the total number of prefixes, they do not explore a link between causes (e.g., multihoming) and the structure of the routing table. The structure of the routing table can be thought of as the shape of the tree (e.g., trie) induced by the set of prefixes in the table.

Why might the structure of the table be of interest, as opposed to merely the size of the table? (In this paper, we use the term routing table to mean the collection of prefixes associated with routing entries, and ignore other route attributes such as AS paths, next hops, and BGP communities.) Table structure is crucial because it helps to answer question Q2 about the impact on router implementations. In particular, some router vendors do IP lookups based on compressed multibit tries [11, 12]. The amount of high speed memory required by such schemes (which is a first order measure of the implementation cost) is directly dependent on the structure of the routing table. Intuitively, if prefixes tend to bunch together (as opposed to being randomly scattered), multibit tries will result in very compact and affordable implementations even for large routing tables.

The issue is becoming particularly pressing because Content Addressable Memory (CAM) manufacturers are targeting offerings for large core router forwarding tables. Traditional problems with ternary CAMs (TCAMs), such as the scaling of the match logic and power consumption, are being surmounted using innovative techniques. However, TCAMs are still much less dense than memories (14-16 transistor cells for SRAM-based TCAMs as opposed to 6 transistors for an SRAM cell). Thus it appears that, for the same number of transistors, an algorithmic implementation (based, say, on compressed multibit tries) can handle a larger number of prefixes than a TCAM.

While this is the hypothesis, and some are working to provide algorithmic solutions for IP lookups as an alternative to TCAM solutions, there is little hard evidence to make a case one way or the other. The problem is that if the present exponential growth rate were to continue, one would expect core router tables to reach a million prefixes in another 6 years or so. Routers shipping today thus may have to support up to 1 million prefixes to be usable for 5 years, often the minimum period considered for such capital equipment.

Thus, to reasonably compare TCAMs versus algorithmic lookup solutions, one needs some way of generating realistic tables that are several times the size of today's routing tables. More importantly, to accurately model the storage of algorithmic (e.g., trie-based) solutions, one has to ensure that the generated tables accurately reflect the structure of current (and hopefully future) routing tables. The upshot of this argument is that router vendors today need a good model of the structure of current tables that can be projected to the future.

Note that an accurate model can also help a chip vendor selling an algorithmic solution predict the amount of memory required to support a given number of prefixes. A model that reflects underlying causes is also useful in its own right to understand table growth, which we claim should be as much a phenomenon of independent interest as Internet topology or BGP convergence.

1.1 Routing Table Models

In this paper, we introduce a model of routing table structure called ARAM. ARAM contains recognizable elements of the processes which govern routing table structure: the allocation of address space from registries to ISPs, and the advertisement of address prefixes into the routing system determined by routing practices such as multihoming and load balancing. We use ARAM to study the storage requirement of IP lookup schemes as a function of table size. Because ARAM has parameters that can control the relative weight of allocation and routing processes, we can explore the impact of variations in the degree of multihoming or load balancing, for example, on the storage requirements of lookup schemes. We argue that an ideal routing table model should have the following properties:

- Causal: Given that there are well known causes of routing table growth such as multihoming and load balancing [10], a model should ideally reflect these intuitive driving forces in order to provide increased understanding. An alternative is to employ an empirical model which ignores causes, and directly encodes chosen measures of current tables such as the prefix length distribution.

- Parameterizable: Merely reflecting the current structure of the routing table is dangerous because vast increases in some causes (such as multihoming) can lead to very different routing tables. Thus, an ideal model should have parameters (tuning knobs) that control the effects of the various causes, to enable a systematic sensitivity analysis.

- Parsimonious: To effectively use and reason with a model, the model should have as few parameters as possible. In the limit, one could characterize the shape of the current routing table by providing the prefix length distribution for all possible initial 8-bit values of prefix bits. While this might characterize the shape better, it uses 256 * 32 parameters.

- Accurate: The model should match the "shape" of existing routing tables. Ideally, shape comparisons should use abstract measures that can be used to compare any two prefix tries, regardless of the particular lookup scheme.

- Predictive: The model should be able to accurately predict the memory used by various lookup implementations as measured on existing tables. Unlike the previous goal, where validation is done using abstract shape measures, here the goal is to validate a model using more concrete measures such as the memory used by a representative IP lookup scheme.

The design of ARAM attempts to satisfy these sometimes conflicting goals. It derives representative routing tables in three steps that model the processes by which prefixes enter the table: first, registries allocate address blocks to ISPs; then, ISPs advertise their allocations into the routing table and assign address space to customers; and customers in turn can advertise more-specific prefixes (i.e., "punch holes") to effect specialized routing (e.g., backup or load balancing). The model uses five parameters: one for the number of allocations in the first step, and two each for the two remaining steps, governing the frequency and extent of ISP and customer routing practice. While the model could be made more accurate by using more parameters, we decided to err in the direction of parsimony.

By three abstract measures of tree shape (prefix length distribution, prefix depth, and number of tree nodes), ARAM's routing tables compare well (Section 2.3.2) with twenty routing tables spanning the last five years. For example, for the prefix length distribution, ARAM exhibits less than 6% error for each of those twenty tables. To match a specific routing table, we used the same number of allocations as an input to ARAM as had been handed out by the registries at the time the routing table was taken. For the other four parameters, a single set of values was sufficient to produce a match. Finally, for each of the twenty routing tables we chose, ARAM's matching routing tables also closely matched the number of transistors required for a compressed multibit trie implementation (Section 3). Our scheme of choice is the Tree Bitmap algorithm, due to Eatherton and Dittia [12]. It is less dated than the seminal Lulea scheme [11] and has fast updates (unlike the Lulea scheme). However, we believe the results would not change significantly for other implementations, because the underlying abstract property that determines trie storage is the shape (i.e., the relationships between prefixes) of the routing table.

We make no claim that ARAM captures all the aspects of routing and allocation practice, that it cannot be further tuned, or that there are no other effective causal models. However, our validation of ARAM (Section 2.3.2) gives us some confidence in our use of ARAM to generate larger table sizes and explore variations in routing and allocation practice. In particular, we use ARAM to shed some light on the somewhat acrimonious debate between TCAMs and algorithmic IP lookup schemes. Leaving aside power and match scaling, our results indicate that algorithmic schemes can provide more density (prefixes per unit area) than TCAMs. Of course, our results are subject to assumptions (e.g., that the current growth trends continue), and thus should only be considered an initial (albeit quantitative) contribution to an ongoing debate. Section 2 is devoted to the ARAM model and its validation, and Section 3 is devoted to using ARAM to predict IP lookup performance. Finally, Section 4 compares ARAM to previous work, and Section 5 states our conclusions.

2. ARAM: THE MODEL AND ITS VALIDATION

In this section, we discuss current allocation and routing practice, and then describe how ARAM generates routing table prefixes. We then validate ARAM's routing tables, as well as each individual aspect of the model.

2.1 Introduction

In order to allow meaningful extrapolation of routing tables, ARAM attempts to explicitly model the factors that shape routing tables and the relationships between prefixes. Broadly speaking, there are two mechanisms that shape current routing tables: the allocation of prefixes by registries, and the advertisement of those prefixes (or more specifics thereof) in BGP tables by ISPs and their customers. Both these mechanisms are only informally codified, if at all. For this reason, we use the terms allocation practice and routing practice respectively to refer to them.

2.1.1 Address Allocation Practice

In this section, we describe the essential details of the hierarchical allocation of IPv4 addresses and address prefixes. ARAM attempts to capture some, but not all, of these details. There are certain local variations in allocation practice that ARAM ignores [4]. The IPv4 address allocation hierarchy has four levels. The Internet Assigned Numbers Authority (IANA, currently administered by the Internet Corporation for Assigned Names and Numbers, ICANN) delegates blocks of addresses to Regional Internet Registries (RIRs). These, in turn, allocate portions of their address blocks to Local Internet Registries (LIRs). With few exceptions, LIRs correspond to ISPs. In turn, LIRs assign parts of their address space to "end-users" (organizations or smaller ISPs).

This hierarchy for address allocation has two logical functions. Decentralization of allocation from IANA to the RIRs and then to the LIRs promotes manageability of address allocation. Moreover, the fact that LIRs mostly correspond to ISPs promotes (at least in theory) the scaling of the routing system by enabling aggregatable address assignment.

We now describe the various levels of this hierarchy in some detail.

IANA is the guardian of the entire IPv4 address space. It delegates parts of the space to the RIRs in units of /8 on demand. IANA holds 113 /8s [13], of which 35 have currently been delegated to various RIRs (the rest of the address space is made up of historical allocations and the class B space). In order to qualify for a new /8, an RIR has to have used up 80% of its existing /8 or demonstrate that it cannot meet an allocation request with its current /8. (To reduce administrative load, this policy is currently being changed so that an RIR may get enough /8s to last for 18 months at a burn rate determined by its recent rate of allocations.)

There are four RIRs (ARIN in North America, RIPE in Europe, APNIC in Asia-Pacific, and LACNIC, the newest RIR, responsible for South America). Each RIR is responsible for address management in its designated geographic region. RIRs codify their address management practices in policy documents [14, 15, 16]. Although there are minor regional variations in policy [4], the practices of the various RIRs are qualitatively similar.

An RIR usually allocates address prefixes (address blocks aligned on a bit boundary) to an LIR. Address prefixes handed out to the LIR are called allocations. Though LIRs are usually ISPs, sometimes an RIR will allocate addresses directly to large end-users. Such allocations from an RIR to an end-user go by various names: "Direct Assignment", "Direct Allocation", "Provider Independent Allocation" or "Provider Independent Assignment".

The smallest size of an initial allocation made by an RIR is a /20. Further allocations are made when the LIR has assigned 80% of its previous allocations to customers. The size of subsequent allocations is determined by the usage rate of past allocations and the expected growth rate (ISPs have to make a business case to the RIRs to demonstrate their expected growth rate). RIRs try to ensure, but do not guarantee, that an allocation to an LIR is aggregatable with prior allocations.

The lowest rung of the address allocation hierarchy is that between ISPs and their customers. Address blocks handed out by ISPs to their customers are called assignments. The size of these address blocks depends upon customer demand. It is difficult to gauge how ISPs manage their allocated space, but we believe that most ISPs use a first-fit or best-fit algorithm to make assignments [17, 18], regardless of whether the space assigned to an individual customer is aggregatable. Our belief is based on the fact that two freely available software packages for making assignments (FreeIPdb [19] and NorthStar [20]) both use a best-fit algorithm to choose address blocks for assignments.

2.1.2 Elements of Routing Practice

Allocation practice determines how, and to whom, address blocks are assigned. Routing practice determines which of these address blocks appear in the routing table, and in what form (either in their entirety, as sub-blocks, or as more specifics of a block). ARAM attempts to capture the dominant routing practices that we have been able to infer from the data. It does not incorporate several arcane routing practices [4].

The ideal, espoused by CIDR, is that each allocation to an ISP has exactly one entry in the routing table, and no assignments appear in the routing table. That is, each customer advertises its assignment as a route in BGP (or perhaps using static routing or sharing an IGP with the ISP, although these practices are probably less prevalent now), but the ISP aggregates these routes and advertises to its peers or its upstream provider the address prefix representing its allocated space. Reality is far from this.

Deviations from this ideal are caused by a variety of routing practices. Most of these routing practices result from multihoming. We use this term in a very general sense to include customers connected to multiple upstream providers and ISPs buying transit from multiple providers. In the following paragraphs, we describe these routing practices and their effect on the routing table.

Some ISPs split their allocation and advertise the split prefixes separately in BGP. For example, the allocation made to UUNET, 63.64.0.0/10, is sometimes advertised as four /12s rather than directly as a /10. There are two generic examples of ISPs that may adopt this practice, both of which are attempts to engineer traffic flows. A small ISP may split its allocation and advertise different split prefixes to different upstream providers, thereby effecting "load sharing", i.e., spreading in-bound traffic across different upstream providers. A large national ISP may have an internal level of hierarchy in address allocation. That is, it may split its allocation into address prefixes, one each for the different geographic regions where it has infrastructure; customers from a particular region get address space from the address prefix assigned to that region. The ISP then advertises these split prefixes without aggregating them; this allows it to have more fine-grain control of routing. (There are some other examples where an allocation appears to have been split in the routing table [4].)

A similar practice is followed by some large end-users like web-hosting services, which have obtained a provider independent allocation. Such an end-user might split its allocation and announce the different sub-blocks from different upstream providers, for load-sharing reasons.

The previous routing practices described scenarios in which the original allocation does not appear in the routing table, but sub-prefixes of that allocation do (this is not always true, since sometimes an ISP will advertise its allocation intact and also advertise sub-prefixes; we return to this in a later section). These sub-prefixes may not completely cover the original allocation's address space.

We label this practice splitting, and refer to the allocation that splits as a split allocation, and to the products of splitting as split prefixes. As we discuss later, not all allocations split. Some allocations appear in their entirety as one prefix in the routing table. We call these intact allocations, and the corresponding prefixes intact prefixes. We now describe routing practices that determine how assignments to customers (or sub-prefixes thereof) appear in the routing table. There are (at least) three qualitatively different routing practices in this category.

A customer of an ISP might provide for backup connectivity by advertising its assigned space through an upstream provider (say ISP B) different from the one it obtained the assignment from (say ISP A). In this case, A's allocation appears (either intact or split) in the routing table and is a less-specific prefix of the customer's assignment. A slight variant of this practice is that the customer, instead of advertising its entire assigned space, advertises a part of it through the upstream, preferentially drawing traffic from its upstreams. Finally, a customer which gets its addresses from one ISP might split its assignment into multiple prefixes to perform load sharing among different providers.

From the perspective of the routing table, these routing practices result in more-specific prefixes being advertised in BGP. We say that a prefix spawns more-specifics, and that these routing practices govern spawning.

2.2 ARAM

A detailed causal model of a routing table might have attempted to explicitly include all the processes described above: the allocation of address space to LIRs, the assignment of address prefixes to customers, and the various routing practices described above. Such a model, while closer to physical reality, is very difficult to define, and even harder to validate. As we shall see, the upper levels of the address assignment hierarchy (IANA to RIRs and RIRs to LIRs) are relatively easy to model. Data about assignments is available, and it may be possible to construct models for this. However, succinctly capturing the physical processes underlying routing practice seems to be beyond our reach today. While the kinds of routing practices that occur today are probably small in number, economic and other considerations determine which ISPs and which customers follow what kinds of routing practices. Capturing these considerations is left for a future generation of models.

ARAM is a less ambitious, but nonetheless very useful, hybrid model, with three parts. First, it models the allocation of address prefixes from RIRs to LIRs. Then, it incorporates knobs that determine how these allocations appear in the routing table, i.e., whether the allocation is split or appears intact. Finally, it models the appearance of customer assignments in the routing table by providing knobs that determine prefix spawning (Section 2.1.2).

In the rest of this section, we describe these three components of ARAM, and provide intuitive justification for our design of the model. In Section 2.3, we validate ARAM's design.

Before describing the internals of ARAM, we describe its inputs and outputs. The output of ARAM is a routing table of a desired size. More precisely, the output is the collection of address prefixes that might appear in a routing table of this size, and does not include other routing attributes that might be associated with these prefixes, such as AS paths, next hops, or BGP communities. The desired size of the routing table is not an explicit input to ARAM; rather, it is implicitly defined by five input parameters:

- The number of allocations made by RIRs to LIRs is determined by a parameter N.

- F_split is the probability of an allocation splitting, and C_split is the percentage of the address space of a split allocation that is advertised in the routing table.

- Finally, F_spawn is the probability that a split or intact prefix spawns one or more prefixes, and C_spawn is the fraction of the address space of a split or intact prefix that is advertised in the routing table as spawned prefixes (i.e., as more specific prefixes).

Apart from N, the other parameters encode the aggregate impact of the various routing practices described in Section 2.1.2. Exporting these knobs allows users of the model to extrapolate how changes in routing practice might affect the "shape" of routing tables (i.e., the relationships between prefixes).

Modeling Allocations

The first step in generating a routing table in ARAM is to generate N allocations. Following current registry practice, ARAM generates allocations whose prefix length is between /10 and /20. The number of prefixes N(x) of length x (more precisely, x is the prefix length minus 9, so that the shortest prefix generated is a /10) is determined by a function of the form N(x) ∝ x^k for some constant k.

The intuition for this simple form of allocation distribution arises from the observation that RIR allocation policy is based on documented need. To get address space of a certain size, an LIR must justify the need for that size either by presenting a business case or by demonstrating an appropriate rate of assignments to customers [15]. Thus, a large ISP is (eventually) allocated a large block of addresses (shorter prefix length), and small ISPs are allocated smaller blocks (longer prefix length). Since the distribution of ISPs can plausibly be said to be a power law (following the more general, and well-established, empirical observation that companies are power-law distributed by size [21]), we can expect that N(x) follows the form described above.

This model makes one important simplification. In practice, an LIR is progressively allocated space based on need and, because these allocations are not guaranteed to be contiguous, an LIR may have several distinct prefixes allocated to it. ARAM, however, implicitly allocates one prefix to an LIR, and assumes that this prefix is the aggregated result of several allocation requests. Furthermore, this version of ARAM hard-codes the value k = 3.4 into the model. In principle, of course, we could have exported k to enable the user of the model to explore how quantitative changes in the allocation distribution affect routing tables. We have left this for future work.

The mechanics of allocation in ARAM work as follows. ARAM maintains a pool of /8s from which it makes these allocations. As a matter of detail, the /8s that ARAM uses are the same ones that are in use by RIRs, or that have been reserved by IANA for future use. There are about 113 such /8s. ARAM repeatedly performs the following steps N times (assuming that the /8s are numerically ordered):

- It draws a random sample from the distribution N(x) ∝ x^k. This determines the prefix length of an allocation.

- It allocates this prefix from the first /8 whose current utilization is less than 80%.

In this way, ARAM sequentially fills up /8s until N allocations have been made.
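To make the allocation step concrete, the following Python sketch (ours, not part of any released ARAM implementation; function and variable names are illustrative) mimics the procedure just described: prefix lengths are drawn with probability proportional to x^k for x = length - 9 and k = 3.4, and /8s are filled sequentially up to 80% utilization.

    import random

    K = 3.4                               # hard-coded exponent for N(x) ~ x^k (Section 2.2)
    PREFIX_LENGTHS = list(range(10, 21))  # registry allocations fall between /10 and /20

    def sample_allocation_length():
        # Draw a prefix length with probability proportional to (length - 9) ** K.
        weights = [(l - 9) ** K for l in PREFIX_LENGTHS]
        return random.choices(PREFIX_LENGTHS, weights=weights)[0]

    def generate_allocations(n, num_slash8s=113, fill_threshold=0.80):
        # Sequentially fill /8s with n allocations, moving on once a /8 reaches
        # the fill threshold (80% utilization, as in the text).
        slash8_used = [0.0] * num_slash8s   # fraction of each /8 already consumed
        allocations = []                    # list of (slash8 index, prefix length)
        for _ in range(n):
            length = sample_allocation_length()
            size = 2.0 ** -(length - 8)     # a /10 is 1/4 of a /8, a /20 is 1/4096
            for i, used in enumerate(slash8_used):
                if used < fill_threshold:
                    slash8_used[i] += size
                    allocations.append((i, length))
                    break
        return allocations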

Modeling Advertisement of Allocations

The next step in the model determines how allocations appear in the routing table. As we discussed in Section 2.1.1, although one might expect an allocation to appear intact in the routing table, a variety of prevalent routing practices cause allocations to "split" and appear as a collection of sub-prefixes in the routing table. Rather than attempt to model these routing practices, ARAM defines the frequency and extent of this practice using two parameters.

F_split defines the percentage of allocations that split. For each allocation, ARAM tosses a coin with probability F_split to determine whether that allocation is split or intact. When an allocation splits, in practice, the collection of split prefixes corresponding to that allocation does not usually cover the allocation's address space. Intuitively, these split prefixes cover only that part of the address space actually utilized (assigned to customers). To model this, ARAM uses the C_split parameter and, for each split allocation, generates a number of prefixes such that the address space covered by those prefixes is as close as possible to C_split times the address space of the split allocation.

Into what prefixes does an allocation split? From our analysis of the data, there does not appear to be a dominant pattern of allocation splitting, nor does there appear to be a rationale (e.g., larger allocations splitting in a certain way) for the way split allocations appear in the routing table.

There is one important exception to this. Some web-hosting and content providers (these are not the only examples of such practice; the other examples defy classification but are numerous) that obtain their own allocations from RIRs sometimes split them into a large number of small prefixes (e.g., /24s). Each such prefix intuitively represents one data center or part thereof. ARAM captures this practice by assuming that a fixed fraction (20%) of split allocations split in this way.

For the rest of the split allocations, we use a rather ad hoc splitting rule. Two observations guided the design of this rule. First, in practice, allocations do not split into equal-sized address blocks. In particular, when splitting is for the purposes of load sharing, one might expect that skewed traffic distributions to different parts of the address space will result in size diversity among the split prefixes. Second, for reasons of routing manageability, we suppose that allocations will split into a relatively small number of prefixes. The splitting rule we use is:

Split the allocation into prefixes of length /(i+2) and /(i+3), where /i is the length of the original allocation. The numbers of /(i+2)s and /(i+3)s are determined as follows: for every /(i+2), two /(i+3)s are also produced (i.e., equal amounts of address space appear as /(i+2)s and as /(i+3)s). Finally, as many /(i+3)s are added as are necessary to cover up to C_split times the address space of the allocation.
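Under one reading of this rule, the split can be computed as below. This Python sketch (our illustration; names are hypothetical) first emits /(i+2)s and /(i+3)s in the 1:2 ratio described above, then tops up with /(i+3)s until roughly C_split of the allocation is covered; the 20% of split allocations that shatter into many /24s are handled separately and are not shown.

    def split_allocation(i, c_split):
        # Return {prefix_length: count} for an ordinary split allocation of length /i.
        # One /(i+2) covers 1/4 of the allocation; each /(i+3) covers 1/8.
        counts = {i + 2: 0, i + 3: 0}
        covered = 0.0
        # Rounds of one /(i+2) plus two /(i+3)s put equal address space at both lengths.
        while covered + 0.5 <= c_split:
            counts[i + 2] += 1
            counts[i + 3] += 2
            covered += 0.5
        # Add extra /(i+3)s until the coverage is as close as possible to C_split.
        while covered + 0.125 <= c_split:
            counts[i + 3] += 1
            covered += 0.125
        return counts

For example, split_allocation(12, 0.9) yields one /14 and five /15s, covering 87.5% of the /12.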

Modeling More Specifics

In this final step of the model, ARAM determines the prefixes which are spawned from intact or split prefixes. Essentially, spawned prefixes in a routing table appear as more-specifics of other prefixes, and represent various routing practices: backup routing and customer load sharing among upstreams (Section 2.1.2), or an ISP announcing its allocation as well as prefixes split from that allocation (Section 2.1.1).

As with splitting, rather than modeling these routing practices, ARAM encodes their effect using two parameters: F_spawn and C_spawn. F_spawn is the probability that an intact or split prefix actually spawns at least one more-specific prefix. Intuitively, not all such prefixes spawn more-specifics. Consider a cable ISP that splits up its allocation; because such an ISP serves residential customers or small businesses, the ISP does not have customers who engage in multihoming. Similarly, an end-user who receives a direct allocation from the RIR would not need to spawn prefixes if it does not perform load sharing.

Furthermore, not all the space covered by an intact or split prefix appears as spawned prefixes. For example, only some of an ISP's customers may actually multihome, resulting in more specifics in the routing table. Once ARAM has decided, based on a coin toss with probability F_spawn, whether a given intact or split prefix spawns some prefixes, it then generates spawned prefixes such that C_spawn of the address space of the original prefix is covered by spawned prefixes.

Finally, as with splitting, our rule for generating spawned prefixes represents a delicate compromise between trying to capture diversity in routing practice, and keeping the model simple and understandable. We have two spawning rules. First, all spawned prefixes are in the range /19-/24. This rule follows from ISP filtering practice, which limits the longest prefix that may appear in backbone routing tables to /24. In addition, very few customers get blocks larger than a /19 from ISPs. Second, to generate spawned prefixes for a given "parent" prefix, ARAM repeats the following two steps until the fraction C_spawn of the parent prefix is covered (a code sketch of this procedure follows the list):

- Pick the largest i between 19 and 24 such that one /i, two /(i+1)s, four /(i+2)s, and so on up to 2^(24-i) /24s can be generated within the spawnable address space. (The basic idea is that equal address space is devoted to each prefix length.) The motivation for this rule is its simplicity. It closely parallels our splitting rule; in that case, however, routing manageability was used as a motivation to keep the number of split prefixes relatively small. Such a consideration isn't necessary for spawning, since an ISP cannot, in general (of course, there may be exceptions to this), control what its customers do with their assignments.

- Assign each prefix, without overlap, to a random location within the parent prefix's address space. The intuition behind this is that, from the perspective of an ISP, which customer decides to advertise its assignment for backup or load balancing is generally uncorrelated with the address space, so the more-specifics can be expected to be quite random.
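The sketch below (ours; the random, non-overlapping placement step is omitted and only the multiset of spawned prefix lengths is produced) shows one plausible reading of the spawning loop: in each round it emits the largest family of one /i, two /(i+1)s, ... up to 2^(24-i) /24s that still fits in the uncovered spawnable space of the parent.

    def spawn_prefix_lengths(parent_len, c_spawn):
        # Return the spawned prefix lengths for a parent prefix of length parent_len,
        # covering roughly a fraction c_spawn of the parent's address space.
        parent_size = 2 ** (32 - parent_len)
        budget = c_spawn * parent_size          # uncovered spawnable space, in addresses
        spawned = []
        while True:
            chosen = None
            # Prefer the shortest length (largest family) that still fits in the budget.
            for i in range(max(19, parent_len + 1), 25):
                family_size = (24 - i + 1) * 2 ** (32 - i)   # equal space per length
                if family_size <= budget:
                    chosen = i
                    break
            if chosen is None:                  # not even a single /24 fits any more
                break
            for j in range(chosen, 25):         # one /chosen, two /(chosen+1)s, ...
                spawned.extend([j] * 2 ** (j - chosen))
            budget -= (24 - chosen + 1) * 2 ** (32 - chosen)
        return spawned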

In summary, notice that ARAM models a two-depth routing table. By the depth of a prefix, we mean the number of its less specific prefixes or ancestors. All intact and split prefixes appear at depth zero of the routing table (i.e., they have no less specific routing table entries). All spawned prefixes appear at depth one and have one parent (either an intact or a split prefix) from which they are spawned. Actual routing tables have a small percentage of prefixes at other depths (not more than 10% during the last 5 years).

2.3 Validation of ARAM

In this section, we validate ARAM by comparing actual routing table snapshots against comparably sized tables generated by ARAM. We then discuss each aspect of ARAM's design, and provide quantitative justification wherever possible.

2.3.1 Data Sources, Assumptions and Methodology

To validate our modeling of allocation practice, we use data from the three main RIRs (ARIN, RIPE and APNIC; a fourth, LACNIC, was established only in November 2002 and is not included in our validations). Databases of allocations made by the RIRs are publicly available [24, 25, 26]. LIRs register their assignments in a Whois database; a bulk data dump of these assignments is also available from the RIRs upon written request.

We processed the allocation databases, to sanitize them, in two ways. First, while most allocations are powers of two, some (about 0.5%) cannot be expressed as a single prefix. This may happen when an RIR makes allocations which are not a power of two [22] or if two contiguous prefixes of different sizes were allocated to the same organization [23]. We break up such allocations into the smallest possible number of prefixes. Second, occasionally some ISPs will announce aggregated entries of their allocations. In our study, we treat the aggregate as the allocation. This processing results in two invariants: all allocations can be expressed as address prefixes, and an allocation cannot be covered by a less specific prefix from the routing table.

Figure 1: Normalized RMS error per prefix for ARAM and scatter, plotted against months since Nov. 1997.

Figure 2: August 1998: Prefix length distributions (number of prefixes at each prefix length) for ARAM and the real routing table.

For validating and understanding routing practice, as well as to calibrate ARAM's performance, we used routing tables from Route Views [27]. We processed the routing tables in two important ways. First, we removed historic routing table entries (those prefixes that belonged to address space that had been allocated prior to the establishment of the registries). ARAM does not attempt to model these historic prefixes. Since our goal is to extrapolate current allocation and routing practice, we argue that the effect of these historical prefixes will soon become negligible. Indeed, in 1997 these historical prefixes formed almost 45% of the routing table (we would have liked to study the evolution of the routing table soon after the deployment of CIDR, but have been unable to do so due to the lack of periodic routing table snapshots from that era). As of this writing they account for almost 25% of the prefixes. Second, we removed prefixes longer than /24. Most ISPs filter prefixes longer than /24, although some such prefixes do appear in the Route Views database (possibly because the Route Views peers do not apply filtering rules to their feeds).

2.3.2 Validating ARAM

In this section, we describe the results of several kinds of validation tests we performed on ARAM. Recall that the goal of ARAM is to plausibly capture the relationships between prefixes. This notion is somewhat difficult to define precisely, so we instead define three indirect metrics that capture various aspects of these relationships.

Our first metric is the normalized RMS error in the prefix length distribution. This metric computes the root mean square error between the prefix length distribution generated by a model and the prefix length distribution in real routing tables. We normalize the RMS error by the number of prefixes in each table, resulting in a "normalized RMS error per prefix" metric.
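As a concrete reference, a minimal computation of this metric might look like the following Python sketch (ours; the paper does not spell out the exact normalization, so we assume the RMS is taken over the prefix lengths and then divided by the number of prefixes in the real table).

    import math

    def normalized_rms_error(model_counts, real_counts, lengths=range(8, 25)):
        # model_counts / real_counts map prefix length to number of prefixes.
        mse = sum((model_counts.get(l, 0) - real_counts.get(l, 0)) ** 2
                  for l in lengths) / len(lengths)
        total_real = sum(real_counts.values())
        return math.sqrt(mse) / total_real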

Our second metric is the depth ratio, the ratio of the number of prefixes at depth zero to the number of prefixes at depth one in a routing table. It attempts to capture a different facet of prefix relationships than the length distribution: how the less specifics and more specifics relate to each other.

Our final metric is a slightly more fine-grained measure of prefix relationships. The unibit trie density captures the number of nodes in the unibit trie [36] per prefix in the routing table. Unibit trie density is inversely related to prefix density; when prefixes are more closely clustered together in a routing table, fewer unibit trie nodes are required per prefix. This metric captures the relationship between parent prefixes and spawned prefixes. For example, when the difference in the lengths of a parent prefix and a spawned prefix is large, the number of unibit trie nodes is larger than when the difference is small. Similarly, if of three prefixes, one is the parent of the other two, the unibit trie is likely to be smaller than for a table where these prefixes were randomly scattered across the address space.
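The unibit trie density is straightforward to compute. The sketch below (ours; prefixes are represented as bit strings) builds a one-bit-at-a-time trie and divides the node count by the number of prefixes.

    def unibit_trie_density(prefixes):
        # prefixes: iterable of bit strings, e.g. "000100" for a /6.
        root = {}            # nested dicts; keys are the bits '0' and '1'
        node_count = 1       # count the root node
        for prefix in prefixes:
            node = root
            for bit in prefix:
                if bit not in node:
                    node[bit] = {}
                    node_count += 1
                node = node[bit]
        return node_count / len(prefixes)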

Using these metrics, we not only compare ARAM to actual routing tables, but we also calibrate ARAM against a random scatter model. (Another natural model to compare ARAM to would have been RTG [28]. For various logistical reasons this has not been possible. Despite our efforts, the source code for RTG did not run to completion when asked to generate large tables. Furthermore, the RTG paper does not include enough details to reproduce the prefix generation part of the generator. We are working with the authors of [28] to resolve some of these issues; in the meantime, we resort to a qualitative comparison between the models in Section 4.) The scatter model simply scales the prefix length distribution of the current routing table by a constant factor to get the prefix length distribution of a larger (or smaller) table. It then "scatters" these prefixes randomly among the 113 /8s that are currently reserved for use by IANA.
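For reference, the scatter baseline can be written in a few lines. This sketch (ours; the exact rounding and placement details are not documented, so they are assumptions here) scales the per-length counts and drops each prefix at a random offset inside a random /8.

    import random

    def scatter_table(real_length_counts, target_size, num_slash8s=113):
        # real_length_counts maps prefix length to its count in the reference table.
        scale = target_size / sum(real_length_counts.values())
        table = []
        for length, count in real_length_counts.items():
            for _ in range(round(count * scale)):
                slash8 = random.randrange(num_slash8s)            # which /8
                offset = random.randrange(2 ** (length - 8))      # position within the /8
                table.append((slash8, offset, length))
        return table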

Our performance comparisons use 20 routing tables dating back to November 1997, approximately one every three months (in one case, incomplete data in the Route Views database prevented us from taking samples exactly three months apart).

2.3.2.1 Prefix Length Distribution.

Figure 1 plots the normalized RMS error per prefix for ARAM and scatter. Notice that ARAM's RMS error is uniformly low (less than 6% over the 5-year measurement period). This is remarkable because ARAM is not explicitly engineered to produce a specific prefix length distribution (it only has coarse-grained prefix length ranges encoded in it). What is more remarkable, however, is that in producing an ARAM table to match a routing table at time T, we used the number of allocations that had been made by the registries at T. Furthermore, in generating the matching tables, we used one set of values for the split and spawn parameters. (This is testament to the fact that, statistically speaking, the incidence of splitting and spawning hasn't changed much over the 5 years.) To give the reader a visual sense of the closeness of the match, Figure 2 compares the prefix length distributions in actual routing tables to those generated by ARAM.

Scatter has lower RMS error than ARAM, in general. This is because scatter, by design, attempts to match the prefix length distribution. The reason the RMS error is non-zero in some cases is that we generated all the scatter tables using the prefix length distribution from one instance of the actual routing table.

2.3.2.2 Depth Ratio.

Figure 3 plots the depth ratio for ARAM, scatter and the real routing table. ARAM clearly matches the depth ratio of each instance of the routing table, but this is not surprising since ARAM is implicitly engineered (given its split and spawn parameters) to generate numbers of prefixes at depth zero and depth one that are comparable to those in a real routing table.

However, it is important to note that scatter gets the depth distribution completely wrong, and it is not hard to see why. Because prefixes are scattered randomly among the /8s, it is less likely that a prefix will be at depth one (i.e., a more specific), especially for smaller routing tables. This metric starkly illustrates (as does the next) how ARAM clearly outperforms scatter.

2.3.2.3 Unibit Trie Density.

Finally, Figure 4 plots the unibit trie density for ARAM, scatter, and the real routing tables for our twenty measurement points. Scatter grossly overestimates the density, because its prefixes are essentially uncorrelated with respect to where they are placed in the address space.

However, a real routing table exhibits significant relationships between prefixes, resulting in smaller unibit tries for the same prefix length distribution. ARAM performs well; for current routing tables, ARAM is off by less than 7% in this metric. This is encouraging, especially considering the simplifications we make in the model (Section 2.2).

2.3.3 Validating ARAM's Design Assumptions

Having validated the performance of ARAM, we now quantitatively justify many of our design decisions.

2.3.3.1 Allocation Size Distribution.

An important component of ARAM is that the prefix length distribution of allocations is well modeled by a function of the form y = x^k (Section 2.2). We used k = 3.4. The R-squared value for this value of k is 0.92. The fit deviates from the actual data at two prefix lengths: /16 and /19. Pre-CIDR allocations of class B address space account for the deviation at /16. The deviation at /19 is essentially a boundary effect; until recently, the RIRs used /19 as the minimum allocation size.

An exponential also fits the data well, although, as we have discussed before, there is an intuitive justification for the algebraic form (larger ISPs get larger blocks, i.e., shorter prefixes, and ISP sizes can be expected to follow a power law).

There is one caveat to this distribution. ARAM assumes that the maximum size of an allocation is a /10, since currently there are no allocations whose prefix length is smaller than 10. ARIN allocates (at a given time) blocks no larger than a /13, and the existing larger allocations that one sees in the databases are the result of contiguous allocations made over time. We expect this policy to continue, since a larger maximum allocation size reduces an RIR's /8 significantly (at a given time, each RIR only has a small number of /8s that it allocates from). The other RIRs may change their policies to have a maximum allocation size too.

ARAM, for simplicity, starts allocating prefixes from a fresh /8 when the current /8's utilization reaches 80%. In practice, /8s can have a utilization ranging from 80% to 99%. We learnt from APNIC [30] and RIPE [31] that they use their /8s up to 80% before asking IANA for more. We also learnt from ARIN [29] that they do not make fresh allocations from a /8 whose usage has crossed 80%; such a used /8 is only used for growing allocations which have already been made from it (recall that RIRs try to make contiguous allocations). This practice is followed, in general, by all RIRs. However, the usage or non-usage of allocations would not have a major effect on trie properties, because only a small percentage of prefixes are intact prefixes. ARAM also leaves one partially used (i.e., with less than 80% utilization) /8 per RIR. In practice, there may be more than one such /8 per RIR; IANA may sometimes choose to give two new /8s to an RIR at once [29].

2.3.3.2 Modeling Intact and Split Prefixes.

ARAM represents the fraction of prefixes that split using a single parameter F_split. This is a simplification. In practice, we see that the fraction of split allocations decreases with decreasing allocation size. We could have used a distribution as an input, but in the absence of a clearer understanding of why splitting varies with prefix length, doing this would not have helped us understand how routing practice contributes to extrapolations. Obtaining this understanding is left for future work, but a possible explanation for the downward trend in F_split is that larger allocations split more frequently because they need finer grain control over traffic.

We earlier gave some reasons for allocation splitting: geographic distribution of prefixes, and load sharing. One relatively minor, but interesting, regional variation is caused by a small number of LIRs in the RIPE region (e.g., the National Institute for R&D in Informatics of Romania) which do not provide Internet connectivity to their assignees. Their allocations appear as split allocations in the routing table.

Recall that our splitting rule (Section 2.2) sometimes splits an allocation of length /i into prefixes of length /(i+2) and /(i+3). Actual routing practice is much less clean than our rule would suggest (details omitted for brevity), and the incidence and extent of splitting varies with allocation size.

2.3.3.3 Modeling Spawned Prefixes.

In Section 2.2, we assumed that spawned prefixes are in the range /19 to /24, largely because ISP assignments fall into that range. The assignment records bear this out. Less than 0.4% of all assignments registered in the whois databases (Section 2.3.1) are larger than a /19. Even the largest ISPs rarely make assignments larger than a /19. ARIN and APNIC require that they be consulted before any assignment larger than a /19 is made. ARIN allows extra-large ISPs to consult only when an assignment larger than a /18 is to be made. RIPE has no such maximum assignment size. We learnt that there are few LIRs who would make assignments larger than a /19 in the RIPE region [31].

ARAM also assumes that the spawned prefixes are randomly scattered within a parent's address space. We empirically observed that assignments made by ISPs which appear in the routing table follow no particular pattern across the ISPs' address space. This is consistent with the intuition that customers may generally multihome at will, and ISPs generally do not attempt to make multihoming assignments from a separate block (there are some exceptions). Our choice of single parameters F_spawn and C_spawn for the frequency and extent of spawning is clearly a first order approximation. F_spawn actually decreases with increasing prefix length of the spawning prefix, but without an understanding of the reasons behind this variation, we were loath to include a richer parametrization in the model.

Figure 3: Depth Ratio for the real routing table, scatter and ARAM, plotted against months since Nov. 1997.

Figure 4: Unibit Trie Density (number of unibit trie nodes per prefix) for the real routing table, scatter and ARAM, plotted against months since Nov. 1997.

Finally, our spawning rule is biased towards producing a larger number of longer prefixes. Thus, ARAM produces a large number of /23s and /24s (see Figure 2, for example). The larger number of /24s produced by ARAM is consistent with the data and is caused by a boundary effect; ISPs filter their routing tables at /24, and many small organizations that want to multihome advertise /24s. (This fact is also recognized in RIR policy: ARIN recently ratified a policy which allows multihoming as a justification for a customer to get a /24 from an LIR, i.e., the requirement that 25% of an assignment be used immediately is waived [32].) However, the bias shown by the spawning algorithm towards /23s is an artifact of the simplistic choice of spawning rule, and deviates from the data a little. Overall, however, as we have shown above, the routing tables produced by ARAM qualitatively match several years' worth of routing tables.

Finally, ARAM generates prefixes only at depth zero and depth one (recall that the depth of a prefix is the number of less specific prefixes in the routing table). Over the last 5 years, consistently fewer than 10% of the routing table prefixes have had a depth greater than one (detailed data omitted for brevity).

3. EVALUATING THE SCALABILITY OF IPV4 LOOKUP TECHNIQUES

In the introduction, we said that router vendors need a model that is able to generate realistic tables of a million or more entries in order to evaluate the performance of algorithmic solutions for IP lookup. The validation in the last section provides some confidence that ARAM generates realistic tables. In this section, we apply ARAM to produce larger tables in order to test the scalability of a specific algorithmic solution when compared to Ternary CAMs.

3.1 TCAMs and Multibit Tries

TCAMs (Ternary Content Addressable Memories) can store the values 0, 1 and X (a "don't care" value). The ability to store don't care values makes TCAMs apt for IP prefix lookups. Essentially, TCAMs can compare a given destination address to all stored prefixes in parallel and return the longest match in one memory access. To store a w-bit prefix, a TCAM uses w cells (i.e., each cell stores one bit of information). For example, IP addresses would require 32 cells per prefix. Variations in routing table structure have no effect on the memory requirements of TCAMs.

By contrast, algorithmic solutions are based on storing prefixes in tries. High-speed algorithmic solutions store tries in SRAM. Multibit trie algorithms process multiple bits of the trie in a single memory access and are therefore faster. However, a multibit trie increases the memory required for the lookup structure. To offset this storage increase, modern implementations that store the forwarding table in SRAM generally compress multibit trie nodes.

In this paper, we study the scalability of the Tree Bitmap [12] compression algorithm. While the Tree Bitmap algorithm is basically a refinement of the seminal Lulea [11] algorithm, it appears to be more popular because it allows fast updates. CAMs have fast updates [45], and, to keep up with the competition, vendors prefer implementing algorithmic solutions with fast updates. In any case, we believe that similar scaling results will hold for any compressed multibit trie implementation, of which the Tree Bitmap algorithm can be considered a representative.

An example of a multibit trie is shown in Figure 5, where the root node is a multibit node of stride (the number of bits processed in a single memory access) two. The Tree Bitmap algorithm uses bitmaps to encode the multibit trie compactly. The bitmaps are of two kinds: internal and external. These two bitmaps are sufficient to encode a multibit node. The internal bitmap (2^stride - 1 bits long) records the prefixes present in a multibit node. The external bitmap (2^stride bits long) records which child nodes of the multibit node exist. Thus, each multibit node contributes at most one internal bitmap and one external bitmap.

Along with the bitmaps, each multibit node maintains up to three pointers. The child pointer points to the child nodes; since the child nodes are stored contiguously, one pointer suffices to access them. The results pointer points to an array of next-hops. The third pointer is to the internal bitmap; it exists only when the multibit node contains prefixes. Each node also maintains flags that are associated with details of the algorithm and may vary with implementations.

For illustrative purposes, consider 8-bit IP addresses and a three-level multibit trie with strides of 2, 3 and 3 (as in Figure 5). The root node of the multibit trie can have four child nodes since its stride is two. The external bitmap of the root node in Figure 5 is 1000 because only the first child node (of the four possible children) exists. Multibit trie nodes that neither contain a prefix nor have pointers to lower levels of the trie are assumed to have been removed.

The child multibit trie node in Figure 5 also has an internal bitmap of 0001100, which encodes the prefixes it contains. The bitmap is of length 7 because there are 7 nodes in the subtrie; the first bit corresponds to the root, the next two bits to the two children of the root, and the last four bits to the four leaves. The last four bits are 1100 because only the first two leaves have stored prefixes.

Figure 5: An example of a unibit trie on the left and its corresponding multibit version on the right, for the prefixes 0000* and 0001*. The bitmaps of the Tree Bitmap algorithm for the multibit version are also shown. The memory required is 45 bits.

Figure 6: What happens if the prefixes change to 0000* and 1101*. Now, two child nodes are required. The external bitmap is 1001 because the first and the last child nodes of the root exist. The memory required is 73 bits.

To calculate the memory used in Figure 5, suppose that the memory is bit addressable. Also assume that 5 bits per node are required for flags. The memory used by the root node is 17 bits (8 bits for the child pointer, 4 bits for the external bitmap, and 5 bits for flags). The child node uses 28 bits (an 8-bit results pointer, an 8-bit pointer to the internal bitmap, 7 bits for the internal bitmap, and 5 bits for the flags). (Some of these overheads may appear large but are actually quite small for 32-bit addresses; our example used 8-bit addresses for illustration.)

Unlike TCAMs, the memory requirements of multibit tries depend upon the relationships between the various prefixes in the routing table. To illustrate the importance of trie structure, consider the slightly different routing table containing 0000* and 1101* in Figure 6. This small change leads to a fairly big change in the memory requirement because of the need to create a second child node. The final result, using similar calculations, is 17 + 28 + 28 = 73 bits. Note that a TCAM would have needed just 2 x 8 = 16 bits to store these two prefixes.
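The per-node arithmetic in these examples can be captured in a few lines. The following Python sketch (ours, using the toy sizes from the example: 8-bit pointers and 5 flag bits, not the field widths of a production implementation) reproduces the 45-bit and 73-bit totals.

    def node_bits(stride, has_children, has_prefixes, pointer_bits=8, flag_bits=5):
        # Bits for one Tree Bitmap node in the 8-bit address example.
        bits = flag_bits
        if has_children:
            bits += pointer_bits + 2 ** stride         # child pointer + external bitmap
        if has_prefixes:
            # results pointer + pointer to internal bitmap + internal bitmap itself
            bits += 2 * pointer_bits + (2 ** stride - 1)
        return bits

    # Figure 5: root (stride 2, children only) + one child (stride 3, prefixes only)
    fig5_bits = node_bits(2, True, False) + node_bits(3, False, True)        # 17 + 28 = 45
    # Figure 6: root + two children, each holding one prefix
    fig6_bits = node_bits(2, True, False) + 2 * node_bits(3, False, True)    # 17 + 56 = 73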

The moral of the examples in Figure 5 and Figure 6 is that the structure of the trie has a significant effect on the memory required to store the lookup data structure.

3.2 The Transistor Model

Our metric for comparing lookup techniques is the number of transistors required to store a given number of prefixes. While there are other important metrics like power (a typical algorithmic solution can consume six times less power than a TCAM of the same size), TCAM vendors have been working to reduce power using clever banking techniques. Transistor count appears to be a more fundamental differentiator. Fewer transistors mean more chips per wafer or higher yield. Transistor count can also act as a product differentiator in the following sense. For a given chip area and technology, there is a maximum number of transistors that can fit on a chip. If designers can provision transistor counts for storing prefixes, they can allocate the remaining transistors on the chip for additional fast-path packet processing functionality.

Figure 7: The memory used by the multibit trie (number of transistors in millions) for the real routing table, scatter and ARAM, plotted against months since Nov. 1997.

TCAM product offerings use between 14 and 16 transistors to store one bit of data (called a cell). To store an IP prefix, a TCAM needs 32 bits, or between 448 and 512 transistors. By contrast, six transistors are used to store one bit in SRAM. Therefore, we calculate the number of transistors taken by a multibit trie solution by multiplying the number of bits by six. For the example we discussed in Figures 5 and 6, a 14-transistor-per-cell TCAM would have used at least 224 transistors. The Tree Bitmap scheme would have required 270 (6 x 45) or 438 (6 x 73) transistors, respectively. Multibit trie algorithms also require a fixed overhead of logic for implementing the algorithm, and for the overhead of memory allocation and deallocation. From discussions with a vendor who has implemented a fully pipelined algorithmic solution, 20 million transistors appears to be a fairly conservative estimate for the lookup logic (this estimate is for a programmable processor and 32 pipeline stages; a much smaller overhead is reported in [3], and smaller overheads only slant the comparison further in favor of SRAM-based algorithmic solutions). This overhead gets amortized over the number of prefixes. Clearly, TCAMs do not require this extra overhead, as their lookup logic is (effectively) distributed in each bit.
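The transistor accounting used in our comparison amounts to the arithmetic below. This Python sketch (ours; the constants are those quoted in the text, and the 20-million-transistor logic overhead is the conservative vendor estimate mentioned above) converts prefix counts and trie bits into transistor counts.

    LOOKUP_LOGIC_OVERHEAD = 20_000_000   # conservative fixed overhead for trie lookup logic

    def tcam_transistors(num_prefixes, transistors_per_cell=14, bits_per_prefix=32):
        # A TCAM spends one cell per prefix bit; no per-chip logic overhead is added.
        return num_prefixes * bits_per_prefix * transistors_per_cell

    def trie_transistors(trie_bits, sram_transistors_per_bit=6,
                         logic_overhead=LOOKUP_LOGIC_OVERHEAD):
        # SRAM-based multibit trie: 6 transistors per stored bit plus fixed lookup logic.
        return trie_bits * sram_transistors_per_bit + logic_overhead

    # Toy example from Figures 5 and 6 (8-bit addresses, logic overhead ignored):
    # TCAM: 2 prefixes * 8 bits * 14 = 224 transistors
    # Tree Bitmap: 45 bits -> 270 transistors; 73 bits -> 438 transistors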

3.3 Applying ARAM to Evaluate Scalability

In Section 2.3 we found that by simply varying the number of allocations while keeping the other parameters fixed, we obtained a good match with the routing table's prefix length distribution, depth distribution and trie density. As another measure of how well ARAM matches current routing tables, Figure 7 compares the number of transistors that would be required, using the Tree Bitmap algorithm, for ARAM's routing tables, the actual routing table, and those produced by the scatter model. Note the close match between ARAM's tables and the actual routing tables over the entire duration of our measurement period. By contrast, the scatter model is significantly inaccurate.

Figure 8: The size of the routing table is increased by increasing the number of allocations (other parameters are kept constant). We see that unless TCAMs use 8 transistors per cell or less, multibit tries would use fewer transistors than TCAMs (y-axis: number of transistors in millions; x-axis: routing table size in thousands of prefixes; curves: ARAM, CAMs-8, CAMs-14).

Figure 9: The density of the trie has been increasing with time (y-axis: number of unibit trie nodes per prefix; x-axis: months since Nov. 1997).

We now embark upon our scaling comparison of TCAMs and multibit tries. For our baseline comparisons, we fix all ARAM parameters to be those we used to match existing routing tables, and only vary the number of allocations N. The transistor counts for these larger tables are shown in Figure 8. The number of transistors taken by TCAMs is shown as straight lines: CAMs-i represents an i-transistor-per-cell TCAM. We see that unless TCAM technology advances to the point where it becomes feasible to use 8 or fewer transistors per cell, multibit tries will scale better.
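A rough back-of-the-envelope version of this comparison can be written down directly. The sketch below assumes a constant per-prefix trie cost of 206 transistors (the nominal figure reported later in this section) plus the 20 million transistor logic overhead; in reality the per-prefix cost changes as the trie grows denser, so the crossover points below are only indicative.

```python
TRIE_LOGIC_OVERHEAD = 20_000_000
TRIE_T_PER_PREFIX = 206   # nominal per-prefix SRAM cost from this section, held constant here

def crossover_prefixes(tcam_t_per_cell, address_bits=32):
    """Smallest table size at which the trie needs fewer transistors than the TCAM,
    under the constant per-prefix approximation above."""
    gap = tcam_t_per_cell * address_bits - TRIE_T_PER_PREFIX
    return TRIE_LOGIC_OVERHEAD / gap if gap > 0 else float("inf")

print(round(crossover_prefixes(16)))   # ~65,000 prefixes
print(round(crossover_prefixes(14)))   # ~83,000 prefixes
print(round(crossover_prefixes(8)))    # ~400,000 prefixes; least reliable, since the
                                       # per-prefix gap is small and density effects dominate
```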

So far we have evaluated scaling in transistor counts based on extrapolating current routing practices, as embodied by the ARAM parameters required to match current routing tables. It is worth asking what happens to these projections if routing practices change and the parameters deviate from the values we chose.

If routing practices change in the direction of more prefix spawning (caused, for example, by a significant increase in load sharing), we claim that the multibit trie algorithm will do even better. More prefixes will fit in the same multibit trie node, thereby amortizing the overhead of flags and bitmaps even better. Figure 10 shows that as C_spawn increases, the memory taken by the multibit trie algorithm increases more slowly than the size of the routing table. Others [37] in the community have been observing, for some time now, the increase in multihoming practice. Although we used just one set of values for the parameters (except for the number of allocations N) to approximately match all past routing tables, we have observed that the C_spawn in real routing tables has been steadily increasing, although almost imperceptibly. This is evidenced by the decrease in the number of unibit trie nodes per prefix over time from 3.2 to 2.9 or so (Figure 9); that is, prefixes have become more "clustered" with time. A similar behavior is also seen in Figure 11; when F_split increases, there are more load-shared prefixes at depth zero of the routing table. Varying C_split shows a similar scaling [4]. In general, load-shared prefixes are clustered together.

Figure 10: The size of the routing table is increased by increasing C_spawn (other parameters are kept constant). We see that multibit tries handle this case very well (axes and curves as in Figure 8).

Figure 11: The size of the routing table is increased by increasing F_split (other parameters are kept constant). We see that multibit tries scale better in this case. This is because load-shared prefixes are clustered together, more so than multihoming assignments (axes and curves as in Figure 8).

On the other hand, with changes in some routing practices, the compressibility of multibit tries is not improved with increasing table size. For example, increasing F_spawn shows a linear scaling of the multibit trie algorithm [4].

An interesting aspect of varying C_spawn is not apparent in Figure 10, but does become clearer in Figure 12. Notice that as C_spawn increases, the shape of the tree, as captured by the unibit trie density, changes non-monotonically. In particular, we see a peak in the number of unibit nodes per prefix around C_spawn = 0.1. This non-monotonicity allows us to compute the worst-case number of transistors for a given number of allocations. To do this, we use a value of C_spawn of nearly 0.1, and set F_spawn to 1 and F_split to 0. For the same number of allocations which contribute to the present-day routing table, we found that the worst-case parameters above generated 85,530 prefixes, which required 541 transistors per prefix in the multibit trie algorithm. However, the nominal parameters produced a routing table with 77,817 prefixes, which required 206 transistors per prefix. The intuition is that the value of C_spawn is low enough not to allow the cost of a multibit node to be amortized over many prefixes, but is high enough to ensure that all depth-zero prefixes spawn. Since TCAMs require 448-512 transistors per prefix, we can say that TCAMs would have scaled better (at least in the near term) if there had been no load sharing (and other forms of splitting) and all allocations spawned multihoming assignments.

We say near term because, as the number of allocations increases, the density of the trie increases enough to allow the multibit trie to beat TCAMs in the number of transistors. For this specific set of parameters, multibit tries would overtake TCAMs when the number of allocations grows to three times its present number (graph not shown).
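As a quick check of these numbers, the per-prefix costs can be compared directly with the 448-512 transistors per prefix that 14-16 transistor-per-cell TCAMs require. The figures below are taken from the text; the comparison ignores the fixed trie lookup logic, which would only widen the gap in the TCAM's favor for tables of this size.

```python
TCAM_T_PER_PREFIX_RANGE = (14 * 32, 16 * 32)   # 448 to 512 transistors per prefix

cases = {
    "nominal":    {"prefixes": 77_817, "trie_t_per_prefix": 206},
    "worst case": {"prefixes": 85_530, "trie_t_per_prefix": 541},
}

for name, c in cases.items():
    beats_any_tcam = c["trie_t_per_prefix"] < TCAM_T_PER_PREFIX_RANGE[0]
    print(f'{name}: trie storage is {c["trie_t_per_prefix"]} T/prefix; '
          f'below the 448-512 T/prefix TCAM range: {beats_any_tcam}')
# nominal: 206 T/prefix, well under the TCAM range
# worst case: 541 T/prefix, above it, so per-prefix storage alone favors the TCAM here
```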

4. RELATED WORK

In this section, we briefly survey the literature on IPv4 address lookup algorithms and on address allocation and routing growth. IPv4 address lookup techniques have received a fair amount of attention in recent years [36, 1, 2, 11, 12], and we do not attempt to be exhaustive in our survey of this sub-field. However, broadly speaking, there are two classes of lookup techniques: algorithmic approaches, exemplified by [12], attempt to cleverly compress forwarding tables into memory, while hardware approaches rely on (ternary) content-addressable memories (CAMs) to perform fast lookup. We have described these schemes in Section 3.1.

Most work to date has evaluated these schemes on current routing tables. However, as we have just shown, the memory requirements of some of these lookup schemes can be sensitive to the relationship between prefixes. To our knowledge, no work has examined how these algorithms scale with routing table size.

However, to capture prefix relationships, we need to be able to generate realistic routing tables. One closely related piece of work, RTG [28], has attempted to generate routing tables for the purpose of generating realistic BGP updates. RTG does not attempt to explicitly model prefix relationships in the BGP routing tables that it generates. Rather, it takes an empirical approach: given a current routing table, it extracts some statistics from the table, and attempts to generate tables with comparable statistics. Furthermore, to extrapolate routing table sizes to an order of magnitude larger than today, it is necessary to explicitly (if only approximately) capture how allocation and routing practice affects routing table entries. This enables an examination of how quantitative deviations from current routing practice affect IPv4 lookup algorithms. ARAM, unlike RTG, explicitly models address allocation at the RIR level, and contains parameters that model multihoming and traffic engineering.

There exists a body of work that has measured or is examining routing practice. For example, Huston [37] and Gao et al. [10] have measured the prevalence and time evolution of routing practices, but have not attempted to model these in order to generate routing tables. More recently, Xu et al. [46] have analyzed address allocation data and have correlated address allocations with routing table growth. However, they do not attempt to model, as we do, the structure of routing tables.

The IETF's Ptomaine working group [38] seeks to reduce the number of prefixes in the routing table, thereby reducing load on routing. Various ideas being examined include techniques to control route propagation by using BGP community attributes [39, 40], filtering routes on RIR allocation boundaries [41], or multihoming techniques that can reduce routing load [42]. ARAM is complementary to these efforts, in that it does not prescribe measures that would reduce routing table sizes, but could be used to describe the shape of the resulting routing tables after changes in routing practice.

Basu and Narlikar [44] seek to reduce the effect of updates on the throughput of lookups in a pipeline. They do this by balancing memory among the stages of a pipeline. To make a tradeoff between the speed of a fixed-stride multibit trie dynamic programming algorithm and the efficiency of a variable-stride multibit trie, they selectively increase strides to cover contiguous /24s. As we have seen, this contiguity arises mainly due to load sharing.

Figure 12: This graph shows the number of unibit trie nodes per prefix as C_spawn is increased from 0 to 1 (other parameters are kept constant). We see that the unibit trie density is lowest when C_spawn is around 0.1. This helps to calculate the worst-case storage. The baseline is the unibit trie density for the Nov. 2002 routing table (y-axis: number of unibit trie nodes per prefix; x-axis: C_spawn).

Finally, Kohler et al. [43] examine the structure and distribution of IP addresses found in traffic traces, and point out that the occurrence of IP addresses in Internet traffic is multifractal. This is somewhat orthogonal to our work, which looks at IP address allocation and prefix routing, rather than the traffic levels originated by, or destined to, IP addresses.

5. CONCLUSIONS

This paper describes a model of the structure of Internet routing tables. ARAM is a parameterizable, parsimonious, accurate and predictive model of routing table growth. It qualitatively captures the processes by which address blocks are allocated in the Internet, and those by which they appear in routing tables. ARAM holds intrinsic interest in enabling the study of routing table growth in the abstract (i.e., independent of any particular lookup implementation). It also applies to the evaluation of lookup schemes, right down to the transistor level.

Using such an evaluation, we find that, to a first order of approximation, multibit tries scale better than TCAMs with increasing routing table sizes. Furthermore, we find that the disparity between multibit tries and TCAMs increases with increased multihoming and load balancing. These results have interesting implications for the choice of lookup technologies in routers. Of course, our conclusions could be negated by CAM designs that use fewer transistors per cell. There are rumors of such designs, but it is unclear whether there is substance behind these rumors.

As an aside, in deriving these results our paper spans several levels of abstraction. Starting with the way addresses are allocated, and incorporating some of the mechanisms by which ISPs advertise routing information, our paper ends with transistor-level models that quantify the cost of hardware implementation.

As routing practices evolve, it will be interesting to see how accurately ARAM continues to model the routing table. For example, an increasingly popular routing practice is a VPN (Virtual Private Network) service provided by a backbone ISP. To provide this service, ISPs need to keep and advertise separate routes to each of the endpoints of the VPN. If this practice becomes significantly more prevalent than it is today, it may be necessary to tweak ARAM's splitting and spawning rules to better match routing practice. VPN prefixes could perhaps be modeled by more clustered splitting/spawning rules. VPNs could thus be one reason for prefix tables to grow beyond a million entries. Furthermore, it will be interesting to track the evolution of allocation and routing practices for IPv6, and to study whether ARAM can be extended to model IPv6 routing tables.

Finally, we do not expect ARAM to be the last word in routing table modeling; as more accurate data becomes available, it may be possible to better infer routing practice and therefore design more accurate models than ARAM.

6. ACKNOWLEDGEMENTS

We thank the many people who answered our questions on address allocation and routing practices. In alphabetical order, they are: Cengiz Alaettinoglu, Buddy Bagga, Adam Bechtel, Randy Bush, Peter Clark, John Crain, Daniel Golding, Ejay Hire, Lee Howard, Richard Jimmerson, Glen Larwill, Herb Leong, Mike Loevner, Bill Manning, Barry Margolin, Christopher Morrow, Alec Peterson, Mark Prior, Gerard Ross, David Schwartz, James Sybert, Son Tran, Leo Vegoda and Will Yardley. We also thank Will Eatherton for feedback on an earlier version of the paper. Harsha Narayan and George Varghese were funded by NSF grant ANI-0074004. Ramesh Govindan was partially funded by the NSF under grant number ANI-0112649.

7. REFERENCES

[1] V. Srinivasan, G. Varghese. Fast Address Lookups Using Controlled Prefix Expansion. ACM Transactions on Computer Systems, Volume 17, Issue 1, February 1999.
[2] M. Waldvogel, G. Varghese, J. Turner, B. Plattner. Scalable High-Speed Prefix Matching. ACM Transactions on Computer Systems, Volume 19, Number 4, November 2001.
[3] David E. Taylor, Jonathan S. Turner, John W. Lockwood, Todd S. Sproull, David B. Parlour. Scalable IP Lookup for Internet Routers. IEEE Journal on Selected Areas in Communications, Volume 21, Number 4, May 2003.
[4] H. Narayan, R. Govindan, G. Varghese. The Impact of Address Allocation and Routing on the Structure and Implementation of Routing Tables. University of California, San Diego Tech Report CS2003-0749.
[5] E. Zegura, K. Calvert, S. Bhattacharjee. How to Model an Internetwork. Proceedings of INFOCOM 1996.
[6] V. Paxson. End-to-End Internet Packet Dynamics. Proceedings of SIGCOMM 1997.
[7] C. Labovitz, A. Ahuja, A. Bose, F. Jahanian. Delayed Internet Routing Convergence. Proceedings of SIGCOMM 2000.
[8] C. Labovitz, R. Malan, F. Jahanian. Origins of Internet Routing Instability. Proceedings of INFOCOM 1999.
[9] https://www.wendangku.net/doc/a512723185.html.
[10] T. Bu, L. Gao and D. Towsley. On Characterizing Routing Table Growth. Proceedings of Global Internet 2002.
[11] M. Degermark, A. Brodnik, S. Pink. Small Forwarding Tables for Fast Routing Lookups. Proceedings of SIGCOMM 1997.
[12] W. Eatherton et al. Tree Bitmap: Hardware/Software IP Lookups with Incremental Updates.
[13] IANA /8 delegations. http://www.iana.org/assignments/ipv4-address-space
[14] Policies for IPv4 address space management in the Asia Pacific region. http://www.apnic.net/docs/policy/add-manage-policy.html
[15] ARIN. http://www.arin.net/policy/index.html
[16] M. Kühne, N. Nimpuno, P. Rendek, S. Wilmot. IPv4 Address Allocation and Assignment Policies in the RIPE NCC Service Region.
[17] Daniel Golding. Private communication.
[18] Mark Prior. Private communication.
[19] FreeIPDB - The Next Generation IP Database.
[20] NorthStar IP Management Tool.
[21] J. J. Ramsden and Gy. Kiss-Haypal. Company Size Distribution in Different Countries. Physica A, vol. 277, pp. 220-227, 2000.
[22] James Sybert. Private communication.
[23] Mike Loevner. Private communication.
[24] ARIN allocations. ftp://ftp.arin.net/pub/stats/arin/
[25] APNIC allocations. ftp://ftp.apnic.net/pub/stats/apnic/
[26] RIPE allocations. ftp://ftp.ripe.net/ripe/stats/
[27] University of Oregon Route Views Project. http://www.routeviews.org
[28] O. Maennel and A. Feldmann. Realistic BGP Traffic for Test Labs. Proceedings of SIGCOMM 2002.
[29] Richard Jimmerson. Private communication.
[30] Son Tran. Private communication.
[31] Leo Vegoda. Private communication.
[32] Ratified Policy 2001-2: Reassignments to multihomed downstream customers. http://www.arin.net/policy/20012.html
[33] Gerard Ross. Private communication.
[34] A. Lord. Proposal for IPv4 allocations by LIRs to ISPs. APNIC Open Policy Meeting, September 2002.
[35] A. Lord. Downstream Allocations by LIRs: A Proposal. RIPE 43 Meeting, September 2002.
[36] M. Ruiz-Sanchez, E. Biersack, W. Dabbous. Survey and Taxonomy of IP Lookup Algorithms. IEEE Network, Vol. 15, Issue 2, 2001.
[37] G. Huston. Analyzing the Internet's BGP Routing Table. Internet Protocol Journal, Volume 4, Number 1, March 2001.
[38] Prefix Taxonomy Ongoing Measurement & Inter Network Experiment (ptomaine). http://www.ietf.org/html.charters/ptomaine-charter.html
[39] G. Huston. NOPEER community for BGP scope control. draft-ietf-ptomaine-nopeer-00.txt, April 2002.
[40] O. Bonaventure, S. De Cnodder, J. Haas, B. Quoitin, R. White. Controlling the redistribution of BGP routes. draft-ietf-ptomaine-bgp-redistribution-01.txt, August 2002.
[41] S. Bellovin, R. Bush, T. Griffin, J. Rexford. Slowing Routing Table Growth by Filtering Based on Address Allocation Policies.
[42] T. Bates and Y. Rekhter. Scalable Support for Multi-homed Multi-provider Connectivity. RFC 2260.
[43] E. Kohler, J. Li, V. Paxson, and S. Shenker. Observed Structure of Addresses in IP Traffic. Proceedings of the 2nd Internet Measurement Workshop, November 2002.
[44] Anindya Basu and Girija Narlikar. Fast Incremental Updates for Pipelined Forwarding Engines. Proceedings of INFOCOM 2003.
[45] D. Shah and P. Gupta. Fast Updates on Ternary CAMs for Packet Lookups and Classification. Hot Interconnects 8.
[46] Zhiguo Xu, Xiaoqiao Meng, Cathy Wittbrodt, Songwu Lu, Lixia Zhang. Address Allocation and the Evolution of the BGP Routing Table. Technical Report CSD-TR03009, UCLA Computer Science Department, 2003.
