当前位置：文档库 › PRISM Precision-Integrated Scalable Monitoring (extended

PRISM Precision-Integrated Scalable Monitoring (extended

PRISM:PRecision-Integrated Scalable Monitoring

Navendu Jain,Dmitry Kit,Prince Mahajan,Praveen Yalagandula?,Mike Dahlin,and Yin Zhang Department of Computer Sciences?Hewlett-Packard Labs

University of Texas at Austin Palo Alto,CA

Abstract

This paper describes PRISM,a scalable monitoring ser-vice that makes imprecision a?rst-class abstraction for its scalable DHT-based aggregation service.Exposing imprecision is essential for both correctness in the face of network and node failures and scalability to large systems.PRISM introduces the notion of conditioned consistency that quanti?es imprecision along a three-dimensional vector:arithmetic imprecision(AI)bounds numeric inaccuracy,temporal imprecision(TI)bounds update delays,and network imprecision(NI)bounds un-certainty due to network and node failures.AI and TI balance precision against monitoring overhead for scala-bility while NI addresses the fundamental challenge of providing consistency guarantees despite failures in a large distributed system.Our implementation addresses the challenge of providing these metrics while scaling to a large numbers of nodes and attributes.By introduc-ing a10%AI,PRISM’s PlanetLab monitoring service, PrMon,can reduce network overheads by an order of magnitude compared to the currently-used CoMon ser-vice.And,by using NI metrics to automatically select the best of four redundant aggregation results,we can reduce the observed worst-case inaccuracy by nearly a factor of?ve.

1Introduction

This paper describes how to make imprecision a?rst class abstraction for large-scale system monitoring. Scalable system monitoring is a fundamental abstrac-tion for large-scale networked systems,and it can serve as a basic building block for new applications such as network monitoring and management[5,15,41],re-source location[16,40],ef?cient multicast[36],sensor networks[16,40],resource management[40],and band-width provisioning[9].Recent work on aggregation[16, 30,36,40]and DHTs[27–29,33,46]provides important enabling technology for constructing monitoring systems that are self-organizing,scalable,and robust[36,40]. However,to realize this vision of scalable system monitoring,the underlying monitoring infrastructure must expose imprecision in a controlled manner for two reasons.

First,correct interpretation of data requires explicitly exposing the imprecision introduced by sensor inaccu-racy and node/network delays and failures.A funda-mental and unique challenge in any hierarchical aggre-gation system is the failure ampli?cation effect:if a non-leaf node fails,an entire subtree rooted at that node is affected.For example,failure of a level-3node in a degree-8aggregation tree can cut off updates from512 leaf nodes.As a result,a hierarchical monitoring service that does not expose imprecision risks delivering arbi-trarily incorrect results.

Second,introducing controlled amounts of impreci-sion can reduce monitoring load by an order of magni-tude or more for some applications.Studies suggest[20, 22,32,36,44]that real-world applications often can tol-erate some inaccuracy as long as the maximum error is bounded and small amounts of imprecision can pro-vide signi?cant bandwidth reductions.This enables new classes of precision-aware monitoring applications that can tradeoff between imprecision and resource usage. To meet these needs,we have developed PRecision-Integrated Scalable Monitoring(PRISM).The PRISM system makes two contributions:First,it de?nes a novel conditioned consistency metric that quanti?es impre-cision along a three-dimensional vector:(Arithmetic, Temporal,Network).

?Arithmetic imprecision(AI)bounds the numerical in-consistency between the reported value of an aggre-gate relative to the true value.

?Temporal imprecision(TI)places a real-time bound on the delay from when an event/update occurs until it is reported.

?Network imprecision(NI)bounds the inaccuracy in-troduced by failed/slow nodes,failed/slow network links,and aggregation tree recon?gurations. Although each of the three dimensions is individually useful,the combination is vital because it enables con-ditioned consistency:the arithmetic and temporal guar-antees are calculated optimistically,assuming that the network is“well behaved”(e.g.,no node failures,slow links,or tree recon?gurations have affected the results). The NI metric then quali?es AI and TI metrics by quanti-fying how“well behaved”the network actually has been during the period when these metrics are calculated. Second,it provides a scalable implementation of each of these three metrics for DHT-based aggregation sys-tems.Scalability to large numbers of attributes and nodes is vital because network monitoring applications may track tens of thousands of attributes across hundreds or thousands of nodes[34,36,40].

?For AI,the challenge is distributing an imprecision budget across nodes based on each attribute’s work-load.PRISM employs a hierarchical self-tuning algo-

rithm that directs imprecision slack to where it is most needed and that tries to ensure that the adaptation cost is smaller than the bene?ts of doing so.

?For TI,the challenge is to maximize the number of updates batched together and to minimize the TI in-troduced by this batching.To accomplish this goal, PRISM pipelines the available slack across levels of the aggregation hierarchy.

?For NI,the challenge is to scalably detect and report failed/slow nodes/links which requires active probing.

A straightforward algorithm that detects and aggre-gates NI values along each aggregation tree in an n-node system can lead to O(n)message load at each node in every probing period.By leveraging the ob-servation that the forest of aggregation trees forms a butter?y network,PRISM introduces a novel dual-tree pre?x aggregation abstraction that re-uses work done by subtrees and thereby reduces the per-node cost to O(log n)messages every probing period.For a1000-node system,this implies three orders of magnitude reduction in message cost compared to the naive algo-rithm above.

Experience with a distributed heavy hitter detection application and a PrMon monitoring service for Plan-etLab built on PRISM illustrate how explicitly manag-ing imprecision can qualitatively enhance a monitoring service.The most obvious bene?t is improved scalabil-ity:for both applications,small amounts of imprecision drastically reduce monitoring load or allow more exten-sive monitoring for a given load budget.For example, in PrMon,a10%AI allows us to reduce network load by an order of magnitude compared to the widely used CoMon[6]service.A subtler but perhaps more impor-tant bene?t is the ability to quantify and improve con?-dence in the accuracy of outputs by addressing network imprecision and the ampli?cation effect.For example, by using NI metrics to automatically select the best of four redundant aggregation results,we can reduce the ob-served worst-case inaccuracy by nearly a factor of?ve. The key contributions of this paper are as follows. First,we present PRISM,the?rst DHT-based system that enables imprecision for scalable aggregation by intro-ducing a new conditioned consistency metric that bounds the arithmetic,temporal,and network imprecision.Sec-ond,we provide scalable and ef?cient implementation of each precision metric via(1)self-tuning of AI budgets, (2)pipelining of TI delays,and(3)dual-tree pre?x ag-gregation for NI.Third,our evaluation demonstrates that imprecision is vital for enabling scalable aggregation:a system that ignores imprecision can silently report arbi-trarily incorrect results and a system that fails to exploit imprecision can impose unacceptable

overheads.

Physical Nodes (Leaf Sensors)

Fig.1:The aggregation tree for key000in an eight node sys-tem.Also shown are the aggregate values for a simple SUM() aggregation function.

2Background

PRISM builds on two recent and ongoing research ef-forts for scalable monitoring:aggregation[36]and DHT-based aggregation[40].

Aggregation.Aggregation is a fundamental abstrac-tion for scalable monitoring[10,16,27,36,40]because it allows applications to access summary views of global information and detailed views of rare events and nearby information.

The aggregation abstraction in PRISM is de?ned across a tree spanning all nodes in the system.As Fig-ure1illustrates,each physical node in the system is a leaf and each subtree represents a logical group of nodes.

Note that logical groups can correspond to administra-tive domains(e.g.,department or university)or groups of nodes within a domain(e.g.,a/28subnet with14hosts on a LAN in the CS department).An internal non-leaf node,which we call a virtual node,is simulated by one or more physical nodes at the leaves of the subtree rooted at the virtual node.

The tree-based aggregation in the PRISM framework is de?ned in terms of an aggregation function which is installed at all the nodes in the tree.Each leaf node (physical sensor)inserts or modi?es its local value for an attribute de?ned as an{attribute type,attribute name} pair which is recursively aggregated up the tree.For each level-i subtree T i in the aggregation tree,PRISM de?nes an aggregate value V i,attr for each attribute as follows:For a(physical)leaf node T0at level0,V0,attr is the locally stored value for the attribute or NULL if no matching tuple exists.Then the aggregate value for

a level-i subtree T i is the aggregation function for the

attribute type,A type,computed across the aggregate val-ues of each of T i’s k children.Figure1,for example, illustrates the computation of a simple SUM aggregate.

DHT-based aggregation.To achieve scalability for Internet-scale systems,PRISM faces the fundamental challenge of computing aggregates for thousands to millions of attributes across hundreds or thousands of nodes[36,40].Later in this section,we present an ex-ample of detecting heavy hitters on a distributed sys-2

tem where PRISM needs to track millions of attributes. To address this scalability challenge,PRISM leverages DHTs[27–29,33,46]to construct a forest of aggregation trees and maps different attributes to different trees[40]. DHT systems assign a long(e.g.,160bits),random ID to each node and de?ne a routing algorithm to send a re-quest for key k to a node root k such that the union of paths from all nodes forms a tree DHTtree k rooted at the node root k.By aggregating an attribute with key k along the aggregation tree corresponding to DHTtree k,differ-ent attributes are load balanced across different trees. Example Applications Aggregation is a building block for many distributed applications such as network management[41],service placement[12],sensor moni-toring and control[20],multicast tree construction[36], and naming and request routing[7].In this paper,we focus on two case-study examples:a distributed heavy hitter detection and PrMon,a distributed monitoring ser-vice for PlanetLab modelled on CoMon[6].

Heavy Hitter detection:Our?rst application is identifying heavy hitters on a distributed system i.e.,the top10IPs that account for a signi?cant fraction of total incoming traf?c in a measurement interval(e.g.,10min-utes)[9].The key challenge for this distributed query is scalability for aggregating per-?ow statistics for millions of concurrent?ows in real-time;the Abilene[1]traces used in our experiments include up to3.4million?ows per hour.

To scalably compute the global heavy hitters list, we chain two aggregations where the results from the ?rst aggregation feed into the second aggregation.In the?rst aggregation,PRISM calculates the total band-width consumed by each sender to all nodes in the sys-tem using SUM as the aggregation function and{HH-Step1,senderIP}as the key.For example,a node writes the tuple({HH-Step1,128.82.121.7},700KB)indicat-ing that700KB of data was received from the node 128.82.121.7during the last time window.In the sec-ond step,we feed these aggregated total bandwidths for each sender IP address into another aggregation tree for selecting TOP-10heavy hitters.To achieve this,we use SELECT-TOP-10as the aggregation func-tion and use{HH-Step2,TOP-10}as the key.For ex-ample,the root of the?rst aggregation tree for{HH-Step1,128.82.121.7}which has computed the global ag-gregate value of6200KB as the total bandwidth con-sumed by128.82.121.7,inputs the tuple({HH-Step2, TOP-10},{128.82.121.7,6200KB}).At the end of this chained aggregation,the root of the second aggregation tree has the top10IP addresses that send most traf?c to the nodes in the system.

Real-time Network Monitoring:The second appli-cation is our PrMon monitoring service that is represen-

tative of monitoring Internet-scale distributed systems such as PlanetLab[26]and Grid systems[35]that pro-vide open platforms for developing,deploying,and host-ing global-scale services.For instance,to manage a wide array of user services running on the PlanetLab testbed, the system administrators need a global view of the sys-tem to identify problematic experiments(slices in Planet-Lab terminology)to identify,for example,any slice con-suming more than500GB of memory across all nodes on which it is running.Similarly,users require system state information to query for“lightly-loaded”nodes for deploying new experiments or to track the resource con-sumption of their running experiments.

To provide such information in a scalable way and in real-time,PRISM computes the per-slice aggregates for each resource attribute(e.g.,CPU,TX1,etc.)along dif-ferent aggregation trees.This aggregate usage of each slice across all PlanetLab nodes for a given resource attribute(e.g.,CPU)is then input to a per-resource SELECT-TOP-100aggregate(e.g.,{SELECT-TOP-100, CPU})to compute the list of top-100slices in terms of consumption of the resource.Although there are existing central monitoring services,in Section5we will show that PRISM can monitor a large number of attributes at much?ner time scales while incurring signi?cantly lower network costs.

3AI and TI

PRISM quanti?es imprecision along a three-dimensional vector:(Arithmetic,Temporal,Network).We now de-scribe how we enforce bounds on arithmetic imprecision (AI),which limits the numeric difference between a re-ported value of an attribute and its true value[23,45], and temporal imprecision(TI),which limits the delay from when an update is input at a leaf sensor until the effects of the update are re?ected in the root aggregate.

These aspects of imprecision provide means to(a)ex-pose inherent imprecision in a monitoring system stem-ming from sensor inaccuracy and update propagation de-lays and(b)reduce system load by introducing additional ?ltering and batching on update propagation.

The implementations of AI and TI are simple because they can assume that aggregation trees never recon?gure and that nodes and network paths never fail and are never slow.The network imprecision(NI)metric described in Section4addresses these challenging real-world issues.

3.1Arithmetic Imprecision(AI)

We?rst describe the basic mechanism for enforcing AI for each aggregation subtree in the system.Then we de-scribe how our system uses a self-tuning algorithm to address the policy question of distributing an AI budget across subtrees to minimize system load.

3.1.1Mechanism

To enforce AI,each aggregation subtree T for an at-tribute has an error budget δT which de?nes the maxi-mum inaccuracy of any result the subtree will report to its parent for that attribute.The root of each subtree di-vides this error budget among itself δself and its children δc ,and the children recursively do the same.Here we present the AI mechanism for the SUM aggregate;other standard aggregation functions (e.g.,MAX,MIN,A VG)are described in the appendix.

This arrangement reduces system load by ?ltering small updates that fall within the range of values “cached”by a subtree’s parent.In particular,after a node A with error budget δT reports a range [V min ,V max ]for an attribute value to its parent (where V max =V min +δT ),if the node A receives an update from a child,the node A can skip updating its parent as long as it can en-sure that the true value of the attribute for the subtree lies between V min and V max ,i.e.,if

V min ≤ c ∈children V c

min

V max ≥ c ∈children V c

max

(1)where V c min and V c

max denote the most recent update re-ceived from child c .

Notice the trade-off in splitting δT between δself and δc .Large values of δc allow children to ?lter updates be-fore they reach a node.Conversely,by setting δself >0,a node can set V min < V c

min ,set V max > V c

max ,or both to avoid further propagating some updates it re-ceives from its children.

PRISM maintains per-attribute δvalues so that differ-ent attributes with different error requirements and dif-ferent update patterns can use different δbudgets in dif-ferent subtrees.PRISM implements this mechanism by de?ning a per-attribute-type distribution function that is analogous to the per-attribute-type aggregation function.Just as an attribute type’s aggregation function speci?es how aggregate values are aggregated from children,an attribute type’s distribution value speci?es how δbudgets are distributed among children and δself .

3.1.2Policies

Given the above mechanisms,to guarantee that the to-tal aggregation error does not exceed the root error bud-get δroot for an attribute,we just need to ensure that the following two conditions hold at the root node of every subtree T .

δT ≥δself +

c ∈children δc

V max ≤V min +δT (2)

Given these constraints,we still have plenty of free-dom to (i)set δroot to an appropriate value for each at-tribute,(ii)compute V min and V max when updating a parent,and (iii)split δinto δself and δc .Below we

present policies that exploit such freedom to optimize the precision v.performance trade-off.

Setting δroot .Note that the aggregation queries can set the root error budget δroot to any non-negative value.For some applications,an absolute constant value may be known a priori (e.g.,count the number of connections per second ±10at port 1433.)For other applications,it may be appropriate to set the tolerance based on measured be-havior of the aggregate in question (e.g.,set δroot for an attribute to be at most 10%of the maximum value ob-served)or the measurements of a set of aggregates (e.g.,in our heavy hitter application,set δroot for each ?ow to be at most 1%of the bandwidth of the largest ?ow mea-sured in the system).Our algorithm supports all of these approaches by allowing new absolute δroot values to be introduced at any time,and we have prototyped systems that use each of these three policies.

Computing [V min ,V max ].When either c V c

min or c V c

max goes outside of the last [V min ,V max ]that was reported to the parent,a node needs to report a new range to its parent.Given a δself budget at an internal node,we have some ?exibility on how to center the [V min ,V max ]range.Our approach is to adopt a per-aggregation-function range policy that reports V min

=( c V c

min )?bias ?δself and V max =( c V c max )+(1?bias )?δself to the parent.The bias parameter can be set as follows:Set

?bias ≈0.5if inputs expected to be roughly stationary ?bias ≈0if inputs expected to be generally increasing ?bias ≈1if inputs expected to be generally decreasing For example,suppose a node with total δT of 10and

δself of 3has two children reporting ([V c

min

,V c max ])of [1,2]and [2,8],respectively,and reports [0,10]to its parent.Then,the ?rst child reports a new range [10,11],so the node must report to its parent a range that includes [12,19].If bias =0.5,then report to parent [10.5,20.5]to ?lter out small deviation around the current position.Conversely,if bias =0,report [12,22]to ?lter out the maximal number of updates of increasing values.Self-tuning error budgets.The ?nal policy question is how to divide a given error budget δroot across the nodes in an aggregation tree.

A simple approach is to have a static policy that di-vides the error budget uniformly among all the chil-dren.For example,a node with budget δT could set δself =0.1δand then divide the remaining 0.9δT evenly among its children.Although this approach is simple,it is likely to be inef?cient because different aggregation subtrees may experience different loads.

Algorithm.To make cost/accuracy tradeoffs self-tuning ,PRISM provides an adaptive algorithm by which nodes adjust to changing error budget δT and adapt the 4

balance between δself and δc for each child c .The high-level idea is simple:increase δfor nodes with high load and low δand decrease δfor nodes with low load and high δ.Unfortunately,a naive rebalancing algorithm could easily spend more network messages redistribut-ing δs than it saves by ?ltering updates.This is a partic-ular concern for applications like distributed heavy hitter that monitors a large number of attributes,only a few of which are active enough to be worth optimizing.To ad-dress this challenge PRISM uses a two-step algorithm:1.Estimate optimal distribution of δT among δself and δc .

Each node tracks the number of messages sent to its par-ent per time unit (M self )and the aggregate number of updates per time unit reported by each child c ’s subtree (M c ).Note that M c reports are accumulated by a child until they can be piggy-backed on an update message to its parent.Given this information each node n estimates

the optimal values δopt

v that minimizes the total system load v M opt v ,where M opt v is an estimate of the load

generated by node v under optimal error budget δopt

.In particular,for any v ∈{self }∪child (n )we estimate

δopt

v =δT ?√

M v ?δv v ∈{self }∪child (n )√M v ?δv .(3)which is optimal assuming that load is inversely pro-portional to error budget and which seems a reasonable

heuristic for predicting the impact of small changes.2.Redistribute deltas iff the expected bene?t exceeds the redistribution overhead.

At any time,a node n computes a charge metric for each child subtree c ,which estimates the number of ex-tra messages sent by c due to sub-optimal δ.Charge c =

(T curr ?T ad just )?(M c ?M opt

),where T ad just is the last time δwas adjusted at n .Notice that a subtree’s charge will be large if (a)there is a large load imbalance (e.g.,

M c ?M opt

c is large)or (b)there is a stable,long-lasting imbalance (e.g.,T curr ?T a

d just is large.)

We only send messages to redistribute deltas if doing so is likely to save at least k messages (i.e.,if charge c >k ).To ensure the invariant that δT ≤δself +

c δc ,we make this adjustment in two steps.First,we loan some of the δself budget to the node c that has accumulate

d th

e largest charge by incrementing c ’s budget by min(0.1δc ,

max(0.1δself ,δself -δopt

self )).Second,we replenish δself from the child whose δc is the farthest above delta opt c by

ordering c to reduce δc by min(0.1δc ,δc -delta opt

c ).A node responds to a request from its parent to update δT using a similar approach.

3.2Temporal Imprecision

Temporal imprecision provides a real-time bound on the delay between when an update occurs at a leaf node and

...

Event

level 0

level 1level 2level 3level 4level 0

level 1level 2level 3level 40

TI 0

TI Next TI interval starts here

(a) Send unsynchronized updates every TI/4 seconds.

(b) Send synchronized updates every TI seconds.

.....................Fig.2:For a given TI bound,pipelined delays with synchro-nized clocks (b)allows nodes to send less frequently than un-pipelined delays without synchronized clocks (a).

when it is re?ected in the aggregated result reported by the root.A temporal imprecision of T I seconds guaran-tees that every event that occurred T I or more seconds ago is re?ected in the reported result;events younger than T I may or may not be re?ected.[32].

Temporal imprecision bene?ts monitoring applica-tions in two ways.First,it accounts for inherent net-work and processing delays in the system;given a worst case per-hop cost hop max even immediate propagation provides a temporal guarantee no better than ?hop max where is the maximum number of hops from any leaf to the root of the tree.Note that although Internet round trip times have a very long tail [2,8,24],the network imprecision metric allows us to assume a relatively low hop max (e.g.,10seconds)because if network and pro-cessing times increase beyond this bound,then the net-work imprecision metric re?ects the unexpected delay.Second,explicitly exposing TI provides an opportu-nity to combine multiple updates to improve scalability by reducing processing and network load.If the TI guar-antee for an attribute exceeds the minimum system la-tency i.e.,T I > ?hop max then a node in tree can use the “extra”time to try to accumulate multiple updates from the node’s children before calculating and sending a single update to the parent.

Below we present an optimized mechanism for im-plementing temporal imprecision using pipelined delays based on synchronized clocks.PRISM also provides a fall-back alternative for unsynchronized clocks [18].Pipelined delays.We maximize the opportunity for batching updates by pipelining the available slack delay across levels of the aggregation hierarchy.

Suppose clocks were perfectly synchronized and sup-pose that message transmission and processing were in-stantaneous.Then,as Figure 2(a)illustrates,one option to enforce a bound of T I for an -level tree ( =4in Figure 2)would be for every node to send an update to its parents every T I/ seconds.Alternatively,as Fig-ure 2(b)illustrates,each leaf node could send a batch up-date (combining all its updates in the current T I interval)at time T I ? ? ,all level-1nodes at time T I ?( ?1)? ,and so on for some small .Note that the next T I interval 5

starts after T I? ? time in the current interval effec-tively leading to a batching interval of T I? ? at every level.Thus,by synchronizing update transmission times across levels,we still meet the T I guarantee but increase the batching interval from T I/ to T I? ? .

Of course,a real implementation must account for clock skew,processing delays,and network delays to en-sure that level i holds opens a suf?cient window of time for level i?1’s updates to arrive and be processed.For the leaves,we specify a send interval I and an arbitrary reference time Z such that for the j th interval at time Z+j?I a leaf node sends an update if and only if the value has changed since Z+(j?1)?I.We calculate I as follows:(1)we synchronize the clocks on differ-ent nodes such that the maximum skew between any two nodes is skew max.1(2)We de?ne a“stagger”parame-ter S that bounds the delay for updates to traverse levels, i.e.,S=hop max+2?skew max.Finally,(3)we set I=T I? ?S.Note that the smallest TI that can be provided is T I min= ?S.Also note that for load bal-ancing,different attributes can choose different Z base times.

For internal nodes,we pipeline each level’s timeouts to occur just before its parent’s.Speci?cally,at time Z+j?I+i?S an aggregation node at level i sends an updated aggregate value to its parent if and only if the value of any of its inputs has changed since time Z+(j?1)?I+i?S. This approach ensures the following property:an event at a leaf node at local time X is re?ected at root no later than time X+T I according to the local time at the leaf node.

4Network Imprecision

Network imprecision characterizes the uncertainty intro-duced by node crashes,slow network paths,unreachable nodes,and DHT topology recon?gurations.In particu-lar,if a a subtree is silent over an interval,a aggregation system must distinguish two cases:(1)the subtree has sent no updates because the inputs have not signi?cantly changed versus(2)the inputs have signi?cantly changed but the subtree is unable to transmit its report.This prob-lem is fundamental,and it is an extension of the CAP impossibility result[13,31],but it is made worse by the ampli?cation effect of hierarchical aggregation:if a non-leaf node fails,then the entire subtree rooted at that node can be affected.For example,failure of a level-3node in a degree-8aggregation tree can interrupt updates from 512(83)leaf node sensors.If these issues are not ad-dressed by an aggregation system,the results a monitor-ing system reports may be arbitrarily incorrect.

The key idea of NI is that because no system can guarantee to always provide the“right”answer,it in-1Algorithms in the literature can achieve clock synchronization among nodes to within one millisecond[37].

stead must report the extent to which a calculation could have been disrupted by network and node problems.This information allows applications to?lter out or take ac-tion to correct measurements with unacceptable uncer-tainty.To that end,NI is composed of three metrics,N all, N reachable,and N dup.

?N all is an estimate of the number of nodes that are members of the system.

?N reachable is a lower bound on the number of nodes for which input propagation is guaranteed to meet an attribute’s TI bound.

?N dup provides an upper bound on the number of nodes whose contribution to an attribute may be doubly-counted.Double-counting can occur when recon?g-uration of an aggregation tree’s topology causes a leaf node or virtual internal node to fail-over to a new par-ent while its old parent retains the node’s inputs as soft state until a timeout.

Conditioned Consistency.These three metrics condi-tion the arithmetic and temporal consistency guarantees.

In particular,reading an attribute’s value from the sys-tem returns a tuple[V min,V max,T I,[N all,N reachable, N dup]]that means“The system estimates the value to be between V min and V max.This estimate may omit some inputs that occurred in the last T I seconds and it may also omit some inputs from N all?N reachable of the N all nodes in the system.This estimate may double count inputs from at most N dup nodes.”

Users and applications must evaluate the signi?cance of disruptions that cause N reachable0 in the context of their requirements.For some applica-tions,an aggregate result may be unusable if it omits or duplicates any inputs.Conversely,other applications may be content with best-effort results and may ignore NI completely.Other applications will take a middle ground and be structured to tolerate modest amounts of NI.

In Section5.3,we explore one general and effective strategy:using redundant trees in the DHT to compute an attribute and then using NI to identify the highest-quality result.Other available approaches include constructing aggregates that tolerate common forms of NI(e.g.,the MAX aggregation function is insensitive to N dup>0), taking corrective action during periods of high NI(e.g., actively probing unresponsive machines,triggering on-demand reaggregation[40],considering the recent his-tory of aggregate values,or delaying taking action un-til more conclusive information is gathered),or?agging results as GOOD/MARGINAL/BAD depending on the observed NI.

Challenges.Although monitoring connectiv-ity to nodes to compute the NI metrics appears straightforward—the NI metrics are all conceptually 6

aggregates across the state of the system—in practice two challenges arise.First,the system must cope with recon?guration of dynamically-constructed aggregation trees.Second,the system must scale to large numbers of nodes despite(a)the need for active probing to measure liveness between each parent-child pair and(b)the need to compute distinct NI values for each of the large number of distinct aggregation trees in the underlying DHT forest.

In the rest of this section,we?rst provide a simple algorithm for computing N all and N reachable for a sin-gle,static tree.Then,in Section4.2we explain how PRISM computes N dup in order to account for dynam-ically changing aggregation topologies.Finally,in Sec-tion4.3we describe how to scale the approach to work with the large number of distinct trees constructed by PRISM’s DHT framework.

4.1Single tree,static topology

This section considers calculating N all and N reachable for a single,static-topology aggregation tree.

N all is simply a count of all nodes in the system,and it is easily computed using PRISM’s aggregation abstrac-tion.Each leaf node inserts1to the N all aggregate, which has SUM as its aggregation function.Note that even if a node becomes disconnected from the DHT, its contribution to this aggregate remains cached as soft state by its ancestors for a long timeout T declareDead.

N reachable for a subtree is a count of the number of leaves that have a good path to the root of the subtree where a good path is a path in which no hop takes longer than hop max.Nodes compute N reachable in two steps: 1.Basic aggregation:PRISM creates a SUM aggregate

and each leaf inserts local value of1.The root of the tree then gets count of all nodes.

2.Aggressive pruning:In contrast with the default be-

havior of retaining aggregate values of children as soft state for up to T declareDead,N reachable must immedi-ately change if a connection to a subtree is no longer

a good path.Each internal node pings its child once

every p time units and maintains sendP ingReplied c, the time it sent the last ping for which it has received

a reply from c.A child sends its ping reply only af-

ter sending any messages backlogged in its outbound message queue.Note that p should be smaller than hop max;we use p=10seconds by default(hop max =30seconds).If sendP ingReplied c+hop max< curr time then a node declares child c unreachable: the node removes c’s subtree contribution from the N reachable aggregate and immediately sends the new value up towards the root of the N reachable aggrega-tion tree.Notice that for simplicity this approach is conservative—it declares a child“unreachable”if the round trip time(rather than the one way time)exceeds

hop max.

Nreachable v.TI N reachable characterizes the current topology of the aggregation tree,but past connectivity disruptions could affect attributes with large TI.In partic-ular,because our TI algorithm de?nes a small window of time during which a node must propagate updates to its parents,then any attribute’s subtree that was unreachable over the last T I attr could have been unlucky and missed its window even though the subtrees nodes are currently all counted as reachable.We must either(a)modify the protocol to ensure that such a subtree’s updates are re-?ected in the aggregate so that the promised TI bound is met or(b)we must ensure that N reachable counts such subtrees as unreachable because they may have violated their TI bound.

We take the former approach to avoid having to cal-culate a multitude of N reachable values for different TI bounds.In particular,when a node receives updates from

a child marked unreachable,it knows those updates may

be late and may have missed their window for TI prop-agation.It therefore marks such updates as NODELAY.

When a node receives a NODELAY update,it processes it immediately and propagates the result with the NODE-LAY?ag so that TI delays are temporarily ignored for that attribute.This modi?cation may send extra mes-sages in the(hopefully)uncommon case of a link per-formance failure and recovery,but it ensures that the cur-rent N reachable value counts nodes that are meeting all of their TI contracts.

4.2Dynamic topology

Each virtual node in PRISM caches state from its chil-dren so that when a new input from one child comes in, it can compute new values to pass up using local infor-mation.This information is soft state—a parent discards it if a client is unreachable for a long time.But because reconstructing this state is expensive(there may be tens of thousands of attributes for aggregation functions like “where is the nearest copy of?le foo”[34]),we use long timeouts to avoid spurious garbage collection(e.g.,we use T declareDead≈10minutes in our prototype.)

Notice that when a subtree chooses a new parent,then that subtree’s inputs may still be stored by a former par-ent and thus be counted multiple times in the aggregate.

N dup bounds the number of leaf inputs that might be in-cluded multiple times in an aggregate calculation.

The basic aggregation function for N dup is simple.We keep a count k of the number of leaves in each subtree using the obvious aggregation function.Then,if a sub-tree root spanning k leaf nodes switches to a new parent, that subtree root inserts the value k into the N dup aggre-gate,which has SUM as its aggregation https://www.wendangku.net/doc/29310227.html,ter, when the node is certain suf?cient time has elapsed that its old parent has safely removed its soft state,it updates 7

t_recv

t_send

d_grantLease = t_haveLease ? t_recv

t_haveLease = min_c (t_haveLease[c])

LEASE_RENEW

t_grantLease = max(t_grantLease, t_haveLease)

d_grantLease

t_haveLease[n2] = t_send + (d_grantLease * (1?max_drift))Fig.3:Protocol for a parent to renew a lease on the right to hold as soft state a child’s contribution to an aggregate.

its input of N dup to 0.

Lease aggregation.For correctness,our implementa-tion uses a lease aggregation algorithm that extends the concept of leases [14]to hierarchical aggregation.

Figure 3illustrates the protocol used when a node n 1updates a lease on the right to cache the inputs from a set of descendants rooted at n 2.The algorithm makes use of local clocks at n 1and n 2,but it is not sensitive to skew and tolerates a maximum drift rate of max drift (e.g.,5%).In this protocol,a node maintains t haveLease ,the latest time for which it holds leases for all descen-dants,and t grantLease ,the latest time for which it has granted a lease to its ancestors.The key to the proto-col is that the child n 2extends the lease by a duration d grantLease ,but the child interprets the d grantLease in-terval starting from t recv ,the time it received the renewal request,while the parent interprets the interval starting from t send .As a result,a lease always expires at a parent before expiring at a child regardless of the skew between their clocks [42].

A node that roots a k -leaf subtree that switches to a new parent then contributes k to N dup until t grantLease ,after which it may reset its contribution of N dup to 0because its former parent is guaranteed to have cleared from its soft state all inputs from the node.

To avoid spurious lease expirations,each node renews leases from its descendants once every renew seconds and leaf nodes grant leases of length T declareDead with renew <

Early expiration.PRISM uses early expiration to minimize the scope of disruption when a tree’s topology recon?gures.In particular,the lease aggregation mech-anism ensures the invariant that leases near the root of a tree are shorter than leases near the leaves.As a result,a naive implementation that removes cached soft state ex-actly when a lease expires would exhibit the perverse be-havior illustrated in Figure 4(a):each node from the root to the parent of a failed node will successively expire its problematic child’s state,recalculate its aggregates with-out that child,update its parent,renew its parent’s lease,and then repeatedly receive and propagate updated ag-gregates from its child as the process ripples down the tree.Not only is that process expensive,but it may signif-icantly and unnecessarily perturb values reported at

the

L0L1L2L3Fig.5:The failure of a physical node has different effects on different aggregations depending on which virtual nodes are mapped to the failed physical node.The numbers next to vir-tual nodes show the value of N reachable for each subtree after the failure of physical node 001,which acts as a leaf for one tree but as a level-2subtree root for another.

root for all attributes by removing and re-adding large subtrees of inputs.Furthermore,note that the example in Figure 4is the common case:in a randomly constructed tree,the vast majority of nodes (and failures)are near the leaves.Failing to address this problem would transform the common-case of leaf failures into signi?cant disrup-tions near the root and bring into play the ampli?cation effect.

Early expiration avoids this unwarranted disruption as Figure 4(b)illustrates.A node at level i of the tree dis-cards the state of an unresponsive subtree (maxLevels -i )*d early before its lease expires.Once the node has removed the problematic child’s inputs from the aggre-gates values it has reported to its parent,the node can renew leases to its parent that are no longer limited by the ever-shortening lease held on the problematic child.As the ?gure illustrates,this technique minimizes dis-ruption by allowing a node near the trouble spot to prune the tree,update its ancestors,and resume granting long leases before any ancestor acts.

4.3Scaling to large systems

Scaling NI is a challenge.To scale monitoring to large numbers of nodes and attributes,PRISM constructs a forest of trees using an underyling DHT and then uses different aggregation trees for different attributes.As Figure 5illustrates,a failure affects different trees dif-ferently so we need to calculate NI metrics for each of the n distinct global trees in an n -node system.Making matters worse,as Section 4.1explained,maintaining the NI metrics requires active probing every p seconds along each edge of each tree’s graph.

As a result of these factors,the straightforward al-gorithm for maintaining NI metrics separately for each

tree is not tenable:n degree-d trees each with θ(n ?1/d

1?1/d )nodes have θ(dn 2)edges that must be monitored;such monitoring would require θ(dn 2)messages per node ev-ery p seconds (p =10in our system).To put this in per-spective,consider a n =512-node system with d =8-

attribute = f(A..H)

(a)Impact of leaf failure without early expiration

attribute = f(A..H)

(b)Impact of leaf failure with early expiration

Fig.4:Recalculation of aggregate function across values A,B,...,H after the node with input H fails (a)without and (b)with early expiration.

L1L2L3

Fig.6:Plaxton tree topology is an approximate butter?y net-work.The bold connections illustrate how a virtual node 00*uses the dual tree pre?x aggregation abstraction to aggregate values from a tree below it and distribute the results up a tree above it.

ary trees (i.e.,a DHT with 3-bit correction per hop).The straightforward algorithm then has each node sending over roughly 400pings per second.As the system grows,the situation deteriorates rapidly—a 4096-node system requires each node to send roughly 3200pings per sec-ond.

Our solution reduces active monitoring work to θ(d log n )pings per node per p seconds.The 512-node system in the example will require each node to send about 3pings per second;the 4096-node system will re-quire each node to send about 4pings per second.Dual tree pre?x aggregation.To make it practical to maintain the NI values,we take advantage of the under-lying structure of our Plaxton-tree-based DHT [27]to re-use common sub-calculations across different aggre-gation trees using a novel dual tree pre?x aggregation abstraction.

In particular,we note that as Figure 6illustrates,the Plaxton tree algorithm forms an approximate butter?y network.For a degree-d tree,the virtual node at level i has an id that matches the keys that it routes in log d ?i bits.It is the root of exactly one tree,and its children in that tree are approximately d virtual nodes that match keys in log d ?(i ?1)bits.It has d parents,each of which matches different subsets of keys in log d ?(i +1)bits.

But notice that for each of these parents,this tree aggre-gates inputs from the same subtrees.

Whereas the standard aggregation abstraction com-putes an aggregation function across a set of subtrees and propagates it to one parent,a dual tree pre?x aggregation computes an aggregation function across a set of subtrees and propagates it to all parents .As Figure 6illustrates,each node in a dual tree pre?x aggregation is the root of two trees:an aggregation tree below that computes an aggregation function across a set of leaves and a distribu-tion tree above that propagates the result of this compu-tation to a collection of enclosing aggregates that depend on this sub-tree for input.

For the N reachable count and N dup lease,the values propagated up are aggregates on the subtree (the num-ber of reachable nodes and the minimum lease duration granted by the subtree),so the same value can be propa-gated by a node to all of its parents.

For example in Figure 6,consider the level 2virtual node 00*mapped to node 000.This node’s N reachable count of 4represents the total number of leaves included in that virtual node’s subtree.This node aggregates this single N reachable count from its descendents and prop-agates this value to both of its level-3parents,000and 001.For simplicity,the ?gure shows a binary tree;by default PRISM corrects three bits per hop and d =8,so each subtree is common to 8parents.

5Experimental Evaluation

To evaluate PRISM,we perform experiments on two types of networks:(1)several LAN clusters (a 50-node departmental Condor cluster and 50to 85Emulab [39]nodes)and (2)94nodes on the PlanetLab distributed testbed [26].Our prototype has been developed using SDIMS [40]on top of FreePastry [29].

Our experiments characterize the performance and scalability of the AI,TI,and NI metrics for PrMon and distributed heavy hitters (DHH)applications.First,we

T o t a l # m e s s a g e s (n o r m a l i z e d )

AI (% of max)

Fig.7:Load vs.AI for TX1and CPU attributes with no TI ?ltering

T o t a l # m e s s a g e s (n o r m a l i z e d )

TI (seconds)

Fig.8:Load vs.TI for a single attribute with no AI ?ltering

use CoMon [6]data collected from PlanetLab [26]and net?ow traces from Abilene [1]to quantify the reduc-tion in monitoring overheads due to AI and TI.Second,we analyze the deviation in the PRISM’s reported val-ues with respect to both the ground truth based on sen-sor readings and the guarantees de?ned by AI and TI.Finally,we investigate the consistency/availability trade-offs that NI exposes.In summary,our experimental re-sults show that PRISM is an effective substrate for scal-able monitoring:introducing small amounts of AI and TI signi?cantly reduces monitoring load,and the NI metrics both successfully characterize system state and reduce measurement inaccuracy.

5.1Load vs.Imprecision

In this subsection we quantify the reduction in monitor-ing load due to AI and TI for both the PrMon and DHH applications.Further,we characterize the reduction in monitoring load due to AI and TI for different sensor data distributions by running large-scale simulations on synthetic datasets.5.1.1

PrMon

We begin by comparing the monitoring cost of PrMon distributed monitoring service to the centralized CoMon service,which uses a ?xed TI of 5minutes and which does not exploit AI.We gather CoTop [6]data from 200PlanetLab nodes at 1-second intervals for 1hour on 25September 2006.The CoTop data provides the per-slice resource usage (e.g.,CPU,MEM,TX1)for all slices run-ning on a given PlanetLab https://www.wendangku.net/doc/29310227.html,ing these logs as sen-sor input,we run PRISM on 200servers mapped to 50Emulab machines each having a 3GHz CPU and 2GB RAM.

Figure 7shows the AI precision-performance results for the PrMon application for two attributes (the total TX1and CPU usage of slice princeton codeen across 200PlanetLab nodes).The TX1attribute denotes the total number of bytes transmitted by a slice in the last

minute.The x-axis shows the global AI budget,and the y-axis shows the total message load normalized with re-spect to AI of -1(no AI caching)and TI =T I min =50ms.Each data point represents the total number of messages sent during the 1-hour run.From the ?gure,we observe that for CPU,the load falls by 68%when AI changes from -1(no caching)to 0and a 10%AI fur-ther provides almost a 40%reduction in load compared to AI=0.The load reduction from AI=-1to AI=0comes from culling new updates that exactly match the previ-ous report.However,if the CPU value changes,it gener-ally deviates by a large amount,resulting in limited gains achieved by 10%AI.For the TX1attribute,the sensor sends an update every 60seconds.In this case,changing AI from -1to 0provides roughly a 12%reduction in load whereas 10%AI reduces the load by 50%.

Figure 8shows the corresponding TI precision-performance results with no AI ?ltering.The initial TI value of T I min (50ms)corresponds to immediate prop-agation of messages along the aggregation tree.From the graph,we observe that the reduction in system load is 80%and over an order of magnitude for non-pipelined and pipelined 10second TI delays respectively compared to TI of T I min .

Figure 9shows the combined effect of AI and TI in reducing monitoring load for the CPU attribute for the princeton codeen slice.We use TI of 10seconds,30seconds,1minute,and 5minutes,and for each of these TI values,we run the experiment for AI values of -1,0,10%,and 20%.We observe that the load falls by 70%from AI of -1to AI of 10%for a given TI.Further,for a ?xed AI,the monitoring load shows a curve following 1/TI as in Figure 8.For this attribute,giving an AI of 10%or 20%only provides additional load reduction of 10%and 16%respectively due to low temporal locality.Next Figure 10shows the combined effect of AI and TI in reducing monitoring load for all the nine at-tributes (TX1,TX15,RX1,RX15,#PR,PMEMMB,VMEMMB,CPU%,and MEM%)emitted by the CoTop

0.01

0.1

T o t a l # m e s s a g e s (n o r m a l i z e d )

TI (seconds)

Fig.9:Load vs.AI and TI for CPU attribute 0.001

0.01

0.1

T o t a l # m e s s a g e s (n o r m a l i z e d )

TI (seconds)

Fig.10:Load vs.AI and TI for all attributes

sensor for all the running PlanetLab slices in our trace data.We observe that for AI of -1,there is more than one order of magnitude load reduction for TI of 5minutes compared to 10seconds.Likewise,for a ?xed TI of 10seconds,AI of 20%reduces load by two orders of mag-nitude compared to AI =-1.By combining AI of 20%and TI of 30seconds,we get both an order of magni-tude load reduction and an order of magnitude reduction in the time lag between updates compared to CoMon’s AI of -1and TI of 5minutes.Alternatively,for approx-imately the same bandwidth cost as CoMon with TI of 5minutes and AI of -1for 200nodes,PRISM provides highly time-responsive and accurate monitoring with TI of 10seconds and AI of 0.5.1.2

Detecting Heavy Hitters

For our heavy hitter case study,we use multiple net?ow traces obtained from Abilene [1]Internet2backbone net-work.The data was collected for 1hour on April 4,2006;each backbone router logged per-?ow data every 5min-utes,and we split this trace into 200buckets based on the hash of source IP.Our monitoring system executes a Top-10query on this dataset for tracking the top 10?ows (destination IP as key)in terms of bytes received over a 30second moving window shifted every 5seconds.Figure 11shows the precision-performance results for the top-10heavy hitter query for 50nodes on the depart-mental Condor testbed.The x-axis shows the AI budget and the y-axis shows the total monitoring load per unit time normalized relative to the load for AI =0.The AI budget is varied from 1%to 10%of the top ?ow’s global traf?c volume.From the graph,we observe that the knee of the graph at 10%AI provides over an order of mag-nitude reduction in monitoring load.A large fraction of the reduction comes from completely eliminating aggre-gation for “mouse”?ows whose total bandwidth is less than the imprecision budget at the leaves.

Figure 12shows the corresponding results for the pipelined and non-pipelined TI delays.We ?nd that us-ing pipelined delays,a 10seconds TI achieves an order of

magnitude reduction in monitoring load.Increasing the TI beyond 10seconds yields additional,albeit smaller,reductions.For non-pipelined delays,TI of 25seconds yields an order of magnitude load reduction.5.1.3

Generalized Model:Simulation study To generalize the trade-off between AI and monitoring cost,we evaluate the conditions under which AI is effec-tive i.e.,the distribution of the data values reported by the sensors.We ?rst investigate via simulation a large-scale aggregation network with 7776physical nodes or-ganized as a 5level aggregation tree with uniform degree 6.For each leaf node,we model the the data values of incoming traf?c using two distributions:a Gaussian dis-tribution and a uniform distribution.Our aim is to the evaluate the effect on load due to the noise in the data values given a ?xed AI budget.

Figure 13shows the corresponding results using our simulator over these two data distributions.The x-axis denotes the ratio of total noise induced by all the leaf sen-sors to the total AI budget.We observe that when noise is small compared to the AI budget,we ?lter almost all updates and load can be reduced by an order of magni-tude.But,as expected,when noise is large compared to the error budget,the load asymptotically approaches the load with AI =0.The uniform distribution allows almost perfect culling of updates for small amounts of noise whereas for the Gaussian distribution,there is a small yet a ?nite probability for data values to deviate arbitrarily from their previously reported range.

In summary,our evaluation shows that small AI and TI budgets can provide large bandwidth savings to enable scalable monitoring.

5.2Promised vs.Realized Accuracy

In this subsection we aim to answer the following ques-tion:do PRISM’s reported values re?ect reality?We quantify the difference between PRISM’s reports and the “ground truth”based on the instantaneous sensor read-ings.To evaluate this deviation,we compare PRISM’s

0.2 0.4 0.6 0.8 1

N o r m a l i z e d L o a d

Arithmetic Imprecision (%)

Fig.11:Normalized load vs.AI for the top-10query on Abilene

N o r m a l i z e d L o a d

Temporal Imprecision (s)

Fig.12:Normalized load vs.TI for the top-10query on Abilene

20 40 60 80 100C D F (%)

Difference (%)

Fig.14:Cumulative Distribution function (CDF)for dif-ference between PRISM’s reported values wrt.(a)oracle’s reports and (b)PRISM’s legal guarantees for ?xed TI of 1second.

20 40 60 80 100C D F (%)

Difference (%)

Fig.15:Cumulative Distribution function (CDF)for dif-ference between PRISM’s reported values wrt.(a)oracle’s reports and (b)PRISM’s legal guarantees for ?xed TI of 10seconds.

0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 0.01

0.1 1 10 100

N o r m a l i z e d L o a d

Delta to noise budget ratio

Guassian Distribution Uniform Distribution

Fig.13:Normalized Load vs.noise of synthetic workload for a ?xed AI budget.If noise

reported results with (1)an oracle service that reports true aggregate values based on sensor readings at any time instant and (2)the legal guarantees promised by PRISM’s AI and TI metrics.

Figure 14and 15show the CDF of deviation between PRISM’s reported values for PrMon’s “CPU”attribute compared to both the “oracle”instantaneous values and

the legal guarantees de?ned by AI and TI.In Figure 14,we ?x TI to 1second and then report the CDF of differ-ence in the attribute’s reported values for different values of AI.We make two important observations here:(1)PRISM’s reported values lie within the envelope de?ned by AI and TI for essentially all reports and (2)for 5%AI and 1second TI,more than 90%of reports have differ-ence less than 15%from the oracle.As illustrated in Fig-ure 14,increasing the TI to 10seconds results in larger deviation between PRISM’s reported results and the ora-cle.For 5%AI and 10s TI,more than 90%reports differ by less than 27%from the oracle.The relatively large errors relative to AI are due to the low temporal locality of the CPU attribute:small TI adds signi?cant additional variation compared to the oracle.But,the values remain within the legal guarantees de?ned by the combined AI and TI limits.

5.3NI:Coping with Disruption

Finally,we analyze the effectiveness of our NI metrics in accurately re?ecting network state and ?ltering inaccu-rate reports.

We ?rst show how NI metrics re?ect network state for

5 10 15 20 25 0

200 400 600 800 1000 1200 1400 1600 1800

N e t w o r k I m p r e c i s i o n M e t r i c s

Time (seconds)

Nreachable

Nall Ndup

Fig.16:NI metrics under induced system churn –single node failure at 815seconds into the experimental run.

10 20 30 40 50 60 70 80 90 100 0 2 4 6

8 10 12 14 16 18

V a l u e

Time (hours)

Nreachable

Nall Ndup

Fig.17:NI metrics re?ecting PlanetLab state (85nodes).

a small scale controlled experiment.In Figure 16,we run a 20node experiment on the Condor cluster where we kill a single node at 815seconds into the run and observe the variation of reported NI metrics for an at-tribute with TI of 60seconds.This failure causes the N reachable value to fall from 20to 15within 40sec-onds after the node failure.The drop in N reachable indi-cates that any result calculated in this interval might only include correct values from 15nodes.The N all value remain stable from 20until about 1600seconds to re-?ect the long T declareDead timeout before the system de-clares unreachable nodes to be dead.Correspondingly,the N dup value goes from 0to 4at about 1060seconds when the disconnected subtree joins a new parent and starts reporting its N dup value to that parent.Finally,the N dup value falls back to 0about T declareDead time units (T declareDead =10minutes)after the dead event and both N all and N reachable stabilize to 19(nodes)denoting that the system is back to a stable state.

Figure 17shows how NI re?ects network state for a 85-node PlanetLab experiment for a 18-hour run start-ing 4October 2006.We observe that even without any induced failures,there are short-term instabilities in val-ues reported by N reachable ,N all ,and N dup due to miss-ing/delayed ping reply messages for N reachable and lease expirations triggered by DHT recon?gurations for N dup .During the course of the run,5of the 85nodes became unresponsive;hence the ?nal N reachable and N all values stabilize to 80.

Next we quantify the risks of reporting global aggre-gate results without incorporating NI.We run a 1hour experiment on 94PlanetLab nodes for an attribute with AI =0and TI =10seconds.Figure 18shows the CDF of reported answers showing the deviation in reports with respect to an oracle.The different lines in the graph correspond to the reported answers ?ltered for different NI thresholds.For simplicity,we condense NI to a sin-gle parameter MAX(N all ?N reachable

N all ,N dup N all

).We observe that NI effectively re?ects the stability of network state:

when NI <5%,80%answers have less than 20%devi-ation from the true value and when NI <90%,80%an-swers can deviate by as high as 65%from the true value.Note that for monitoring systems that ignore NI (no ?l-tering line),90%of their reports can differ by 80%from the truth.

In Figure 19we explore the effectiveness of a general strategy to achieve high consistency in reported aggre-gate values during periods of churn.We use K redundant trees in the DHT to compute an attribute and then use NI to identify the highest-quality result.Figure 19shows the CDF of results with respect to the deviation from oracle as we vary K from 1to 4.We observe that when devia-tion is less than 10%(small NI),retrieving results from the root of one aggregation tree (K =1)suf?ces.How-ever,for large deviation,fetching the reports from only one aggregation tree can introduce deviation as high as 100%whereas choosing the result from the most stable of 4trees reduces the deviation to at most 22%thereby reducing the worst-case inaccuracy by nearly a factor of 5.

Filtering answers during periods of high churn ex-poses a fundamental consistency versus availability tradeoff [13].Figure 20shows how varying K allows us to increase monitoring load to improve this tradeoff.As K increases,the fraction of time during which NI is low increases.The intuition behind the approach is that since the vast majority of nodes in any 8-ary tree are near the leaves,sampling several trees rapidly increases the prob-ability that at least one tree avoids encountering many near-root failures.We provide an analytic model formal-izing this intuition in the appendix.

6Related Work

The three imprecision metrics in our work are inspired by and relate to a number of research traditions in the distributed systems community.

The AI and TI metrics are related to several research efforts allow applications to trade precision for commu-

C D F (% a n s w e r s )

Difference from truth (%)

Fig.18:Cumulative Distribution function (CDF)for reported answers ?ltered for different NI thresholds and K =1.

C D F (% a n s w e r s )

Difference from truth (%)

Fig.19:Cumulative Distribution function (CDF)of NI values for different K.C D F (% a n s w e r s )

Fig.20:Cumulative Distribution function (CDF)of NI values for K duplicate keys.

nication overhead.Olston et al.[3,22]propose adaptive ?lters at the data sources that compute approximate an-swers for continuous queries.Their work,however,fo-cuses on single-level communication topologies.In a hi-erarchical communication setting,Manjhi et al.[21]con-sider the problem of ?nding frequent items in database streams;they focus on determining an optimal but static distribution of slack to the internal and leaf nodes of the tree.TAG [20],an aggregation service for sensor net-works,employs a similar approach as PRISM for bound-ing TI when nodes are approximately synchronized.Consistency has long been studied in the context of non-aggregating ?le systems and databases.Yu et al.[45]propose three metrics—Numerical Error,Order Error,and Staleness—to capture the consistency spec-trum in a distributed replicated system where any node can perform read or write operations.Numerical error is similar to AI and Staleness is similar to TI.Similarly,?le systems providing cache consistency often provide leases on individual objects [14]or volume leases on groups of objects [43]

Consistency for aggregation differs in two fundamen-tal ways.First,aggregation systems are large-scale with many concurrent writers which implies that it is not fea-sible to resolve CAP dilemma [13]by blocking reads during periods when a writer may be disconnected.So we emphasize availability by providing conditional con-sistency:operations always complete but results are an-notated with information about their quality.Second,in hierarchical aggregation that accumulates inputs from many sensors,ampli?cation effect of failures can make results substantially deviate from the real values.

The idea of ?agging results when the state of a dis-tributed system is disrupted by node or network failures has been used in tackling other distributed systems prob-lems.For example,our idea of conditioned consistency is similar in spirit to the notion of failure detectors [4]for fault-tolerant distributed systems.Also,in quorum systems,Pierce and Alvisi’s pseudo-regular and pseudo-atomic semantics provide regular and atomic semantics on all operations,but they allow operations to abort if concurrency or network failures would prevent such a

guarantee [25].Kostoulas et al.[19]point out the im-possibility of group size estimation in a dynamic group and propose an active gossip-based scheme and a pas-sive approach based on interval densities when nodes are hashed onto a given real interval.Freedman et al.pro-pose link-attestation groups abstraction in [11]that uses an application speci?c notion of reliability and correct-ness,so as to map which pairs of nodes consider each other reliable.Their system,designed for groups on the scale of tens of nodes,monitors the nodes and system and exposes such attestation graph to the applications.Traditionally,DHT-based aggregation is event-driven and best-effort,i.e.,each update event triggers re-aggregation for affected portions of the aggregation tree.Further,systems often only provide eventual consistency guarantees on its data [36,40],i.e.,updates by a live node will eventually be visible to probes by connected nodes.There are ongoing efforts similar to ours in the P2P and databases community to build global monitoring ser-vices.PIER is a DHT-based relational query engine [16]targeted at querying real-time data from many vantage-points on the Internet.Sophia [38]is a distributed moni-toring system designed with a declarative logic program-ming model.A recent study [17]has proposed support of aggregate triggers in monitoring systems in which individual nodes can independently detect and react to changes in the global system-wide behavior.PRISM may enhance such efforts by providing a scalable way to track top-k and other signi?cant events.

7Conclusions

Without precision guarantees,large scale network mon-itoring systems may be too expensive to implement (be-cause too many events ?ow through the system)or too dangerous to use (because data output by such systems may be arbitrarily wrong.)PRISM provides arithmetic imprecision to bound numerical accuracy,temporal im-precision to bound staleness,and network imprecision to expose cases when ?rst two bounds can not be trusted.

References

[1]Abilene internet2network.https://www.wendangku.net/doc/29310227.html,/.[2] D.Andersen,H.Balakrishnan,M.Kaashoek,and R.Morris.Re-

silient overlay networks.In Proc.SOSP,pages131–145.ACM Press,2001.

[3] B.Babcock and C.Olston.Distributed top-k monitoring.

In ACM SIGMODInternational Conference on Management of Data,pages28–39,June2003.

[4]T.Chandra and S.Toueg.Unreliable failure detectors for reliable

distributed systems.J.ACM,43(2):225–267,Mar.1996.

[5] D.D.Clark,C.Partridge,J.C.Ramming,and J.Wroclawski.A

knowledge plane for the internet.In A.Feldmann,M.Zitterbart, J.Crowcroft,and D.Wetherall,editors,SIGCOMM,pages3–10.

ACM,2003.

[6]https://www.wendangku.net/doc/29310227.html,/.

[7]R.Cox,A.Muthitacharoen,and R.T.Morris.Serving DNS using

a Peer-to-Peer Lookup Service.In IPTPS,2002.

[8]M.Dahlin,B.Chandra,L.Gao,and A.Nayate.End-to-end

wan service availability.IEEE/ACM Transactions on Network-ing,2003.

[9] C.Estan and G.Varghese.New directions in traf?c measurement

and accounting.In SIGCOMM,pages323–336.ACM,2002. [10]M.J.Freedman and D.Mazires.Sloppy Hashing and Self-

Organizing Clusters.In2nd Intl.Workshop on Peer-to-Peer Sys-tems,Berkeley,CA,February2003.

[11]M.J.Freedman,I.Stoica,D.Mazieres,and S.Shenker.Group

therapy for systems:Using link attestations to manage failures.

In IPTPS,2006.

[12]Y.Fu,J.Chase,B.Chun,S.Schwab,and A.Vahdat.SHARP:

An architecture for secure resource peering.In Proc.SOSP,Oct.

2003.

[13]S.Gilbert and N.Lynch.Brewer’s conjecture and the feasibility

of Consistent,Available,Partition-tolerant web services.In ACM SIGACT News,33(2),Jun2002.

[14] C.Gray and D.Cheriton.Leases:An Ef?cient Fault-Tolerant

Mechanism for Distributed File Cache Consistency.In SOSP, pages202–210,1989.

[15]J.M.Hellerstein,V.Paxson,L.L.Peterson,T.Roscoe,

S.Shenker,and D.Wetherall.The network oracle.IEEE Data Eng.Bull.,28(1):3–10,2005.

[16]R.Huebsch,J.M.Hellerstein,https://www.wendangku.net/doc/29310227.html,nham,B.T.Loo,S.Shenker,

and I.Stoica.Querying the Internet with PIER.In Proceedings of the VLDB Conference,May2003.

[17] A.Jain,J.M.Hellerstein,S.Ratnasamy,and D.Wetherall.A

wakeup call for internet monitoring systems:The case for dis-tributed triggers.In Proc.3rd ACM SIGCOMM Workshop on Hot Topics in Networks(HotNets),San Diego,CA,November2004.

[18]N.Jain,D.Kit,P.Mahajan,P.Yalagandula,M.Dahlin,and

Y.Zhang.PRISM:precision-aware aggregation for scalable mon-itoring(extended).Technical Report TR-06-22,UT Austin De-partment of Computer Sciences,May2006.

[19] D.Kostoulas,D.Psaltoulis,I.Gupta,K.Birman,and A.Demers.

Decentralized schemes for size estimation in large and dynamic groups.In IEEE Network Computing and Applications(NCA05), 2005.

[20]S.R.Madden,M.J.Franklin,J.M.Hellerstein,and W.Hong.

TAG:a Tiny AGgregation Service for Ad-Hoc Sensor Networks.

In OSDI,2002.

[21] A.Manjhi,V.Shkapenyuk,K.Dhamdhere,and C.Olston.Find-

ing(Recently)Frequent Items in Distributed Data Streams.In ICDE,pages767–778.IEEE Computer Society,2005.

[22] C.Olston,J.Jiang,and J.Widom.Adaptive?lters for continuous

queries over distributed data streams.In SIGMOD,SIGMOD 2003.

[23] C.Olston and J.Widom.Offering a precision-performance trade-

off for aggregation queries over replicated data.In VLDB,pages 144–155,Sept.2000.

[24]V.Paxson.End-to-end Routing Behavior in the Internet.In SIG-

COMM,Aug.1996.

[25]L.Pierce and L.Alvisi.A framework for semantic reasoning

about byzantine quorum systems.In Brief Announcements,Proc.

of Symp.on Principles of Distributed Computing,2001.

[26]Planetlab.https://www.wendangku.net/doc/29310227.html,.

[27] C.G.Plaxton,R.Rajaraman,and A.W.Richa.Accessing Nearby

Copies of Replicated Objects in a Distributed Environment.In

ACM SPAA,1997.

[28]S.Ratnasamy,P.Francis,M.Handley,R.Karp,and S.Shenker.A

Scalable Content Addressable Network.In Proceedings of ACM

SIGCOMM,2001.

[29] A.Rowstron and P.Druschel.Pastry:Scalable,Distributed Ob-

ject Location and Routing for Large-scale Peer-to-peer Systems.

In Middleware,2001.

[30]J.Shneidman,P.Pietzuch,J.Ledlie,M.Roussopoulos,

M.Seltzer,and M.Welsh.Hourglass:An infrastructure for con-

necting sensor networks and applications.Technical Report TR-

21-04,Harvard Technical Report,2004.

[31] A.Siegel.Performance in Flexible Distributed File Systems.PhD

thesis,Cornell,1992.

[32] A.Singla,U.Ramachandran,and J.Hodgins.Temporal notions

of synchronization and consistency in Beehive.In Proc.SPAA,

1997.

[33]I.Stoica,R.Morris,D.Karger,F.Kaashoek,and H.Balakrish-

nan.Chord:A scalable Peer-To-Peer lookup service for internet

applications.In ACM SIGCOMM,2001.

[34]R.Tewari,M.Dahlin,H.Vin,and J.Kay.Design Considerations

for Distributed Caching on the Internet.In ICDCS,May1999.

[35]https://www.wendangku.net/doc/29310227.html,/.

[36]R.van Renesse,K.Birman,and W.V ogels.Astrolabe:A ro-

bust and scalable technology for distributed system monitoring,

management,and data mining.TOCS,21(2):164–206,2003.

[37] D.Veitch,S.Babu,and A.Pasztor.Robust synchronization of

software clocks across the internet.In IMC’04:Proceedings

of the4th ACM SIGCOMM conference on Internet measurement,

pages219–232,New York,NY,USA,2004.ACM Press.

[38]M.Wawrzoniak,L.Peterson,and T.Roscoe.Sophia:An Infor-

mation Plane for Networked Systems.In HotNets-II,2003.

[39] B.White,J.Lepreau,L.Stoller,R.Ricci,S.Guruprasad,

M.Newbold,M.Hibler,C.Barb,and A.Joglekar.An integrated

experimental environment for distributed systems and networks.

In Proc.OSDI,pages255–270,Boston,MA,Dec.2002.

[40]P.Yalagandula and M.Dahlin.A scalable distributed information

management system.In Proc SIGCOMM,Aug.2004.

[41]P.Yalagandula,P.Sharma,S.Banerjee,S.-J.Lee,and S.Basu.

S3:A Scalable Sensing Service for Monitoring Large Networked

Systems.In Proceedings of the SIGCOMM Workshop on Internet

Network Management,2006.

[42]J.Yin,L.Alvisi,M.Dahlin,and C.Lin.Hierarchical Cache

Consistency in a WAN.In Proc USITS,Oct.1999.

[43]J.Yin,L.Alvisi,M.Dahlin,and C.Lin.V olume Leases to Sup-

port Consistency in Large-Scale Systems.IEEE Transactions on

Knowledge and Data Engineering,Feb.1999.

[44]H.Yu and A.Vahdat.Design and evaluation of a continuous

consistency model for replicated services.In OSDI,pages305–

318,2000.

[45]H.Yu and A.Vahdat.Design and evaluation of a conit-based con-

tinuous consistency model for replicated services.ACM Trans.on

Computer Systems,20(3),Aug.2002.

[46] B.Y.Zhao,J.D.Kubiatowicz,and A.D.Joseph.Tapestry:An

Infrastructure for Fault-tolerant Wide-area Location and Routing.

Technical Report UCB/CSD-01-1141,UC Berkeley,Apr.2001. 15

8Appendix

8.1

Arithmetic Imprecision

Mechanism We ?rst describe in detail the aggregation mechanism for a single ?ow in an aggregation tree for the SUM function with a given AI budget.8.1.1

Computing SUM for a single attribute To enforce AI,each aggregation subtree T for an at-tribute has an error budget δT which de?nes the maxi-mum inaccuracy of any result the subtree will report to its parent for that attribute.

Each node n in the aggregation tree maintains per-attribute state Ψn : δself ,V min ,V max ,L self ,?c δc ,V c min ,V c

max ,L c

Whenever n receives an update from a child c ,it trig-gers the aggregation function that re-computes the aggre-gate value of all latest received updates from its children.

Function:OnChildUpdate (child c ,range [V c min ,V c

max ],load L c )

Step https://www.wendangku.net/doc/29310227.html,pute synopses received from children set child(n):

P max =

c ∈chil

d (n )

V c max P min =

c ∈chil

d (n )

V c min

(4)

If n has never received an update for this attribute from

a child c ,then [V c min ,V c

max ]is set to [0,δc ].

Step 2.Pass new numeric range through local AI ?lter:if (P min V max ){

V min =P min ?bias ?δself ;//bias ∈[0,1];V max =V min +δT ;L self ++;

L = c ∈child (n )L c +L self ;Send (attr,V max ,L )to parent;}

For redistributing the AI budgets in our self-tuning algo-rithm,M self (M c )are set to the ratio of L self (L c for child c respectively)to the time elapsed since the last AI error distribution.

Leaf node:A leaf node can be viewed as an internal node with a single virtual child (the sensor itself)with AI =0i.e.,the sensor s triggers an update [V s ,V s ]to

the leaf node i.e.,V s =V s max =V s

min .Note that the messaging cost of transmitting between the virtual child (sensor)and the leaf node L s =0since they reside on the same physical node.8.1.2

Computing MIN for a single attribute The mechanism of computing the MIN aggregation func-tion is similar to the SUM where we replace the SUM

compuation in Equation 4by:

P max = min c ∈child (n )V c

max

P min = min c ∈child (n )

V c

min

(5)

8.1.3Computing MAX for a single attribute

The MAX aggregation function is symmetric to MIN.

Thus,Equation 5becomes:

P max =

max c ∈child (n )V c

max

P min = max c ∈child (n )

V c

min

(6)

8.1.4Computing A VG for a single attribute

The A VG aggregation function can be easily computed

as a (SUM,COUNT)pair along the same aggregation tree.

8.2

Optimality of Self-tuning AI Error Dis-tribution

The optimal distribution of δT among δself and δc is computed as follows:We ?rst ?nd the optimal AI error distribution for a simple degree-2tree having two lev-els with the root at level 1and its two children as the leaf https://www.wendangku.net/doc/29310227.html,ter,we will show how this topology can be modeled for any arbitrary d and for any level of the aggregation hierarchy.

Given this topology,we have δT =δc 1+δc 2;δself for the root node is set to 0since it doesn’t need to transmit any updates up in the hierarchy.As discussed in Sec-tion 3.1,under the assumption that load is inversely pro-portional to the error budget,we get:

M c 1M opt c 1=δc 1δopt c 1M c 2M opt c 2=δc 2

δopt

c 2

We formulate minimizing total load as a multivariate

optimization problem sibject to the constraint that the to-tal error budget is ?xed i.e.,

Minimize

f :M opt c 1+M opt

c 2

subject to the constraint:

g :δc 1+δc 2?δT =δopt c 1+δopt

c 2?δT =0

We use Lagrangian multipliers to ?nd the extremum of

f(δopt c 1,δopt c 2)subject to the constraint that g(δopt c 1,δopt c 2)=

0i.e.,

?f ?δopt c 1+λ?g

?δopt c 1

=0?

??δopt c 1 M c 1δc 1δopt c 1+M c 2δc 2δT ?δopt c 1 +λ?g

?δopt c 1=0??M c 1δc 1 δopt c 1

+M c 2δc 2

δT ?δopt c 1 2=0?√M c 1δc 1√M c 2δc 2=δopt c 1δT ?δopt c 1

?δopt

c 1=δT √M c 1δc 1√

M c 1δc 1+√M c 2δc 2

which is a special case of Equation 3when d =2.In the general case for a degree-d tree,we get Equation 3:

δopt

v =δT ?√

M v ?δv v ∈{self }∪child (n )√M v ?δv .Notes.For an internal-node,we model δself ,M self for

that node as a virtual child with δc =δself ,M c =M self and use the above equation for d +1children.

8.3Temporal Imprecision

Pipelined Delays Note that the pipelined delays mech-anism signi?cantly reduces the number of updates in an aggregation tree;at each level,an aggregate update is propagated at most once per I seconds where I =T I ? ?S .

To provide the temporal guarantees under synchro-nized clocks,S must be smaller than T .

This im-plies that temporal imprecision would be violated if

2?skew max >=(T ?hop max )>=1

(T ? ?hop max ).The intuition of this result is that (T ? ?hop max )is the total slack for the entire tree;therefore,skew max must be smaller than slack available per level 1 (T ? ?hop max ).

Proof of Correctness of Pipelined Delays.

Lemma 1.An event sent by level i at local time T i is sent by level i +1no later than local time (at level i )T i +S Proof.At j th step (TI interval),the extra delay intro-duced by node at level i +1

=(Z +j ?I +(i +1)?S

)-(Z +j ?I +i ?S )=S Lemma 2.A leaf event at time X sent by level 0no later than X +I [True by de?nition of I]

Theorem.An event at a leaf node at local time X is re?ected at root no later than time X +T I according to the local time at the same leaf node.

Proof.An event reaches level d no later than

X +I +d ?S [Combining Lemma 1and Lemma 2]=X +(T

I ?d ?S )+d ?S =X +T I

In general,if skew max is large due to unsynchronized clocks or weak synchronization we simply fall back on non-pipelined version which we describe next.Non-pipelined delays.For the case of unsynchronized clocks,we use the same algorithm as the pipelined case with the difference that 2?skew max is no longer used and the parameters Z,I ,and S are set slightly differently to re?ect lack of coordination between levels.Specif-ically,S =hop max (we ignore skew max )and cor-respondingly T = ?(I +S ).Note that we get a different bound on minimum temporal imprecision as T min = ?hop max for the non-pipelined case.

In terms of implementation,instead of a global refer-ence time as in the synchronized case,the aggregation function speci?es an arbitrary reference time Z .There-fore,at local time Z +i ?I corresponding to a node at any level of the aggregation tree,it sends an updated ag-gregate value to its parent iff the value of any of its inputs has changed since time Z +(i ?1)?I .The ef?ciency for non-pipelined case is qualitatively same as the pipelined case—at each level,an aggregate update is propagated at most once per I seconds.

Proof of Correctness of Non-Pipelined Delays.To

prove correctness,we de?ne the following lemma:

Lemma 3.At any level,the maximum delay in update propagation by a node is I +S .

This leads to the proof of Theorem 8.3for the non-pipelined case.

Proof.An event reaches level d no

later than X +d ?(I +S )=X +TI

Comparison.Given the same a priori temporal impre-cision budget,the value of I for the pipelined and the non-pipelined cases would be different i.e.,

I pipelined

=T ? ?S =

? T ?hop max ?2?skew max

I non pipelined

?hop max Therefore,when skew is small,i.e.2?skew max T

?hop max ,pipelined delay can achieve almost a factor of reduction in the update frequency.

8.4Network Imprecision

Here we present the analytical results for computing the expected NI using k aggregation trees after f indepen-dent failures have occured.Note that by”f independent failures”,we allow two failures to be on the same node; in this case,their contribution to NI is counted twice. Notation.

?c:the number of logical children a node has in the aggregation tree(i.e.,logical fanout).

?d:the depth of the aggregation tree

?P(i,f,k):with k random trees,the probability for at least one tree to have all failures occuring at level <=i(leaf at level0)(which implies the NI=N all?N reachable<=f?c i with f independent failures be-cause each node at level i contribute at most c i to NI).?E(NI,f,i):the expected NI with f independent failures conditioned on the fact every failure appears in max level of i or below

?Var(NI,f,i):the variance of NI with f independent fail-ures conditioned on the fact every failure appears in max level of i or below.

8.4.1Tail Probability Analysis

The probability for a failure to appear in max level?=i (leaf is level0)i.e.,Pr(failure in max level<=i)=1-Pr(failure in max level>i)

=(1?1/c i+1).

The probability for all f independent failures to appear in level<=i is therefore(1?1/c i+1)f.Note that the contribution to NI by each failure with max level<=i is at most c i.

Therefore we get P(i,f,1)=(1?1/c i+1)f.

With k random aggregation trees,the probability for at least one tree to have NI?=f?c i is

P(i,f,k)=1?(1?P(i,f,1))k=1?[1?(1?1/c i+1)f]k

(7) Example.Suppose c=8,f=10,k=4,then

a.with pro

b.>=99.95%N all?N reachable<=f?c1

=80

b.with prob.>=70.51%N all?N reachable<=f?c0

=10

Analysis on the Expected NI.We will use the above analysis to prove that with high probability,at least one aggregation tree has every failure appearing at max level <=i.

We?rst analyze the mean and standard deviation for NI conditioned on the fact that all failures occur at max level<=i.For a single failure,the probability for the failure to occur at max level j(conditioned on the fact its max level is<=i)is

Q(j)=

1/c j?1/c j+1

1?1/c i+1

Its contribution to NI is exactly c j.Therefore,with1 failure,we have

E(NI,1,i)=

j=0 (i)

Q(j)?c j

=(i+1)?

(1?1/c)

(1?1/c i+1)

V ar(NI,1,i)=E(NI2)?E(NI)2

j=0 (i)

Q(j)?c2?j?E(NI)2

=c i?E(NI)2

With f independent failures,we get:E(NI,f,i)<=f?E(NI,1,i)and Var(NI,f,i)<=f2*var(NI,1,i)

Note that we use”<=”instead of”=”because we ignore the fact that one failure may be the ancestor of another.

Example.Suppose c=8,f=10,k=4,i=1,then with prob.>=99.95%N all?N reachable<=f?c1=80and in this case,E(NI,f,i)<=17.78and STDDEV.(NI,f,i) <=22.

To summarize,a fundamental and unique challenge in-herent in any hierarchical aggregation system(regardless of whether DHT is used)is the failure ampli?cation ef-fect–once a non-leaf node fails,an entire subtree rooted at that node is affected.With d levels,with f failures, although the expected number of affected nodes is only (d+1)?f,the standard deviation is really high=f?c d/2. As a result,with fairly high probability,f failures can be ampli?ed to a much larger number of affected nodes even when f is very small.

The metrics we proposed in Section4

a.quantify the ampli?cation effect for each aggregation

tree.In particular,since our dual-tree aggregation scheme allows us to monitor the NI for every tree, by taking the mean of all NIs,we can accurately es-timate(d+1)?f because the standard deviation is now f?c d/2/c d=f/c d/2.If we normalize the NI for an aggregation tree by(d+1)?f,we then get the ampli?cation factor for that tree.

b.give us a way of reducing ampli?cation effect by using

multiple trees for the same aggregation function.

Graphad制作条形图和直方图

打开Graphpad—选择column（列）—选择enter replicate values,stacked into column—点击create—将常用图表中直方图内上海降水量复制粘贴进软件中—点击analyze—选择column analyze中的frequency distribution（频数分布）—右侧只选择降水量—点击OK—什么都不用变点击OK结束。某些单词的含义：confidence interval（置信区间，可靠区间）；inference（推论）calculation （计算，运算）cumulative。。。累计频数分布；第二个是个数，比例，百分比的意思。而频数分布图中间应该是没有间隙的，所以操作如下：双击柱子—点击graph setting—spacing 第一个改为0%。然后对柱子的颜色边框的宽度进行修改。如图：最后制作图片如图所示：

打开单式条图数据（此数据包括对照组、处理1、处理2、处理3）—选择column（列）—选择enter replicate values,stacked into column（选择未经过处理的源数据，制作最简单的条图）—将数据复制到软件，将data1名字改为单式条图查看即可。非常简单方便。有时候汉字会是乱码故可将其选中改为宋体即可。误差线也可进行调整，在appearence中选中error bar更改线条即可。（2）数据中包括平均值和标准差选择column（列）—选择enter and plot error values already calcuated elsewhere 选择mean&SD —点create—观察表格发现数据需要转置后方可复制。即先进行复制粘贴成数值然后转置。—复制到软件—将data3改为计算好均值标准差（3）复式条图（两个因素）打开复式条图使用一维表—选择插入—数据透视表—处理放入行标签，细胞放入列标签，个数放入数值，将数值的字段设置为平均值，在拖入一次个数将其设置为标准偏差—将得到的复制粘贴到新的工作表中，删掉名称、总计。将剩下的复制到软件就生成了图表。在grople 选择mean&SD中制作图表。图表下方的字是斜的可以通过双击改为horriaized（水平）（将重要的放在y轴，次重要的放在x轴）讲完后讲解一下改变图的类型

基于理论框架的实践分析

基于理论框架的实践分析在实践过程中，我们发现目前地方治理能力的评价存在一些问题：一是指标体系设计不够科学合理，缺少公众满意度等相关指标，指标体系设计没有突显对结果负责的管理主义视角的责任机制。而且各地政府所采用的评价指标体系，未能结合当地实际情况，未能充分体现各地区经济社会发展的特点和治理的实际需要，导致评价结果缺乏实用性。二是对评价结果的运用不够重视，使得治理能力评价工作对治理效果提升的实际作用不大。三是地方治理能力评价的相关法律法规不够健全，以至于各地的评价体系的设计和评价行为的实施无法可依，无章可循，使得评价工作随意盲目，缺乏规范性。四是评价工作主要是由治理主体主导，也就是由政府主导，社会各方参与地方治理能力评价的途径不畅，以至于评价结果缺乏可信性，削弱各方参与评价工作的积极性。党的十八大提出“治理”、“治理能力”、“社区治理”等概念，十八届三中全会又将全面深化改革的总目标确定为完善和发展中国特色社会主义制度，推进国家治理体系和治理能力的现代化。这表明治理理论与治理体系不仅是理论研究的重要内容，更是我国社会发展的迫切需求和时代发展的必然趋势。地方治理能力，作为治理体系的重要组成部分，其重要性日益显现[1]。结合大连的实际情况，大连市位于辽东半岛最南端,是全国15个副省级城市之一、全国5个国家社会与经济发展计划单列市之一,是辽宁沿海经济带的金融中心，航运物流中心，东北地区最大的港口城市。大连市的综合竞争力在全国也是名列前茅。大连地辖市内四区，市外三个区、三个县级市和一个海岛县,大连市内四区包括39个街道办事处，市外三区和四个县级市包括47个街道办事处,全市共有人口 669万[2]。如何整合大连市多层治理体系，形成合力，实现大连市经济社会的跨越式发展成为摆在大连市面前的重要问题。通过实践过程，我们发现近年来大连市社会治理工作取得了优异成绩，大连市委、市政府始终把创新社会治理工作摆在重要位置，积极探索符合大连实际情况的社会治理体制机制，广大人民群众的满意度和幸福感持续提升。西岗区365工作体系就成为全国社会治理改革创新的样本，在全国范围内推广和学习。但通过对大连市治理能力的评价过程，会发现大连市社会治理还存在差距和不足，需要加以改进。首先，地方治理机制不健全。居民参与力度、深度、广度不够。公众没有广泛地参与到地方治理中，地方治理体系还没有从实质上将公众的行为纳入进来，公民在某种程度上游离于地方治理体系之外。第二，地方治理理念陈旧。居民作为社会中的一员，没有意识到自己是地方治理能力建设与发展的主体,没有意识到自己对社会应尽的责任与义务,甚至认为社会建设与发展是政府的事。而政府仍沿袭之前的治理模式，没有意识到地方治理能力创新的重要性。第三，有关地方治理能力与治理体系的法制不健全。我国关于地方治理或相关的法律法规并不多，现存的法律法规涉及范围较小也不够完善，已经无法适应经济社会发展的需要和地方治理能力的创新。第四，地方治理体系中第三方参与力量相对薄弱。社会自治组织和中介组织比较弱小，社会资源难以有效整合。社会中介组织缺乏足够的资源和权威，没有足够的能力弥补政府部门在社会治理中的局限。然而社会中介组织的发展，对于较好地满足居民需要,充分发掘和利用社会资源,建立健全地方治理体系等方面具有重要意义[3]。最后，地方政府是公共产品和公共服务的主要提供者。随着我国经济社会的发展，人们生活水平的提高，社会公众对于政府提供公共产品和公共服务的能力提出了更高的要求[4]。但是地方政府往往受财力和管理能力的限制，无法满足社会公众日益增长的物质文化需要，其提供公共

奥斯特罗姆制度分析与发展框架评介

#国外经济理论专题# 奥斯特罗姆制度分析与发展框架评介* 王群内容提要:埃莉诺#奥斯特罗姆的公共池塘资源自主治理理论已经获得学术界的认可,并在实践中日益广泛地应用。那么如何实现公共池塘资源自主治理理论提出的八项原则呢?制度分析与发展框架综合前人的研究成果,为自主治理体制的构建、调整与改善提供了相应的指导。在整个框架中,行动情境最为复杂和重要,致力于解释参与资源治理过程的个体在既有信息和控制力下基于一定身份所采取的行动对治理结果的影响。应用规则约束人们在行动情境中的行为,其改变不仅导致行动情境内部结构的变化,还能使整个制度框架结构发生变化。制度分析与发展框架在奥斯特罗姆近30多年的不懈努力下,逐步趋向完善。本文对制度分析与发展框架的最新发展进行了补充性介绍,以增进国内学术界对此理论成果的了解。关键词:埃莉诺#奥斯特罗姆制度分析与发展框架行动情境应用规则埃莉诺#奥斯特罗姆由于/在经济治理、尤其是在公共资源治理方面0的卓越贡献,被瑞典皇家科学院授予2009年度诺贝尔经济学奖。她颠覆了公共财产只有交由中央权威机构管理或完全私有化后才能有效管理的传统观念,证明使用者自主治理的公共池塘资源可以通过合理的制度安排取得优于人们先前根据标准理论所预测的结果(Ostrom,1990)。与公共池塘资源自主治理理论息息相关的制度分析与发展框架(Institutional Analysis and Develo p-m ent framew ork,简称为IAD fram ew o rk)从1982年起就一直是奥斯特罗姆的研究重点之一(Kiser &Ostro m,1982)。它致力于解释包括应用规则在内的外生变量(exog enous variable)如何影响公共池塘资源自主治理中的政策结果,为资源使用者提供一套能够增强信任与合作的制度设计方案及标准(Poteete et al.,2010),并且用来评估、改善现行的制度安排。经过30多年的不断完善,制度分析与发展框架如今已成为公共池塘资源自主治理的/操作指南0。本文从理论和实践两方面介绍制度分析与发展框架的最新发展。一、理论基础奥斯特罗姆认为,制度分析与发展框架有着多重渊源,它在古典政治经济学、新古典微观经济学、公共选择理论、交易成本经济学、非合作博弈论中都可以找到理论依据,是综合了多种学科的一组分析框架(Ostr om et al.,1994)。它既可以用来研究静态制度安排,也可以用来研究新规则和新技术不断出现的动态制度安排; 它将人们的经济行为分解成图1制度分析与发展框架若干相互关联、牵制的组成部分,使我们既可以对具体问题进行细致分析,又可以将各种问题联系起来综合考虑。奥斯特罗姆通过制度分析与发展框架向我们表明,对资源退化等问题的研究不应该仅限于相关的自然属性,例如土壤、动植物种类、降水;资源所在社区(comm unity,或译为共同体)的特点、管理体系、产权、用以规范个体之间关系的应用规则等社会因素和自然属性一样重要。一个完整的制度分析与发展框架包括7组主要变量(图1)。 ) 137 ) 5经济学动态62010年第4期 *此文系国家自然科学基金项目(70973064)成果。

挖掘—代谢组常用分析图表及作图软件

PPT 模板下载：https://www.wendangku.net/doc/29310227.html,/moban/ 行业PPT 模板：https://www.wendangku.net/doc/29310227.html,/hangye/ 节日PPT 模板：https://www.wendangku.net/doc/29310227.html,/jieri/ PPT 素材下载：https://www.wendangku.net/doc/29310227.html,/sucai/ PPT 背景图片：https://www.wendangku.net/doc/29310227.html,/beijing/ PPT 图表下载：https://www.wendangku.net/doc/29310227.html,/tubiao/ 优秀PPT 下载：https://www.wendangku.net/doc/29310227.html,/xiazai/ PPT 教程： https://www.wendangku.net/doc/29310227.html,/powerpoint/ Word 教程： https://www.wendangku.net/doc/29310227.html,/word/ Excel 教程：https://www.wendangku.net/doc/29310227.html,/excel/ 代谢组研究常用的分析图表介绍经典作图软件图表解读中可能遇到的问题代谢组数据的建议

PPT 模板下载：https://www.wendangku.net/doc/29310227.html,/moban/ 行业 PPT 模板：https://www.wendangku.net/doc/29310227.html,/hangye/ 节日PPT 模板：https://www.wendangku.net/doc/29310227.html,/jieri/ PPT 素材下载：https://www.wendangku.net/doc/29310227.html,/sucai/ PPT 背景图片：https://www.wendangku.net/doc/29310227.html,/beijing/ PPT 图表下载：https://www.wendangku.net/doc/29310227.html,/tubiao/ 优秀PPT 下载：https://www.wendangku.net/doc/29310227.html,/xiazai/ PPT 教程： https://www.wendangku.net/doc/29310227.html,/powerpoint/ Word 教程： https://www.wendangku.net/doc/29310227.html,/word/ Excel 教程：https://www.wendangku.net/doc/29310227.html,/excel/ l PCA l 差异代谢物筛选l 差异代谢物分析l 通路富集分析l ROC曲线l 生存曲线l 相关性网络图l 特定代谢通路图 01 代谢组研究常用的分析图表介绍

土地利用规划的制度分析框架与理论命题_陈丽

改革开放以来，作为基本生产要素的土地资源对我国经济增长、城市建设及其它各项建设起了至关重要的作用。据统计结果显示，２００３年至２００６年，全国建设占用耕地面积为６８．０万公顷，年均占用１７．０万公顷。①已有研究表明：我国农地非农化的波动周期与经济发展周期紧密一致，各省区之间ＧＤＰ总量与建设占用耕地之间存在幂指数的关系（陈江龙，２００３；曲福田，２００６）。同时，非农建设用地的迅速扩张，导致我国耕地面积大量减少，并存在耕地污染、生态环境恶化、失地农民增加等现象。目前全国失地农民有３５００万人左右，按目前的土地征用速度，预计到２０３０年失地农民将增至１．１亿人。②可见，基于我国人口众多、土地资源相对稀缺的基本国情，随着工业化、城市化进程的深入推进，且面临着日益紧张而残酷的国际与区域竞争，大量土地资源迅速地向非农用途转移将使得国家粮食有效供给、社会稳定、生态环境安全和经济可持续发展存在严重隐忧。土地利用规划是政府优化土地资源配置的重要手段之一。自１９９０年以来，我国先后开展了两轮全国土地利用总体规划的编制。第一轮规划（１９９０）在执行过程中被终止，对规划思想、目标和方法进行了大的调整后，重新编制了第二轮规划（１９９７）。虽然第二轮规划在耕地保护和控制建设用地盲目扩张等方面发挥了积极的作用，但随着规划编制社会经济背景的快速变化，规划实施中出现了用地指标不断被突破、规划方案调整频繁、违规用地现象普遍、圈地现象严重等一系列问题（国土资源部耕地保护与经济发展调研组，２００２；张跃进，２００４；中国国土资源报，２００６），而土地利用规划的预期目标并没有得到实现。这些事实促使专家、学者对我国土地利用规划实施状况进行理论反思：我国土地用途分区管制、基本农田保护等制度的确立以及土地利用总体规划的编制与实施，虽然在空间、时间和数量上对土地资源配置建立了法律体系，但非农建设用地的扩张、耕地过度消耗、利益主体之间矛盾冲突的势头仍得不到有效控制，其核心是土地利用规划中集体理性与个体理性的冲突，不同利益主体之间的利益机制缺失（师学义，２００５）。因此，本文在现有土地利用规划理论研究评述基础上，运用制度经济学理论与方法，从制度分析视角构建我国转型时期土地利用规划的制度分析框架，并阐释其重要的理论命题，为系统深入地分析我国转型时期土地利用规划的有效实施提供理论基础。一、现有研究视角的评述土地利用规划是一项涉及多学科、多部门、多时序的系统工程。各领域专家从不同的角度对规划的内涵有不同的解释，致使土地利用规划研究产生了不同的视角和范畴，其研究成果也涉及科学、工程技术、信息技术以及土地利用规划决策、规划政策、规划管理、规划实施等方面。然而，随着经济日趋全球化、社会依赖程度日益复杂多样化，市场机制难以有效地解决日益突出的人口、资源和环境问题时，在可持续发展观影响下的各国土地利用规划逐步由技术主导研究转向与社会经济发展密切相关的土地利用规划的制度与政策研究中。当前国际上对土地利用规划理论研究的一个重要角度就是运用经济学、政治学和社会学的理论研究成果，解释规划的 “政治性”和实际规划工作中的大量政治问题（Ｂｅｔｔｓ，１９９５；Ｄｏ－ｍｉｎｉｔｚ和Ｍａｎｓｋｉ，１９９６）。尽管它们的研究方法各异，研究重点也各有侧重，但都获得了一致的研究结论，即规划理论的核心组成部分主要包括规划过程和社会结果两个方面，且这二者处于互动之中，不同的社会政治经济背景将决定规划的开展，而规划的制定和实施及其空间后果将反过来影响社会政治经济本土地利用规划的制度分析框架与理论命题陈丽１，曲福田２，严金明１（１．中国人民大学公共管理学院，北京１００８７２；２．南京农业大学中国土地问题研究中心，江苏南京２１００９５）【摘要】文章在现有土地利用规划理论研究述评基础上，阐释土地利用规划与土地利用规划制度，从制度经济学分析视角，以土地利用规划的制度供需、成本和博弈分析为核心构建了我国转型时期土地利用规划的制度分析框架，提出这一研究框架下三个重要理论命题，即土地利用规划的制度供给与需求、土地利用规划决策与实施和土地利用规划管制权限配置。这为我国转型时期土地利用规划理论研究提供了一个新视角。【关键词】土地利用规划；制度；分析框架；理论命题【中图分类号】Ｆ２９３．２【文献标识码】Ａ【文章编号】１００４－２７６８（２００８）１６－００５５－０３【收稿日期】２００８－０２－２２【基金项目】教育部哲学社会科学研究重大课题攻关项目（０４ＪＺＤ０００８）；国家自然科学基金（７０１７３０２７） ①根据中华人民共和国国家统计局发布的《中华人民共和国国民经济和社会发展统计公报（２００３－２００６年）》整理获得。②包宗顺：《重视失地农民利益的保护》，新华日报，２００３年７月１３日。【作者简介】陈丽（１９８０－），女，山西太原人，中国人民大学公共管理学院博士后研究人员，研究方向：土地资源持续利用与规划；曲福田（１９６２－），男，山东莱州人，南京农业大学中国土地问题研究中心教授、博士生导师，研究方向：资源经济、土地政策；严金明（１９６５－），男，江苏南通人，中国人民大学公共管理学院教授、博士生导师，研究方向：土地利用规划与政策。《生产力研究》专题研究.生产力研究/Ｎｏ．１６．２００８! "

论文框架结构.

论文框架结构特别说明:以下的框架只是普遍意义的写作思路,各位同学在写作时,可以按照此框架整理自己的思路,具体拟订写作提纲时,应根据实际情况进行增减和调整,不一定完全按照此框架写。对于不同类型的选题,论文的框架各有不同,一般而言,电大论文选题可以分为三种类型。第一种类型:以研究企业现存问题为主,对存在问题提出改进建议的选题。重点写存在的问题接解决措施。在文章的开头应该有 300-600字左右的引言,引出所研究的内容, 对于问题型研究主题,主要内容的展开可以参考以下分析框架: 一、理论概述 (一 (二 ······ 二、提出问题(研究对象的特点、重要性、必要性 (一 (二 (三 ······· 三、现状描述(现状,现存主要问题

(一 (二 (三 ······· 四、问题分析(原因分析、成因分析 (一 (二 (三五、解决问题(解决措施、改进建议、应注意的问题 (一 (二 (三 ······· 六、小结(结论第二种类型:某一种管理方式在某企业中的应用, 如“电子商务在 XX 企业中应用研究” 。重点写实施过程中面临的主要问题及解决方案在文章的开头应该有 300-600字左右的引言,引出所研究的内容, 对于问题型研究主题,主要内容的展开可以参考以下分析框架: 一、理论概述

(一 (二 ······ 二、研究对象的必要性 (一 (二 (三 ······· 三、推行的主要措施 (一 (二 (三 ······· 四、推行过程中面临的主要问题 (一 (二 ······· 五、解决措施、保障措施或进一步注意的问题(一

(二 (三 ······· 六、小结(结论第三种类型:以总结企业现有的成功经验为主(相当于一个案例研究 ,该企业的成功经验对类似企业的借鉴意义,如“雅芳直销模式的研究” ,这类企业的一些做法比较成熟、成功,具有一定的推广和借鉴价值。重点要总结经验, 提出其借鉴意义。在文章的开头应该有 300-600字左右的引言,引出所研究的内容, 对于问题型研究主题,主要内容的展开可以参考以下分析框架: 一、理论概述 (一 (二 ······ 二、研究意义、重要性、必要性 (一 (二 (三 ······· 三、主要措施或举措

马克思经济学的制度框架分析.(一)

马克思经济学的制度框架分析.(一) 内容摘要：生产关系是马克思经济学的一个重要研究对象，而对生产关系的研究本身必然涉及在此基础上产生的各种经济制度以及与之相适应的政治、法律、意识形态等制度体系。因此，可以在该种意义上将马克思经济学视为一种制度经济学。本文在理解马克思关于制度分析的一些具体理论命题的基础上，试图剖析马克思经济学制度分析的一般结构，从方法论、制度的起源、制度范畴的界定、制度的本质、制度结构、制度变迁机制几方面分析马克思的制度理论。关键词：马克思经济学生产力生产关系生产关系是马克思经济学的一个重要研究对象，而对生产关系的研究本身必然涉及在此基础上产生的各种经济制度以及与之相适应的政治、法律、意识形态等制度体系。因此，可以在该种意义上将马克思经济学视为一种制度经济学。制度框架分析（一）方法论马克思认为，人在本质上是一定“社会关系的总和”，涉及到的人只是“经济范畴的人格化”。虽然人的行为具有社会性和客观性，但个人在经济活动中具有能动性，马克思明确拒绝了个体与制度环境的决定论解释，批判那种“认为人是环境和教育的产物”的机械唯物主义观点，他认为这种学说忘记了“环境正是由人来改变的，而教育者本人一定是受教育的。” 因此，马克思经济学制度分析的方法论是辩证法和历史唯物主义，从社会整体制约中分析个体经济行为，从生产力与生产关系矛盾运动中解释社会制度的演变，揭示了制度变迁的历史客观性与制度变迁过程中的主观能动性。马克思制度理论所运用的范畴主要有：生产力、生产方式、生产关系、分工、经济基础、上层建筑、意识形态、生产资料所有制、社会实践、劳动等。（二）制度分析的切入点在马克思看来，要揭开人类社会制度演变之迷，就必须首先确立历史起点、历史前提，而这一起点或前提就是生产这一人类首要的实践活动，这是因为：“人们为了能够‘创造’历史，必须能够生活。但是为了生活，首先就需要吃喝住穿以及其他一些东西。因此第一个历史活动就是生产满足这些需要的资料，即生产物质生活本身，而且这是这样的历史活动，一切历史的基本条件。……因此任何历史观的第一件事情就是必须注意上述基本事实的全部意义和全部范围，并给予应有的重视。”由以上论述我们可以看出对于制度分析的切入点应当是满足人们生存需要的物质生产，马克思正是在理解人与自然的关系中把握人与人之间的关系，从而揭开人类历史发展之迷的。（三）制度的起源按照马克思的解释：在原始社会之初，由于生产力水平极其低下，个人从自然界获取生存资料的能力十分低下，不得不以群体的方式生存，个体还不具备脱离群体生存的条件。在这种社会形态中，由于不存在经常性的超过生存需要的剩余产品，群体内部利益是一致的，实行的是生产资料公有制。只是随着生产力水平的提高，才有了经常性的剩余产品，逐渐具备了个体脱离群体生存的条件。这时，产生了社会分工与生产资料私有制，群体内部利益发生分化，产生了对经济资源具有不同支配力的集团和阶级。在社会分工体系中处于支配地位的集团或阶级，为维护有利于自身的既定利益分配格局，依靠自己在资源占有上的优势，建立起了国家等强力组织和政治、法律制度，同时建立了有利于自身统治的意识形态。从以上论述中可以看出，马克思是从生产力发展的角度解释了人类第一层制度的起源，即社会生产关系的形成（经济制度的形成）；进而又从社会生产关系中导出第二层制度的起源，即包括政治、法律、道德规范、意识形态等在内的上层建筑。（四）制度范畴与制度的本质

论文框架结构

论文框架结构 Document serial number【KKGB-LBS98YT-BS8CB-BSUT-BST108】

论文框架结构特别说明：以下的框架只是普遍意义的写作思路，各位同学在写作时，可以按照此框架整理自己的思路，具体拟订写作提纲时，应根据实际情况进行增减和调整，不一定完全按照此框架写。对于不同类型的选题，论文的框架各有不同，一般而言，电大论文选题可以分为三种类型。第一种类型：以研究企业现存问题为主，对存在问题提出改进建议的选题。重点写存在的问题接解决措施。在文章的开头应该有300－600字左右的引言，引出所研究的内容，对于问题型研究主题，主要内容的展开可以参考以下分析框架：一、理论概述（一）（二） ······ 二、提出问题（研究对象的特点、重要性、必要性）（一）（二）（三） ······· 三、现状描述（现状，现存主要问题）（一）（二）（三） ······· 四、问题分析（原因分析、成因分析）（一）（二）（三） ·······

五、解决问题（解决措施、改进建议、应注意的问题）（一）（二）（三） ······· 六、小结（结论）第二种类型：某一种管理方式在某企业中的应用，如“电子商务在XX企业中应用研究”。重点写实施过程中面临的主要问题及解决方案在文章的开头应该有300－600字左右的引言，引出所研究的内容，对于问题型研究主题，主要内容的展开可以参考以下分析框架：一、理论概述（一）（二） ······ 二、研究对象的必要性（一）（二）（三） ······· 三、推行的主要措施（一）（二）（三） ······· 四、推行过程中面临的主要问题（一）（二）（三） ······· 五、解决措施、保障措施或进一步注意的问题

企业管理制度框架最新版

企业管理制度框架一、总则二、行政日常工作规范 1.行政协调工作规范 2.行政沟通工作规范 3.行政经费管理工作规范三、企业行政办公室事务管理（一）行政办公室事务管理规范化制度 1.办公室布置规范 2.办公室用品管理制度 3.办公室用品、文具管理制度 4.工作服管理制度（二）行政办公室事务管理实用表单 1.文具用品一览表 2.办公室用品需求计划表四、企业员工行为规范管理（一）企业员工行为规范管理化制度 1.员工守则 2.企业员工仪容仪表规范 3.企业员工礼仪举止规范 4.企业员工言行规范 5.企业员工电话规范

（二）企业员工行为规范管理实用表单 1.自我报告表 2.自我评价表 3.目标工作单 4.主要计划表 5.工作计划6W2H分析表 6.工作记录表五、企业人事行政管理（一）企业人事行政管理规范化制度 1.企业人事管理刚要 2.人员招聘管理制度 3.员工培训管理方法 4.员工工作调动管理办法 5.员工辞职审批管理办法 6.劳动合同（二）企业人事行政管理实用表单 1.企业人事规划表 2.人员补充申请表 3.应聘登记表 4.员工培训计划表 5.工作调动申请表 6.辞职申请表

7.辞退通知单（三）企业人事行政管理规范化细节执行标准六、企业会议管理 1.会议管理工作要点 2.会议管理规范化制度 3.会议管理使用表单 4.会议管理规范化细节执行标准七、企业提案管理 1.企业提案管理规范化制度 2.企业提案管理实用表单八、企业文书管理（一）企业文书管理规范化制度 1.企业公文收发规定 2.企业邮件、函电收发制度 3.图书管理制度（二）企业文书管理实用表单九、企业档案管理（一）企业档案管理规范化制度 1.人事档案利用制度 2.文书档案归档制度 3.文件、材料收集归档制度（二）企业管理规范化细节执行标准

论文框架结构介绍

论文框架结构特别说明：以下的框架只是普遍意义的写作思路和基本框架，各位同学在写作时，可以按照此框架整理自己的思路，具体拟订写作提纲时，应根据实际情况进行增减和调整，不一定完全按照此框架写。对于不同类型的选题，论文的框架各有不同，一般而言，电大论文选题可以分为三种类型。第一种类型：以研究企业现存问题为主，对存在问题提出改进建议的选题。重点写存在的问题接解决措施。对于问题型研究主题，主要内容的展开可以参考以下分析框架：引言（一）（二）?…… 二、XX 公司XX 管理的重要性（研究对象的特点、必要性）（一）（二）（三） ?…… 三、XX 公司XX 管理的现状此处可以加一小段文字对所研究企业进行简单介绍，不必单独列出二级标题（一）（二）（三）?…… 四、XX 公司XX 管理中存在的主要问题此处可以加一小段文字对所研究企业进行简单介绍，不必单独列出二级标题

（二）（三） ?…… （一）（二）（三） ?…… 六、解决措施（解决对策、改进建议、在XX 管理中应注意的问题）（一）（二）（三） ?…… 七、小结（结论）第二种类型：某一种管理方式在某企业中的应用，如“电子商务在XX 企业中应用研究”。重点写实施过程中面临的主要问题及解决方案。对于问题型研究主题，主要内容的展开可以参考以下分析框架：引言（在文章的开头应该有300－600字左右的一小段引言，引出所研究的内容，可不要单独列一级标题）一、理论概述（一）（二） ?…… 二、XX 管理在XX 企业推行的必要性（重要性、迫切性）（一）（二）（三）

三、推行的主要措施此处可以加一小段文字对所研究企业进行简单介绍，不必单独列出二级标题（一）（二）（三） ?…… 四、推行过程中面临的主要问题（一）（二）（三） ?…… 五、解决措施（保障措施或进一步注意的问题）（一）（二）（三） ?…… 六、小结（结论）第三种类型：以总结企业现有的成功经验为主（相当于一个案例研究），该企业的成功经验对类似企业的借鉴意义，如“雅芳直销模式的研究”，这类企业的一些做法比较成熟、成功，具有一定的推广和借鉴价值。重点要总结经验，提出其借鉴意义。对于问题型研究主题，主要内容的展开可以参考以下分析框架：引言（在文章的开头应该有300－600字左右的一小段引言，引出所研究的内容，可不要单独列一级标题）一、理论概述（一）（二）

中国的政策过程理论分析框架

中国的政策过程理论分析框架一、中国的政策过程中国政策过程的一个重要特征是对“共识”的目标诉求。共识就是意见一致。我国的宪政制度、政府体系以及党政关系均将共识视为政策目标。在政策过程的实践中，共识诉求甚至超越单纯的理性目标，如用政策方案的“可批性”代替“可行性”的情形。共识可以通过多种方式达成，如指令、协商、竞争等。政策过程由政策舞台、政策参与者和共识过程三个变量构成。政策舞台即政策制定的政府部门、机构或非正式组织。政策参与者指以直接或间接方式影响政策制定的行为个体。共识过程即意见收敛的过程，在政策研究中，共识过程与信息流动的方向、官僚组织的层级、参与政策制定的组织及网络等均相关。政策过程的模式受到制度环境的显著影响。制度环境包括政府组织结构、资源配置方式、产权结构和意识形态等变量。政府组织结构决定了政策制定的舞台、程序和规则，并规定了政策参与者之间的权力关系。资源配置方式和产权结构决定了政策参与者的利益格局和政策立场。意识形态决定了政策参与者的政策信仰和政策偏好。由于不同的政策参与者在权力关系、利益格局和政策偏好上不可避免地存在差异性，政策压力和冲突随即形成，政策过程就是缓解政策压力、解决政策冲突的共识形成过程

政策舞台的制度化层面就是政策提出、酝酿、构思和决策的官僚体系。官僚体系是政策舞台的制度化层面就是政策提出、酝酿、构思和决策的官僚体系。官僚体系是政策过程的组织基础和制度框架，它规定了各个部门的机构设置、隶属关系和职能分工，从而限定了政策在哪些政府部门制定、经由何种程序和规则、最后以何种方式颁布和执行。制度化层面上的官僚体系提供政策过程的正式规则，其政策压力和意见传达、收敛、并达成共识的过程通常是自下而上的、程序化的过程; 政策舞台的社会化层面就是参与或影响政策过程的机构、团体或个人所组成的非正式关系网络，即协商网络。揭示了官僚组织和正式制度背后的动态因素，如哪些人在积极推动、其影响力的来源、作用途径和影响强度等。协商网络因政策领域不同而呈现出较大差异，通常根据行为主体参与程度的差异分成不同的层次，即决策层、酝酿层、影响层，形成由内及外渐次扩散的涟漪。社会化层面上的政策协商网络则为政策过程提供了潜在的、关键的动力，其政策压力或意见的传达、收敛和共识形成的模式通常是自上而下的、间断性的过程。由于制度化层面上的政策过程缺乏动力和自发收敛的可能性，而社会化层面上的政策过程又缺乏相应的合法性，因此，两者的有效互动才能够形成最终的政策共识。

马克思经济学的制度框架分析(一)

马克思经济学的制度框架分析(一) 生产关系是马克思经济学的一个重要研究对象，而对生产关系的研究本身必然涉及在此基础上产生的各种经济制度以及与之相适应的政治、法律、意识形态等制度体系。因此，可以在该种意义上将马克思经济学视为一种制度经济学。制度框架分析（一）方法论马克思认为，人在本质上是一定“社会关系的总和”，涉及到的人只是“经济范畴的人格化”。虽然人的行为具有社会性和客观性，但个人在经济活动中具有能动性，马克思明确拒绝了个体与制度环境的决定论解释，批判那种“认为人是环境和教育的产物”的机械唯物主义观点，他认为这种学说忘记了“环境正是由人来改变的，而教育者本人一定是受教育的。” 因此，马克思经济学制度分析的方法论是辩证法和历史唯物主义，从社会整体制约中分析个体经济行为，从生产力与生产关系矛盾运动中解释社会制度的演变，揭示了制度变迁的历史客观性与制度变迁过程中的主观能动性。马克思制度理论所运用的范畴主要有：生产力、生产方式、生产关系、分工、经济基础、上层建筑、意识形态、生产资料所有制、社会实践、劳动等。（二）制度分析的切入点在马克思看来，要揭开人类社会制度演变之迷，就必须首先确立历史起点、历史前提，而这一起点或前提就是生产这一人类首要的实践活动，这是因为：“人们为了能够‘创造’历史，必须能够生活。但是为了生活，首先就需要吃喝住穿以及其他一些东西。因此第一个历史活动就是生产满足这些需要的资料，即生产物质生活本身，而且这是这样的历史活动，一切历史的基本条件。……因此任何历史观的第一件事情就是必须注意上述基本事实的全部意义和全部范围，并给予应有的重视。”由以上论述我们可以看出对于制度分析的切入点应当是满足人们生存需要的物质生产，马克思正是在理解人与自然的关系中把握人与人之间的关系，从而揭开人类历史发展之迷的。（三）制度的起源按照马克思的解释：在原始社会之初，由于生产力水平极其低下，个人从自然界获取生存资料的能力十分低下，不得不以群体的方式生存，个体还不具备脱离群体生存的条件。在这种社会形态中，由于不存在经常性的超过生存需要的剩余产品，群体内部利益是一致的，实行的是生产资料公有制。只是随着生产力水平的提高，才有了经常性的剩余产品，逐渐具备了个体脱离群体生存的条件。这时，产生了社会分工与生产资料私有制，群体内部利益发生分化，产生了对经济资源具有不同支配力的集团和阶级。在社会分工体系中处于支配地位的集团或阶级，为维护有利于自身的既定利益分配格局，依靠自己在资源占有上的优势，建立起了国家等强力组织和政治、法律制度，同时建立了有利于自身统治的意识形态。从以上论述中可以看出，马克思是从生产力发展的角度解释了人类第一层制度的起源，即社会生产关系的形成（经济制度的形成）；进而又从社会生产关系中导出第二层制度的起源，即包括政治、法律、道德规范、意识形态等在内的上层建筑。（四）制度范畴与制度的本质在马克思的理论体系中，制度最初来自于物质生产条件，之后才上升到政治、法律层次。因此，制度范畴在马克思经济学说中包括作为经济制度的生产关系和作为上层建筑的与经济制度相适应的政治、法律、意识形态等制度体系两个层面。前者可以看作是一种仅限于经济关系领域内的狭义的制度，后者则可以被视为一种广义的制度。完整的社会制度是由经济基础和上层建筑这两个相互联系的层次组成的，二者之间是决定与反作用的关系，即：经济基础决定上层建筑，上层建筑反作用于经济基础。由于物质生产在人类社会生活中的基础地位，所以对社会制度进行研究的逻辑次序应当是，首先必须分析作为整个社会制度经济基础的生

【最新】制度分析与发展框架-精选word文档 (16页)

本文部分内容来自网络整理所得，本司不为其真实性负责，如有异议或侵权请及时联系，本司将立即予以删除！ == 本文为word格式，下载后可方便编辑修改文字！ == 制度分析与发展框架制度分析与发展框架研究她在理论上给出了解决“集体行动困境”、“公地悲剧”问题在国家理论和企业理论之外可能有的另外选择，给出了运用于研究的制度分析的方法。作为总的组织工具，制度分析与发展框架致是一个关于应用规则、自然和物质条件以及群体属性如何影响行动舞台的结构、个体面对的激励以及结果产生，它为公共池塘资源问题开展长期研究提供了方便。本文则从理论的角度介绍制度分析与发展框架的发展。关键词：制度分析与发展框架行动情景行动者运行规则一、制度分析与发展框架总述 (一) 问题意识埃莉诺·奥斯特罗姆在对大都市区的警察服务进行研究后，提出了“多中心政治体制的概念”，其基本的思想就是集体中的个人在面临集体选择问题时应该能够以他们认为合适的方式来解决这些问题。这启发了埃莉诺将多中心理论作为市场理论和国家理论之外的替代选择，用于分析和判断与现实中大量集体物品相适应的各种制度安排。埃莉诺便将研究的中心放到公共池塘资源上，致力于研究和开发一种理解复杂的人类社会互动的过程的分析框架。 (樊晓娇，201X)这即是其建立并逐步完善的制度分析与发展框架。

总之，埃莉诺力求提供一个能够贯穿于政治学和经济学的关于“制度”的定义，发展一个共同的框架来研究包括立法机构、公共部门、市场和其他组织在内的复杂的政治经济学，以助于把政治学家、经济学家、人类学家、社会学家、社会心理学家和其他学者就“制度如何影响个人面临的诸多激励以及个人选择这”一课题所研究结合为一体。 (二)理论根基埃莉诺·奥斯特罗姆认为，制度分析与可持续发展有多重历史根基，在古典政治经济学、新古典微观经济学、制度经济学、公共选择理论、交易成本经济学以及非合作博弈论都可可以找到它的理论根基。 (埃莉诺，1994)几十年来，这框架影响不同问题和区域的相关问题的研究。它可以诊断与分析问题，对静态和动态制度安排进行研究。分析者可以根据决策环境的具体背景，利用这个框架来考察行动情景内的行为与结果。 (三)框架逻辑制度分析与可持续发展框架主要是首先划分制度的层次结构;其次，是确认行动场景中行动情景和行动者各自所包含的变量，根据一定的评估规则来评价这些变量之间相互作用的模式和结果。最后通过在行动场景上的行动情景和行动者的相互作用的结果产生的结果来分析结果与行场景、行动情景、行动者各变量之间的关系和三大变量即物的世界、社群以及应用规则如何影响行动场景的特征的。二、行动情景在利用制度分析与可持续发展框架时，无论从何种方法或何种分析层次入手，都必须首先明确一个概念单位，即所谓的行动场景。 “它是特定约束条件下分析、预测并解释人类行为与结果的关键。

三种论文框架结构介绍

论文框架结构特别说明：以下的框架只是普遍意义的写作思路和基本框架，各位同学在写作时，可以按照此框架整理自己的思路，具体拟订写作提纲时，应根据实际情况进行增减和调整，不一定完全按照此框架写。对于不同类型的选题，论文的框架各有不同，一般而言，电大论文选题可以分为三种类型。第一种类型：以研究企业现存问题为主，对存在问题提出改进建议的选题。重点写存在的问题接解决措施。对于问题型研究主题，主要内容的展开可以参考以下分析框架：三、XX公司XX管理的现状此处可以加一小段文字对所研究企业进行简单介绍，不必单独列出二级标题绍可以与“第四部分存在的问题”合并，在现存问题的标题下加一段企业概况和现状的描述。四、XX公司XX管理中存在的主要问题此处可以加一小段文字对所研究企业进行简单介绍，不必单独列出二级标题 “XX公司XX管理的现状”是对公司在此方面的一些现行管理举措、管理步骤、管理方式、或所取得的一些成效的介绍。如果不能归纳出几点，只是笼统、XX公司XX管理的重要性（研究对象的特点、必要性）

“存在的主要问题”和“原因分析”是文章的重点，一定要重点论述，采用文字、图表、数据等方法加以论证。 “原因分析”如能与“存在问题”很清楚地区分，可以单独列一部分写作；如果很难归纳出明确的“原因”，可以将第四部分与第五部分合并，标题列为：XX公司XX管理中现存的主要问题，在分析问题时附带讲讲原因。另外，“现存的主要问题”和“原因分析”究竟是什么，一定要在二级标题中明确概括，让读者在阅读二级标题时对这些问题有清晰的认识。六、解决措施（解决对策、改进建议、在XX管理中应注意的问题）解决措施（改进建议、应注意的问题）是文章的重点，一定要在二级标题中明确概括岀可实施的、具体的措施，让读者在阅读二级标题时对这些措施有清晰的认识。另外，现存的问题和解决对策不是一定要机械的一一对应，某一问题的解决措施可能要从多方面着手。七、小结（结论）第二种类型：某一种管理方式在某企业中的应用，如“电子商务在XX企业中应用研究”。重点写实施过程中面临的主要问题及解决方案。对于问题型研究主题，主要内容的展开可以参考以下分析框架：引言（在文章的开头应该有300—600字左右的一小段引言，引出所研究的内容，可不要单独列一级标题）一、理论概述、XX管理在XX企业推行的必要性（重要性、迫切性）五、原因分析

制度分析模型

制度分析模型（P318) 界定：康芒斯：制度是集体行动控制个人行为的一系列行为准则或标准。科斯：制度就是指一系列关于产权安排、调整的规则，制度就是“规则”或组织形式。诺斯：制度是为约束在谋求财富和本人效益最大化中的个人行为而制定的一组规章、依循程序和伦理道德行为准则。 1、旧制度主义：旧制度主义关注的更多的是制度的属性，以及制度如何使个人的行为变得更好。主旨和目标：如何使个人行为朝向有利于集体目标的方向。重点：制度的规范性导向，以及制度对社会的影响。诟病所在：旧制度主义的目标是规范性的，它致力于在限定的政治系统中追求好的制度，相当程度上是以国家为中心的，其主要的研究领域是公共政策的制定和形成，它侧重于对立法、行政、司法等制度和机构进行相对机械的研究。评价：旧制度主义更多的是以国家为导向，侧重于对宏观层面的制度进行研究，对运作中的制度及其操作规则缺乏一种动态的视觉进行考察，抹杀了个人行为和人类活动的能动性。 2、新制度主义新制度主义制度沿用旧制度主义的一些假设，但在研究工具和理论关注上吸收了行为主义和理性选择分析的要素，从而丰富了旧制度主义的研究内涵。在一定程度上改变了旧制度主义中以国家为中心的研究倾向。新制度主义着力对制度与公共政策的关系进行探讨，把制度当成一个变量，并着重探讨不同的制度安排对于公共政策的影响是如何的不同。奥斯特罗姆的制度分析与发展框架，区分了操作、集体选择与立宪选择三个相互作用的层次或领域。多制度的中心安排：主张让大、中、小幅规模的政府和非政府的企业相互竞争，又相互合作。新制度主义事实上是旧制度主义与理性主义、行为主义互相影响和渗透而发展形成的新的研究途径。精英分析模型（P321）核心观点：公共政策是统治精英的偏好和价值体现，大众在相当程度上是被精英所操纵的核心观点：公共政策是统治精英的偏好和价值体现，大众在相当程度上是被精英所操纵的

毕业论文框架范例

论文结构（）论文题目：基于价值工程的产品技术开发研究摘要：当今社会，国内的企业间的竞争形势日益严峻，激烈的竞争环境要求企业必须获取并拥有自身的竞争优势。技术开发作为企业的一种核心资源，企业能否在产品的技术开发研究上取得优势对企业的生存发展有重大意义。基于价值工程的产品技术研究，是通过对价值工程的理论体系的合理应用，已达到产品的功能创新和企业成本节约。将价值工程引入到企业的产品技术研究活动中，对提高我国企业产品技术研究方案决策的科学性和经济性方面有重要意义。关键词：价值工程技术研究工艺改进方案决策引言： 1）研究背景阐述价值工程在国内外发展现状，以及国内外企业产品技术研究现状，再综述将价值工程理论应用于产品的技术研究的状况。概述起前景及重要作用 2）研究目的通过本次研究来探求应用价值工程理论指导企业的技术获取方案决策。通过功能价值分析研究技术获取方案的功能及其对应成本的关系，来提高企业技术获取方案决策的科学性和经济性。 3）研究的内容，思路及方法关于价值工程和产品技术研究理论基础 1）价值工程理论 2）产品技术研究方面的相关理论 3）将价值工程理论运用到产品技术研究中的可行性及必要性价值工程在产品技术研究决策中的应用 1）产品技术的功能分析 2）产品技术的功能评价指标 3）价值评价

基于价值工程在某面粉生产企业案例分析1）原有技术方案的详细阐述 2）对原技术进行功能评价 3）对原技术进行成本评价 4）对原技术价值系数计算 5）提出改进方案 6）对改进技术的功能评价 7）对改进技术进行成本评价 8）对先有技术进行价值系数评价 9）新技术的前景及经济性科学性结论参考文献致谢