当前位置：文档库 › Communication

Communication

Discrete Applied Mathematics154(2006)6–14

https://www.wendangku.net/doc/f216158264.html,/locate/dam

Communication

A class of heuristics for the constrained forest

problem

Michael Laszlo?,Sumitra Mukherjee

Nova Southeastern University,Graduate School of Computer and Information Sciences3301College Avenue,

https://www.wendangku.net/doc/f216158264.html,uderdale,FL33314,USA

Received6June2005;accepted29June2005

Available online31August2005

Communicated by F.Maf?oli

Abstract

The constrained forest problem seeks a minimum-weight spanning forest in an undirected edge-weighted graph such that each tree spans at least a speci?ed number of vertices.We present a structured class of greedy heuristics for this NP-hard problem,and identify the best heuristic.

Keywords:Constrained forest problem;Tree partitioning;Minimum spanning forests;Greedy heuristics

1.Introduction

Given a graph with positive edge weights,the constrained forest problem(CFP)seeks a minimum-weight spanning subgraph each of whose components contains at least m vertices. Such as subgraph is known as an m-forest.In their survey of graph partitioning problems, Cordone and Maf?oli[1]take note of the work of Imelie′n ska et al.[3]which shows that CFP is NP-hard for m 4,and presents a greedy heuristic for CFP with an approximation ratio of two[3].Recently,Laszlo and Mukherjee[4]present a second greedy heuristic that produces solutions at least as good as,and often better than,those produced by[3].The current paper characterizes a class of heuristics to which those presented in[3,4]belong as particular cases.

?Corresponding author.

E-mail address:mjl@https://www.wendangku.net/doc/f216158264.html,(https://www.wendangku.net/doc/f216158264.html,szlo).

doi:10.1016/j.dam.2005.06.006

https://www.wendangku.net/doc/f216158264.html,szlo,S.Mukherjee/Discrete Applied Mathematics154(2006)6–147 Section2presents a class of algorithms for constructing m-forests,and Section3char-acterizes these algorithms with respect to the size of solutions they produce.Section4 employs these algorithms as heuristics for CFP,and evaluates them in light of the results of Section3.

2.A class of greedy algorithms for constructing m-forests

A forest is called an m-forest if each of its trees contains at least m>0vertices.This section presents a class of algorithms each of which takes a tree T=(V,E)and constructs a spanning m-forest of T.The input to each algorithm is an ordered set E of the n edges of tree T and a positive integer m n+1,and the output is a subset of E representing a spanning m-forest.The algorithms are greedy insofar as they attempt to include only edges deemed essential.

Given two edges e and f of an ordered edge set in which e precedes f,we say that e is lighter than f and that f is heavier than e.We also refer to the?rst and last edges of the ordered set as its lightest and heaviest elements,respectively.A tree is said to be large if it contains at least m vertices,and small otherwise.

Each algorithm iteratively selects and removes a lightest or heaviest edge from edge set E and either includes the edge in the graph F under construction or rejects the edge.We can describe each algorithm as an edge-coloring process where an edge’s color indicates its status:

?An edge is blue if it belongs to edge set E.

?An edge is green if it has been added to F.

?An edge is red if it belongs to neither E nor F.

In short,green edges have been included,red edges have been excluded,and blue edges have not yet been considered.At any given time,we refer to the set of green edges as the green forest,which represents a subgraph of the solution that the algorithm eventually produces.

A green tree is a(maximal)component of the green forest.Similarly,we refer to the set of edges that are either blue or green as the blue-green forest,which represents a supergraph of the solution that will eventually be produced,and to each component of this forest as a blue-green tree.An edge is said to connect the two green(or blue-green)trees to which it is incident.

The edge-coloring process is as follows,where the procedure m-forest is called with an ordered edge set E of tree T:

m-forest(E){

for every edge e∈E,paint e blue;

while(E=?){

e←the lightest or heaviest edge in E;

E←E?e;

if(e is the lightest edge){

if(e connects two large trees in the green forest)

then paint e red;

https://www.wendangku.net/doc/f216158264.html,szlo,S.Mukherjee/Discrete Applied Mathematics154(2006)6–14

else paint e green;

}else{//e is the heaviest edge

if(e connects two large trees in the blue-green forest)

then paint e red;

else paint e green;

}

return the green forest;

}

The m-forest algorithm is nondeterministic since edge selection in each iteration is under-speci?ed.Note that the green forest is initially empty and grows over the course of the process,that the blue-green forest is initially the input edge set E and shrinks over the course of the process,and that the green forest is a subgraph of the blue-green forest.When the process terminates,every edge is green or red,and the green forest is the algorithm’s solution.

Theorem1.Where E is an ordered set of edges of a tree with at least m vertices,the m-forest algorithm returns an m-forest.

Proof.We show that the following loop-invariant is maintained:The blue-green forest is an m-forest.The initial blue-green forest is an m-forest since it is isomorphic to E,a tree of size at least m.For the inductive step,assume that the blue-green forest is an m-forest,and consider the two cases that can arise when an edge e is selected in an iteration of the while loop:

Case1:e is the lightest edge.The blue-green forest remains unchanged unless e is painted red,in which case its removal from the blue-green forest splits the blue-green tree to which e belonged into two blue-green trees T1and T2.Because e was painted red,it is incident to two large green trees,one of which must be a subgraph of T1and the other a subgraph of T2, implying that T1and T2are both large blue-green trees.Since no other tree in the blue-green forest is affected by removal of edge e,the blue-green forest remains an m-forest.

Case2:e is the heaviest edge.The blue-green forest remains unchanged unless e is painted red,in which case the blue-green tree to which e belonged is replaced by two large blue-green trees.Since no other tree in the blue-green forest is affected by removal of edge e,the blue-green forest remains an m-forest.

Hence we have shown the loop invariant that the blue-green forest remains an m-forest. When the algorithm terminates,every edge is green or red,and the green edges,which comprise the?nal blue-green forest,form an m-forest.

3.Structure of the class of algorithms

We can identify a speci?c heuristic from the class of heuristics represented by m-forest using a string w=w1w2...w n of length n=|E|over the alphabet{0,1}.The value of bit w i indicates which edge to choose in iteration i of the while loop:the lightest edge if w i=0,or the heaviest edge if w i=1.Under this scheme,we obtain a deterministic version

https://www.wendangku.net/doc/f216158264.html,szlo,S.Mukherjee/Discrete Applied Mathematics154(2006)6–149 of m-forest that gets called with an ordered edge set E and a string w that dictates the edge selection process:

deterministic-m-forest(E,w){

for every edge e∈E,paint e blue;

for i=1,2,...,|E|do{

if(w i=0){

e←the lightest edge in E;

E←E?e;

if(e connects two large trees in the green forest)

then paint e red;

else paint e green;

}else{//w i=1

e←the heaviest edge in E;

E←E?e;

if(e connects two large trees in the blue-green forest)

then paint e red;

else paint e green;

}

return the green forest;

}

As shorthand,we sometimes refer to the algorithm picked out by string w as the w heuristic. Moreover,we let G(w)and B(w)denote the green and blue forest,respectively,produced by running deterministic-m-forest on string w,where the ordered edge set E is assumed and |w| n.

The heuristic1n,which selects the heaviest edge in every iteration,is called the heaviest edge?rst(HEF)heuristic.Similarly,the heuristic0n,which selects the lightest edge in every iteration,is called the lightest edge?rst(LEF)heuristic.It is shown in[4]that,on any given input,the green forest produced by HEF is a subgraph of that produced by LEF.Our key theorem generalizes this result by characterizing the solutions produced by the class of all m-forest algorithms.Speci?cally,given two strings w= 1 and?w= 0 that differ in only one bit,the solution produced by the w heuristic is a subgraph of that produced by the ?w heuristic:

Flipped bit theorem.Let w= 1 and?w= 0 where|w|=|?w|=|E|.Then G(w)?G(?w). We prove this theorem in the appendix.Intuitively,when we delay processing a light edge,we increase the likelihood of excluding it and other edges from the m-forest un-der construction.In the precondition of the?ipped bit theorem,the light edge processed by the?w heuristic through pre?x 0is not processed by the w heuristic until later,in-creasing the likelihood that this and other edges are omitted from the green forest G(w).

The lattice structure.The2n binary strings of length n label the vertices of an n-dimensional hypercube in which two vertices are joined by an edge if and only if their

https://www.wendangku.net/doc/f216158264.html,szlo,S.Mukherjee/Discrete Applied Mathematics154(2006)6–14

labels differ in exactly one bit.This hypercube forms a complete lattice under the bitwise

Boolean operations and(∧)and or(∨):the greatest lower bound,or meet,of two strings w and?w is given by w∧?w,and least upper bound,or join,by w∨?w.The lattice’s least element is the string0n and its greatest element the string1n.

When considered under this lattice interpretation,the?ipped bit theorem and transitivity

of the subset relation imply the following corollary:

Corollary1.Let G and?G be the green forests produced by any two heuristics w and ?w,respectively,when applied to some edge set.The green forest produced by the w∧?w heuristic contains both G and?G as subgraphs.The green forest produced by the w∨?w heuristic is contained in both G and?G as a subgraph.

Since0n and1n are the lattice’s least and greatest elements,respectively,the green forest produced by the LEF heuristic0n contains both G and?G as subgraphs,and the green forest produced by the HEF heuristic1n is contained in both G and?G as a subgraph.The following result holds:

Corollary2.The solution produced by HEF is a subgraph of the solution produced by the w heuristic for every w.The solution produced by LEF is a supergraph of the solution produced by the w heuristic for every w.

4.Heuristics for the constrained forest problem

The constrained forest problem is de?ned as follows:Let G=(V,E)be an undirected connected graph with positive edge weights.Given a natural number m |V|,we seek a minimum-weight subgraph spanning the vertex set V such that each of its components spans at least m vertices.Each w heuristic can be used as a heuristic for CFP:Construct the minimum spanning tree(MST)of G and then apply the w heuristic to the MST’s edge set ordered by increasing weight.Corollary2implies that,of the class of heuristics considered in this paper,the heaviest edge?rst1n heuristic produces the best solutions for CFP. Given an MST with n edges,one can trace a path from0n to1n through the lattice of heuristics and generate the sequence of green forests G0,G1,...,G n produced by the heuristics along this path.(The forest G0results from the0n heuristic.The heuristic used to produce G i+1is obtained by setting one of the0bits to1in the string used to produce G i.) To examine this sequence of forests,we take the input edge set from a tree with positive edge weights and represent each forest G i by its total weight.Since G i+1?G i for each i,the total weight is monotonically nonincreasing.Fig.1graphs total weight for three different paths through the lattice from0n to1n:where1’s replace0’s from left to right (000...0,100...0,110...0,...),labeled Advancing in the?gure;where1’s replace0’s from right to left(0...000,0...001,0...011,...),labeled Receding;and for a random path from0n to1n.The edge set used to create these graphs forms the minimum spanning tree of the Creta data set[2]and contains1080edges.

In Fig.1,the graphs resulting from the Advancing and Receding paths bound the graph resulting from the random path.This illustrates a more general result that follows from

https://www.wendangku.net/doc/f216158264.html,szlo,S.Mukherjee /Discrete Applied Mathematics 154(2006)6–1411

860880

900

920

940960

980

1000

Number of 1s

T o t a l W e i g h t Fig.1.Total weight produced by three different paths in the heuristic lattice (m =4).

Lemma 3(see the appendix).For any 0 k n ,let G A and G R be the graphs produced by the 1k 0n ?k heuristic and 0n ?k 1k heuristic,respectively,and let G be the green graph produced by any string containing exactly k 1’s.Then G A ?G ?G R .

5.Conclusion

Our work has presented a class of greedy methods for constructing m-forests ,and char-acterized the structure of this class.These methods are applied to the constrained forest problem,and the 1n heuristic (HEF)is identi?ed as the best of this class of heuristics with approximation ratio two.

Appendix:Proof of the ?ipped bit theorem

In this appendix we prove the main theorem cited in this paper:

Flipped bit theorem.Let w = 1 and ?w = 0 where |w |=|?w |=|E |.Then G(w)?G(?w).In what follows,we assume that all strings are over the alphabet {0,1}and have length no greater than |E |.Given nonempty string w ,the last edge processed by running deterministic-m-forest on string w is denoted e =edge (w).Where w =w b ,edge e is processed as a light edge if bit b =0,or as a heavy edge if b =1.

https://www.wendangku.net/doc/f216158264.html,szlo,S.Mukherjee/Discrete Applied Mathematics154(2006)6–14

Assume G(w),B(w),and R(w)are the green,blue,and red forests,respectively,pro-

duced by(running m-forest on)string w.We de?ne the blue-green forest as BG(w)= B(w)∪G(w).We refer to graph components as trees,and sometimes refer to a green tree to emphasize that it belongs to G(w)for some w,and to blue and blue-green trees similarly.

A tree is said to be large if it contains at least m vertices,and small otherwise.

Lemma1.Let w= 01k and?w= 1k for any k 0.Let e denote the edge processed by the zero that follows the pre?x in w;that is,e=edge( 0).If w paints e red,then G(w)=G(?w). Proof of Lemma1.The proof is by induction on k.

[Basis]Assume k=0.Then G(w)=G( 0)=G( )=G(?w),where the second equality

follows from the fact that e is not added to the green forest G( ).

[Inductive step]Assume as the inductive hypothesis that the lemma holds for all i

Let f denote the heavy edge that is processed last;that is,f=edge(w)=edge(?w).By the

inductive hypothesis(IH),assume that the corresponding green forests are identical when

f is about to be processed;in other words,G( 01k?1)=G( 1k?1).There are two cases to consider:

[case1]?w paints f green:By the IH and the fact that w processes the same edges as?w,

except for edge e which w paints red,we have BG( 1k?1)=BG( 01k?1)∪e.Hence the two blue-green trees that f touches in BG( 01k?1)are no larger than their counterparts in BG( 1k?1),implying that w paints f green.

[case2]?w paints f red:Since the light edge e is painted red by w,edge e touches two large

green trees G1and G2in G( ).Both trees are subgraphs of the green forest G( 01k?1) and hence of the blue-green forest BG( 01k?1).Because BG( 1k?1)=BG( 01k?1)∪e, edge e separates two blue-green trees in BG(w),but the two blue-green forests BG(w)and BG(?w)are otherwise identical.Let T1and T2be the two blue-green trees that f touches in BG( 01k?1)when w is about to process edge f,and consider T1.If T1contains one of the green trees G1or G2as a subgraph,then T1is large since both G1and G2are large. Alternatively,if T1contains neither G1nor G2as a subgraph,then T1is one of the two large blue-green trees due to which?w paints f red.In both cases T1is large.The argument for T2 is the same,implying that f touches two large trees in the blue-green forest BG( 01k?1), implying that w paints f red.

Lemma2.Let w= and?w= where| |=| |.Suppose that G( )=G( )∪e and R( )=R( )?e for some light edge e painted red by w.Then G(?w)=G(w)∪e.

Proof of Lemma2.The proof is induction on| |.

[Basis]Assume| |=0.Then G(?w)=G( )=G( )∪e=G(w)∪e.

[Inductive step]Assume as the inductive hypothesis(IH)that the lemma holds for all proper pre?xes of ,in particular that G( )=G( )∪e where = b for bit b.Since w paints edge e red,e touches two large green trees in the green forest G( ).Let f=edge(w). There are two cases to show that f∈G(w)implies f∈G(?w):

[case1]f is a light edge(b=0).If w paints f green,f touches at least one small green tree T in G( ).Since T cannot be one of the green trees that e touches(for otherwise T would be large),IH implies that f touches the same tree T in G( ).Hence?w paints f green.

https://www.wendangku.net/doc/f216158264.html,szlo,S.Mukherjee/Discrete Applied Mathematics154(2006)6–1413 [case2]f is a heavy edge(b=1).If w paints f green,f touches at least one small blue-green tree T in BG( ).Since T cannot contain either green tree that e touches,IH implies that f touches the same tree T in BG( ),hence?w paints f green.

It follows from both cases that f∈G(w)implies f∈G(?w).The other direction,that f∈G(?w)implies f∈G(w),follows from the fact that G( )?G( ),which obtains from IH.Hence f∈G(w)if and only if f∈G(?w),and the lemma follows.

Lemma3.Let w= 1j0 and let?w= 01j .Let the edges processed by these two strings be labeled as follows:

e f1f2f3...

f k?1f k

?w: 0111 (11)

w: 1111 (10)

f1f2f3f4...f k e

Then the following conditions hold:

(a)if e∈G(w),then e∈G(?w);and

(b)each f i is painted the same color by w and?w;and

(c)either G(?w)=G(w)or G(?w)=G(w)∪e.

Lemma3expresses the idea that by postponing the processing of a light edge e,we obtain a green graph G(w)that is either the same as the green graph G(?w)we would otherwise obtain,or that excludes edge e which G(?w)includes.

Proof of Lemma3.(a)We show that if e is painted red by?w,then e is painted red by w.Assume e is painted red by?w.Then e touches two large green trees in graph G( ). Processing of the heavy edges f1through f k can only add more green edges to G( ),hence G( )?G( 1k).It follows that e touches two large green trees in G( 1k)and is painted red by w.

(b)There are two cases.First,assume?w paints edge e green.Then e remains in the blue-green forest implying that w and?w paint each edge f i the same color.Second,assume ?w paints edge e red.By Lemma1,w and?w paint each edge f i the same color.

(c)Part(a)of this lemma implies that one of three cases hold:e belongs to both G(w) and G(?w);e belongs to neither G(w)and G(?w);or e belongs to G(?w)?G(w).In the?rst two cases,part(b)of this lemma implies that G(w)=G(?w).In the last case,Lemma2 implies that G(w)=G(?w)or G(?w)=G(w)∪e.

Lemma4.Let w= 11k and?w= 01k where|w|=|?w|=|E|,and let e=edge( 0).Then either G(?w)=G(w)or G(?w)=G(w)∪e.

Proof of Lemma4.We can rewrite w as w= 11k?11= 11k?10since the last bit cor-responds to the last remaining blue edge in G,which can be regarded as light or heavy.We can also rewrite?w as?w= 01k?11.The lemma follows from Lemma3.

Flipped bit theorem.Let w= 1 and?w= 0 where|w|=|?w|=|E|.Then G(w)?G(?w).

https://www.wendangku.net/doc/f216158264.html,szlo,S.Mukherjee/Discrete Applied Mathematics154(2006)6–14

Proof of the?ipped bit theorem.Proof by induction on the number of zeros in string . [Basis]Where =1k for some k,the basis is established by Lemma4.

[Inductive step]Suppose =1k0 ,so w= 11k0 = 1k+10 and?w= 01k0 .Let string w = 01k1 .By Lemma3we have G(w )=G(w)or G(w )=G(w)∪e,hence G(w)?G(w ).By the induction hypothesis we have G(w )?G(?w).It follows that G(w)?G(?w)by transitivity of the subset relation.

References

[1]R.Cordone,F.Maf?oli,On the complexity of graph tree partition problems,Discrete Appl.Math.134(2004)

51–65.

[2]R.A.Dandekar,J.Domingo-Ferrer, F.Sebe,LHS-based hybrid microdata vs rank swapping and

microaggregation for numeric microdata protection,in:J.Domingo-Ferrer(Ed.),Inference Control in Statistical Databases,Lecture Notes in Computer Science,vol.2316,Springer,Berlin,2002.

[3]C.Imelie′n ska,B.Kalantari,L.Khachiyan,A greedy heuristic for a minimum weight forest problem,Oper.

Res.Lett.14(1993)65–71.

[4]https://www.wendangku.net/doc/f216158264.html,szlo,S.Mukherjee,Another greedy heuristic for the constrained forest problem,Oper.Res.Lett.33(6)

(2005),in press,10.1016/j.orl.2004.11.010.