文档库 最新最全的文档下载
当前位置:文档库 › On the Fitting and Consensus of Classification Systems

On the Fitting and Consensus of Classification Systems

On the Fitting and Consensus of Classification Systems
On the Fitting and Consensus of Classification Systems

On the Fitting and Consensus

of Classi?cation Systems

Bruno Leclerc

CAMS-EHESS

54bd Raspail

75270PARIS Cedex06,France

(e-mail:leclerc@ehess.fr)

Abstract.Classi?cation systems are families of subsets(classes)of a?xed set S that are closed for intersection and contain S and every single element subset of S. The main problem conidered here is that of the consensus of such systems.We?rst brie?y mention results issued from lattice theory.Then,we consider the Adams approach for the consensus of hierarchies and point out its relation with closures, implications(as they appear in relational databases)and nestings.We show that Adams consensus correspond to the research of a particular subdominant nesting (or overhanging)relation,and generalize the corresponding?tting problem. Keywords:Closure system,Classi?cation system,Implication,Overhanging or-der,Lattice,Hierarchy.

1Introduction

Let S be a?nite set.We consider here the aggregation of a pro?le F?=(F1,F2,...,F k)of classi?cations on S into a consensus classi?cation F=c(F?).A classi?cation will be here a family of subsets(classes) containing the whole set S and every one-element subset of S(singleton), and closed under intersection.Equivalently,classi?cation systems are the closure systems of the literature that include all the singletons.

There are two main purposes for the research of such a consensus.First, the classi?cation of a set S described by variables of di?erent types.Each qualitative or quantitative variable v induces a partition or a quasi-order on S,which in turn induces a classi?cation system.With such a common formalization for various structures,a set of k variables leads to a pro?le F?of k such systems.The idea is to aggregate the elements of F?into a unique system c(F?)that summarize the pro?le in some useful sense(see[Domenach and Leclerc,2004b]for more details).

The other reason is that several consensus problems already studied in the literature are particular cases of the consensus of closure systems.The basic example is provided by hierarchies,where,frequently in a purpose of phylo-genetic reconstruction,many works have followed those of[Adams III,1972] and[Margush and McMorris,1981](see the survey[Leclerc,1998]).Other

Consensus of Classi?cation Systems425 usual classi?cation models correspond,directly or after straightforward com-pletions,to closure systems.Thus,several other classical consensus problems are also particular cases,with restricted domains or codomains(or both),of the consensus of classi?cation systems.An example is the aggregation of partitions[R′e gnier,1965],[R′e gnier,1983],[Mirkin,1975],[Barth′e lemy and Leclerc,1995].

2Classi?cations and closure systems

Given a?nite set S,and its power set P(S),a classi?cation system on S is a family F?P(S)of classes(subsets)of S.A class C∈F may be a set of elements sharing some common properties,or close to each other in some sense.Then the following conditions,although not always required, may appear as natural ones:

(C1)S∈F;

(C2)C,C ∈F?C∩C ∈F;

(C3)for all s∈S,{s}∈C.

Then,from(C2)and(C3),we have the empty class in F.This property, although not usual,is appropriate to obtain structural coherence.A family F which satis?es only(C1)and(C2)is a so-called closure system(or Moore family).

The most usual classi?cation models correspond to such classi?cation systems,sometimes with the addition of some trivial classes.For instance, the addition of the empty class to a hierarchy H,or the addition of S,the empty set and the lacking singletons to a partition provide classi?cation systems.Pyramids(or quasi-hierarchies)and weak hierarchies,in their intersection-closed variants,are further examples.

We?nd in the literature three notions(among many others)which all are in one-to-one correspondence with closure systems(cf.[Caspard and Monjardet,2003]).

A closure operator?on S is a mapping on P(S)satisfying the three properties of isotony(for all A,B?S,A?

B implies?(A)??(B)), extensivity(for all A?S,A??(A))and idempotence(for all A?S,?(?(A))=?(A)).The elements of the image F?=?(P(S))of P(S)by ?are the closed(by?)subsets of S,and F?is a closure system on S. Conversely,a closure operator?F on S is associated to any closure system F on S by?F(A)= {F∈F:A?F}(i.e.,from(C1)and(C2),the smallest class of F containing A exists and is?F(A).

A complete implication system on S,denoted by I,→I or simply→,is a binary relation on P(S)satisfying,for all A,B,C,D?S:

426Leclerc

(I1)B?A implies A→B;

(I2)A→B and B→C imply A→C;

(I3)A→B and C→D imply A∪C→B∪D.

An overhanging order(nesting order in some contexts)on S is a binary relation on P(S)too,denoted as?and satisfying,for all A,B,C?S: (O1)A?B implies A?B;

(O2)A?B?C implies A?C??[A?B or B?C];

(O3)A?A∪B implies A∩B?B.

It is not di?cult to see that?is then a(partial)order on P(S).The sets of all closure systems,closure operators,complete implication systems and overhanging orders on S are respectively denoted as M,C,I and O.They are in one-to-one correspondence to each other.Besides the correspondence recalled above,we give hereunder two further correspondences,the?rst one due to[Armstrong,1974],and the second pointed out in[Domenach and Leclerc,2004]:for all A,B?S,

A→B??B??(A)

A?B??A?B and?(A)??(B)

So,in a classi?cation system,A→B means that every class including the subset A of S also includes B,while A?B means that B properly in-cludes A and,moreover,there exists at least one classs including A and not B.

Further conditions correspond to particular classes of systems.For in-stance,an overhanging order corresponds to a classi?cation system if and only if it satis?es the following condition(OS)below,and to a hierarchy if,moreover,the following condition(OT)replaces(O3)[Adams III,1986], [Domenach and Leclerc,2004]:for all A,B,C?S,s∈S,

(OS)s/∈A implies??{s}?A∪{s};

(OT)A?C and B?C imply A∪B?C or A∩B=?.

3Consensus in the lattice of closure systems

The sets M,C,I and O are naturally ordered:M by set inclusion on P(P(S)),I and O by set inclusion on P(P(S)×P(S))=P((P(S))2),C by the poinwise order on mappings:?≤? if?(A)?? (A)for all A?S.The resulting orderings are either isomorphic or dually isomorphic:if?,I and?(respectively? ,I and?’)are,respectively,the closure operator,complete implication system and overhanging order associated to a given closure sys-tem F(respectively to F ),one has F?F ??? ≤???I ?I????? (cf.[Caspard and Monjardet,2003]and[Domenach and Leclerc, 2004b]for the case of overhangings).

The sets M and I are closed under set intersection in,respectively, P(P(S))and P((P(S))2),and the set O is closed under set union in

Consensus of Classi?cation Systems 427

P ((P (S ))2).The greatest elements of M ,I and O are,respectively,P (S ),P (S ))2and {(A,B ):A,B ?S,A ?B },whereas their lowest elements are,respectively,{S },{(A,B ):A,B ?S,B ?A }and the empty relation on P (S ).So,M and I are themselves closure systems on,respectively,P (S )and P (S ))2).

Ordered by inclusion,any closure system F is a lattice (F ,∨,∩),with (F ∨F =?(F ∪F )for all closed subsets F,F ∈F .The existence of such a lattice structure has important consequences for the consensus problem as described above,that is the aggregation of any pro?le F ?=(F 1,F 2,...,F k )of closure systems into a closure system F =c (F ?).Previous results on the consensus in lattice structures may be found,among others,in [Monjardet,1990],[Barth′e lemy and M.F.,1991]and [Leclerc,1994],with signi?cant issues in particular cases like those of hierarchies ([Barth′e lemy et al.,1986]),partitions ([Barth′e lemy and Leclerc,1995])or orders ([Leclerc,2003]).Results for the particular case of closure systems are given in [Raderanirina,2001]and [Monjardet and Raderanirina,2004].

A federation on K is a family K of subsets of K ={1,...,k }satisfying

[L ∈K ,L ?L ]?[L ∈K ].We then de?ne a federation consensus function c K associated to the federation K by c K (F ?)= L ∈K ( i ∈L F i ).Especially,K is an oligarchic consensus function if K ={L ?K :L ?L 0}for a ?xed subset L 0of K .

Another class of consensus functions consists of the so-called quota rules c q =c K ,where K ={L ∈K :|L |≥q }for a given number q (0≤q ≤k ).Equivalently,c q (F ?)= {A ?S :|{i ∈K :A ∈F i }|≥q }is the closure system generated by those classes that are present in at least q of the F i ’s.Especially,for q =k ,the quota rule is the same as the oligarchie rule obtained with L 0=K .

The above de?nition of federation consensus functions needs the set K (and,so,the integer k )to be ?xed.Such a constraint is easily removed for quota rules by replacing the number q with a proportion (see [Barth′e lemy and M.F.,1991]).Note also that,if all the closure systems in F ?are classi?cation systems,then the federation consensus system c K (F ?)is is still a classi-?cation system,for any federation K .The same remark holds for quota rules.

An axiomatic approach (cf.[Day and McMorris,2003])of the consensus problem on M allowed to characterize oligarchic rules ([Raderanirina,2001]),whereas a metric approach,based on the symmetric di?erence metric ?on M de?ned by ?(F ,F )=|F F |leads to the following result [Leclerc,1994],where a median of F ?is a closure system M ∈M minimizing ρ(M ,F ?)= 1≤i ≤k ?(M ,F i ).

428Leclerc

Theorem.For any pro?le F?of M,and any median M of F?,the inclusion M?c k/2(F?)holds.

In other terms,any class of a median closure system belongs to at least

half of the closure systems of the pro?le.It is not di?cult to see that this result remains valid when considering classi?cation systems.

4A?tting result based on implications and

overhangings

Federation consensus functions c K take only in account the presence or absence of classes in a quali?ed part of the elements of a pro?le.But it has been observed,in the case of hierarchies,that we have there a limitation which can prevent us to recognize common features in the elements of the

pro?le,even evident ones.Moreover,there is a risk that a consensus based

on presence of entire classes lacks of interest.For instance,if no untrivial class(other than the empty class,the singletons,and S),appears in at least half of the elements of a pro?le,the approaches evoked in the previous section lead to a consensus classi?cation system with only the trivial classes,

that is providing no information.For reasons of this type,[Adams III, 1986]developed a consensus method on hierarchies based on intersection of classes,and caracterized it in terms of the overhanging orders(called there nestings)associated to the involved hierarchies.The following result is a generalization of an Adams one.It concerns the more general problem of

the?tting of an overhanging order to a given binary relationΞon P(S).

The only condition onΞis:(A,B)∈Ξimplies A?B.

For the proof of the next results,we need some further de?nitions on lattices,especially those of closed sets.First,given two closed sets C,C

in a closure system F,C is covered by C (denoted by C?C )if,for any

C ∈F,C?C ?C implies C =C or C =C .A closed set C is meet irreducible if it is covered by a unique closed set C+in F.These meet-irreducibles generate the whole closure system F,in the sense that every

C∈F is obtained as an intersection of such elements.Now,the covering relation of the closure system M is characterized as follows:for F,F ∈M,

F?F if and only if F=F ?{C}for some meet-irreducible C of F (cf. [Caspard and Monjardet,2003]).

Consider the following two properties of a closure system F and its over-hanging order?:

(AΞ1)Ξ??,(preservation ofΞ)

(AΞ2)for any meet-irreducible C of F,(C,C+)∈Ξ.(quali?ed overhangings) Theorem.Let F,F ∈M.If both F and F satisfy Conditions(AΞ1)and

(AΞ2),then F=F .

Consensus of Classi?cation Systems429 Proof.Observe?rst that the set S is in both F and F .If F=F , the symmetric di?erence F F is not empty.Let C be a maximal class in F F .Then,C=S and it may be assumed without loss of generality that C belongs to F(and,so,C does not belong to F ).If C was not a meet-irreducible element of F,it would be an intersection of meet-irreducibles,all belonging to both F and F and,so,C would belong to F .

Thus,C is a meet-irreducible,covered by a unique element C+of F, with C+∈F .By(AΞ2),(C,C+)∈Ξand,by(AΞ1),C? C+(where ? is the overhanging order associated to F ).Set C =? (C)(where? is the closure operator associated to F ).We have C?C ,since C∈F ,and C ? C+,since C =? (C)=? (C )?? (C+)=C+.But,according to the hypotheses,C?C implies C ∈F,with C?C ?C+,a contradiction with the hypothesis that C+covers C in F.

In the particular case where F1,F2,...,F k are hierarchies on S,andΞ=

1≤i≤k ?i(where,for all i=1,...,k,?i is the overhanging/nesting order

associated with F i),we?nd a result implying the caracterization by Adams of his consensus method:

Corollary1.With the relationΞde?ned above,the Adams consensus hierarchy is the only closure system satisfying conditions(AΞ1)and(AΞ2).

It is worth noticing that Adams results point out a case where it actually exists an overhanging order?satisfying conditions(AΞ1)and (AΞ2).Another case appears in[Semple and Steel,2000]in the reseach of a”supertree”.We exhibit other such cases in a work in preparation(for instance whenΞis a relation satisfying conditions(O1)and(O2)).We end by the following result,where the solution to(AΞ1)and(AΞ2)ap-pears,when it exists,to be actually an approximation of the given relationΞ.

Corollary2.LetΞbe a binary relation on P(S)and?an overhanging order satisfying conditions(AΞ1)and(AΞ2).Then,for any overhanging order? ,the inclusionsΞ?? ??imply? =?.

Proof.AssumeΞ?? ??.Equivalently,if F and F are the closure systems associated,respectively,to? and to?,there exists a meet irreducible C of F such that F ?F?{C}.It follows that(C,C+)/∈? , whereas,according to(AΞ2),(C,C+)∈Ξ.This is a contradiction with the hypothesisΞ?? .

In the talk,we present examples where the data consist of a pro?le F?of classi?cation systems.In particular,pro?les of hierarchies or phylogenies are considered.Now the above results prompt us to start from a relationΞobtained as another function of the?i’s than intersection.We are then able to obtain a consensus classi?cation system which preserve more information from the pro?le than the Adams one,but is no longer a hierarchy.

430Leclerc

References

[Adams III,1972]E.N.Adams III.Consensus techniques and the comparison of taxonomic trees.Systematic zoology,pages390–397,1972.

[Adams III,1986]E.N.Adams III.N-trees as nestings:complexity,similarity and consensus.Journal of Classi?cation,pages299–317,1986.

[Armstrong,1974]W.W.Armstrong.Dependency structures of data base https://www.wendangku.net/doc/1112615104.html,rmation Processing,pages580–583,1974.

[Barth′e lemy and Leclerc,1995]J.P.Barth′e lemy and B.Leclerc.The median proce-dure for partitions.In Cox I.J.,P.Hansen,and B.Julesz,editors,Partitioning data sets,pages3–34,1995.

[Barth′e lemy and M.F.,1991]J.P.Barth′e lemy and Janowitz M.F.A formal theory of consensus.SIAM Journal on Discrete Mathematics,pages305–322,1991. [Barth′e lemy et al.,1986]J.P.Barth′e lemy,B.Leclerc,and B.Monjardet.On the use of ordered sets in problems of comparison and consensus of classi?cations.

Journal of Classi?cation,pages187–224,1986.

[Caspard and Monjardet,2003]N.Caspard and B.Monjardet.The lattices of moore families and closure operators on a?nite set:a survey.Discrete Applied Math-ematics,pages241–269,2003.

[Day and McMorris,2003]W.H.E.Day and F.R.McMorris.Axiomatic Consensus Theory in Group Choice and Biomathematics.SIAM,Philadelphia,2003. [Domenach and Leclerc,2004]F.Domenach and B.Leclerc.Closure systems,im-plicational systems,overhanging relations and the case of hierarchical classi?-cation.Mathematical Social Sciences,pages349–366,2004.

[Domenach and Leclerc,2004b]F.Domenach and B.Leclerc.Consensus of classi-?cation systems,with adams’results revisited.In D.Banks,L.House,F.R.

McMorris,P.Arabie,and W.Gaul,editors,Classi?cation,Clustering and Data Mining Applications,pages417–428,2004b.

[Leclerc,1994]B.Leclerc.Medians for weight metrics in the covering graphs of semilattices.Discrete Applied Mathematics,pages281–297,1994. [Leclerc,1998]B.Leclerc.Consensus of classi?cations:the case of trees.In A.Rizzi, M.Vichi,and H.-H.Bock,editors,Advances in Data Science and Classi?cation, pages81–90,1998.

[Leclerc,2003]B.Leclerc.The median procedure in the semilattice of orders.Dis-crete Applied Mathematics,pages285–302,2003.

[Margush and McMorris,1981]T.Margush and F.R.McMorris.Consensus n-trees.

Bulletin of Mathematical Biology,pages239–244,1981.

[Mirkin,1975]B.Mirkin.On the problem of reconciling partitions.In Quantita-tive Sociology,International Perspectives on mathematical and Statistical Mod-elling,pages441–449,1975.

[Monjardet and Raderanirina,2004]B.Monjardet and https://www.wendangku.net/doc/1112615104.html,ttices of choice functions and consensus problems.Social Choice and Welfare,pages 349–382,2004.

[Monjardet,1990]B.Monjardet.Arrowian characterization of latticial federation consensus functions.Mathematical Social Sciences,pages51–71,1990. [Raderanirina,2001]V.Raderanirina.Treillis et agr′e gation de familles de Moore et de fonctions de choix,Ph.D.Thesis.Universit′e Paris1,Paris,2001.

[R′e gnier,1965]S.R′e gnier.Sur quelques aspects math′e matiques des probl`e mes de classi?cation automatique.ICC Bulletin,pages175–191,1965.

Consensus of Classi?cation Systems431 [R′e gnier,1983]S.R′e gnier.Sur quelques aspects math′e matiques des probl`e mes de classi?cation automatique(seconde publication).Math′e matiques et Sciences humaines,pages13–29,1983.

[Semple and Steel,2000]C.Semple and M.A.Steel.A supertree method for rooted trees.Discrete Applied Mathematics,pages147–158,2000.

相关文档