

A Global-Model Naive Bayes Approach to the Hierarchical Prediction of Protein Functions

Carlos N. Silla Jr. and Alex A. Freitas
School of Computing and Centre for Biomedical Informatics
University of Kent, Canterbury, Kent, UK, CT2 7NF
{cns2, A.A.Freitas}@kent.ac.uk

Abstract—In this paper we propose a new global-model approach for hierarchical classification, where a single global classification model is built by considering all the classes in the hierarchy, rather than building a number of local classification models as is more usual in hierarchical classification. The method is an extension of the flat classification algorithm naive Bayes. We present the extension made to the original algorithm as well as its evaluation on eight protein function hierarchical classification datasets. The achieved results are positive and show that the proposed global model is better than using a local model approach.

Keywords-hierarchical classification; Bayesian classification; protein function prediction

I. INTRODUCTION

Within the different types of machine learning problems, the most common focus is to solve flat classification problems. In the classification task the algorithm is given a set of labeled objects (training set), each of them described by a set of features, and the aim is to predict the label of unlabeled objects (test set) based on their features. In a flat classification problem, each test example e (unseen during training) will be assigned a class c ∈ C (where C is the set of classes of the given problem), where C has a "flat" structure (there is no relationship among the classes). This approach is often single label, i.e. the classifier will only output one possible class for each test example. Apart from flat classification, there are problems that are hierarchical by their nature, involving a hierarchy of classes to be predicted.

E.g. in text categorization by topic, due to the large number of possible topics, the simple use of a flat classifier seems infeasible. As the number of topics becomes larger, flat categorizers face a complexity problem that may incur a rapid increase in time and storage [1]. For this reason, using flat classification algorithms might not fit the problem. The machine learning method has to be tailored to deal with the hierarchical class structure.

One of the application domains that can truly benefit from hierarchical classification is the field of bioinformatics, more precisely the task of protein function prediction. This task is particularly interesting as, although the human genome sequencing project has ended, its contribution to knowledge is less clear, because we still do not know the functions of many proteins encoded by genes [2]. Also, one of the most used methods to infer new protein functions, BLAST, which is based on measuring similarity between protein sequences, has some limitations. In particular, proteins with similar sequences can have very different functions [3], and BLAST does not produce a classification model that can give the user insight about the relationship between protein features and protein functions [4].

In this work, we extend the traditional flat (class-relationship-ignoring) Naive Bayes to deal with a hierarchical classification problem. This extension allows the algorithm to create a global model that can predict any class in the hierarchical class structure, instead of only classes at the leaf nodes of the class hierarchy. The classification model is said to be global because it is built by considering all classes in the hierarchy, rather than building a number of local classification models as usual. We also augment the global-model Naive Bayes with a notion of "usefulness" that takes into account the depth of the prediction. The motivation is that deeper predictions tend to be more useful (more specific and informative) to the user than more general predictions. We evaluate our approach on eight protein datasets and compare it against a suitable baseline approach tailored for hierarchical classification problems.

The remainder of this paper is organized as follows: Section II presents background on hierarchical classification. Section III discusses the new global-model naive Bayes method for hierarchical classification proposed in this paper. Section IV presents the experimental setup, and Section V reports the computational results on the task of hierarchical protein function prediction. Conclusions and some perspectives on future work are stated in Section VI.

II. HIERARCHICAL CLASSIFICATION

The existing hierarchical classification methods can be analyzed under different aspects [5], [4], as follows:

- The type of hierarchical structure of the classes, which can be either a tree structure or a DAG (Directed Acyclic Graph) structure. In this work, the datasets are organized into a tree-structured class hierarchy;
- How deep the classification in the hierarchy is performed, i.e., whether the output of the classifier is always a leaf node (which [4] refers to as Mandatory Leaf-Node Prediction and [5] refers to as Virtual Category Tree), or whether the most specific ("deepest") class predicted by the classifier for a given example could be a node at any level of the class hierarchy (which [4] refers to as Non-Mandatory Leaf Node Prediction and [5] refers to as Category Tree). In this work, we are dealing with a non-mandatory leaf node prediction problem;
- How the hierarchical class structure is explored by the algorithm. The existing hierarchical classification approaches can be classified into Local and Global approaches. In this work we propose a new global approach.

The local-model approach consists of creating a local classifier for every parent node [6] (i.e., any non-leaf node) in the class hierarchy (assuming a multi-class classifier is available), or a local binary classifier for each class node [7] (parent or leaf node, except for the root node). In the former case the classifier's goal is to discriminate among the child classes of the classifier's corresponding node. In the latter case, each binary classifier predicts whether or not an example belongs to its corresponding class. In both cases, these approaches create classifiers with a local view of the problem. Despite the differences in creating and training the classifiers, these approaches are often used with the same "top-down" class prediction strategy in the testing phase. The top-down class prediction approach works in the testing phase as follows. For each level of the hierarchy (except the top level), the decision about which class is predicted at the current level is based on the class predicted at the previous (parent) level. The main disadvantage of the local approach with the top-down class prediction approach is that a classification mistake at a high level of the hierarchy is propagated through all the descendant nodes of the wrongly assigned class.

In the global-model approach, a single (relatively complex) classification model is built from the training set, taking into account the class hierarchy as a whole during a single run of the classification algorithm. When used during the test phase, each test example is classified by the induced model, a process that can assign classes at potentially every level of the hierarchy to the test example [4]. In this work we propose a novel global classification approach to avoid the above-mentioned drawback of the local approach. Most of the classification research in machine learning focuses on the development and improvement of flat classification methods. In addition, in the field of hierarchical classification most approaches use a local-model approach with a top-down class prediction approach. The global-model approach is still under-explored in the literature and it deserves more investigation because it builds a single coherent classification model. Even though a single model produced by the global-model approach will tend to be more complex (larger) than each of the many classification models produced by the local-model approach, intuitively the single global model will tend to be much simpler (smaller) than the entire hierarchy of local classification models. There is also empirical evidence for this intuitive reasoning [8], [9].

Figure 1. Hierarchical class structure example: a tree whose root has child classes 1 and 2; class 1 has child classes 1.1 and 1.2, and class 2 has child classes 2.1, 2.2 and 2.3.

III. THE GLOBAL-MODEL NAIVE BAYES

As seen in the previous section, we are interested in developing a hierarchical classification algorithm that builds a global classification model. Moreover, we also want the algorithm to output its decision process in a human-understandable format and to be able to cope with naturally missing attribute values. For these reasons, in this work, we consider the use of Bayesian algorithms instead of "black box" types of algorithms like SVMs or neural networks. Bayesian methods range from the simple Naive Bayes (which assumes no dependency between the attributes given the class) to Bayesian networks, which efficiently encode the joint probability distribution for a large set of variables [10].

The training of a Bayesian classifier has two main steps: (1) deciding the topology of the network representing attribute dependencies; (2) computing the required probabilities. The second step, at least, needs to be adapted to hierarchical classification. Hence, in this paper we discuss how to adapt this second step to the task of hierarchical classification, considering the less investigated scenario of global classification models; adapting the first step to hierarchical classification will be investigated in future research. Hence, in this paper we focus on the well-known naive Bayes algorithm, which has the advantages of simplicity and computational efficiency (an important point in hierarchical classification, given the large number of classes to be predicted). Considering Figure 1, where each node in the tree corresponds to a class, the Naive Bayes classifier has the following components:

- Topology: The topology is essentially the same for flat and hierarchical classification; the difference is that the "class node" has an internal hierarchical structure.
- Prior probabilities, computed for each class: P(1), P(2), P(1.1), P(1.2), P(2.1), P(2.2), P(2.3).
- Likelihoods: P(A_i = V_ij | 1), P(A_i = V_ij | 2), P(A_i = V_ij | 1.1), P(A_i = V_ij | 1.2), P(A_i = V_ij | 2.1), P(A_i = V_ij | 2.2), P(A_i = V_ij | 2.3). These are computed for each attribute A_i and each value V_ij belonging to the domain of A_i, i = 1, ..., n, j = 1, ..., m_i, where n is the number of attributes and m_i is the number of values in the domain of the i-th attribute.

After training, during the testing phase, the question that arises is how to assign a class to a test example. The original (flat) Naive Bayes simply assigns the class with the maximum value of the posterior probability, given by Π_{i=1..n} P(A_i = V_ij | Class) × P(Class) [11]. However this needs to be adapted to hierarchical classification, where classes at different levels have different trade-offs of accuracy and usefulness to the user. In order to extend the original naive Bayes classifier to handle the class hierarchy, the following modifications were introduced in the algorithm:

- Modification of the prior calculations: During the training phase, when there is an example that belongs to a certain class (say class 2.1), the prior probabilities of both that class and its ancestor classes (i.e. classes 2 and 2.1 in this case) are updated. (This is because we are dealing with an "IS-A" class hierarchy, as usual.)
- Modification of the likelihood calculation: As in the prior calculations, when a training example is processed, its attribute-value pair counts are added to the counts of the given class and its ancestor classes. As in the previous example, if the training example belongs to class 2.1, the attribute-value counts are added to the counts of both classes 2 and 2.1 (these counting rules are sketched below).
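As a concrete illustration of these counting rules, the following is a minimal Python sketch (the data-structure names are ours and hypothetical; this is not the authors' implementation): each training example increments the class counts used for the priors and the attribute-value counts used for the likelihoods, for its own class and for every ancestor class.

from collections import defaultdict

# class_counts feeds the prior probabilities, attr_value_counts feeds the likelihoods
class_counts = defaultdict(int)
attr_value_counts = defaultdict(int)          # keyed by (class, attribute, value)
n_examples = 0

def ancestors_and_self(label):
    # "2.1" -> ["2", "2.1"]; assumes dot-separated class labels as in Figure 1
    parts = label.split(".")
    return [".".join(parts[:i + 1]) for i in range(len(parts))]

def update_counts(example_attrs, example_class):
    """example_attrs: dict {attribute: value}; example_class: most specific class, e.g. '2.1'."""
    global n_examples
    n_examples += 1
    for c in ancestors_and_self(example_class):   # the class itself and all its ancestors (IS-A hierarchy)
        class_counts[c] += 1
        for attr, value in example_attrs.items():
            attr_value_counts[(c, attr, value)] += 1

# an example of class 2.1 also updates the counts of its ancestor class 2
update_counts({"motif_A": 1, "motif_B": 0}, "2.1")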

These modifications allow the algorithm to predict classes at any level of the hierarchy. However, although the predictions of deeper classes are often less accurate (since deeper classes have fewer examples to support the training of the classifier than shallower classes), deeper class predictions tend to be more useful to the user, since they provide more specific information than shallower class predictions. If we only considered the posterior class probability (the product of likelihood × prior class probability) we would not take into account the usefulness to the user. It is therefore interesting to select a class label which has a high posterior probability and is also useful to the user. Therefore an optional step in the proposed method is to predict the class with the maximum value of the product of posterior probability × usefulness. The question that arises is how to evaluate the usefulness of a predicted class.

Given that predictions at deeper levels of the hierarchy are usually more informative than those at shallower levels, some sort of penalization for shallower class predictions is needed. In Clare's work [12] the original formula for entropy was modified to take into account two aspects: multiple labels and prediction depth (usefulness) to the user. In this work we have modified part of the entropy-based formula described in [12]. The main reason to modify this formula is that while Clare was using a decision tree classifier based on entropy, in this work we are using a Bayesian algorithm that makes use of probabilities. Therefore, we need to adapt the "usefulness" measure from [12] to the context of our algorithm. Also, all we need is a measure that assigns different weights to classes at different class levels. Therefore, we adapt Clare's measure of usefulness by using a normalized usefulness value based on the position of each class level in the hierarchy. Moreover, we only use the normalized value of Clare's equation to measure the usefulness:

usefulness(c_i) = 1 - (a(c_i) * log2(treesize(c_i)) / max)     (1)

where:

- treesize(c_i) = 1 + the number of descendant classes of c_i (1 is added to represent c_i itself);
- a(c_i) = 0 if p(c_i) = 0; otherwise a(c_i) is a user-defined constant (default = 1);
- max is the highest value obtained by computing a(c_i) * log2(treesize(c_i)), and it is used to normalize all the other values into the range [0, 1].
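For instance, with the class tree of Figure 1 stored as a child list per class, Equation 1 can be computed as in the short sketch below (illustrative code only; the dictionary and function names are ours, not the paper's, and the special case a(c_i) = 0 when p(c_i) = 0 is omitted for brevity):

import math

# child lists for the class tree of Figure 1
children = {"1": ["1.1", "1.2"], "2": ["2.1", "2.2", "2.3"],
            "1.1": [], "1.2": [], "2.1": [], "2.2": [], "2.3": []}

def treesize(c):
    # 1 (for the class itself) + number of descendant classes
    return 1 + sum(treesize(child) for child in children[c])

def usefulness(c, a=1.0):
    # a plays the role of a(c_i); max_value normalizes the values into [0, 1]
    max_value = max(a * math.log2(treesize(x)) for x in children)
    return 1.0 - (a * math.log2(treesize(c)) / max_value)

# leaf classes (e.g. 1.1) get usefulness 1.0; shallower classes get smaller values
print(usefulness("1.1"), usefulness("1"), usefulness("2"))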

To make the final classification decision, the global-model naive Bayes has two options. The first option is to assign the final class label with the maximum value of posterior probability (Equation 2). The second option is to assign the class label which maximizes the product of the posterior probability and usefulness (Equation 3).

classify(A) = argmax_Class Π_{i=1..n} P(A_i = V_ij | Class) × P(Class)     (2)

classify(A) = argmax_Class Π_{i=1..n} P(A_i = V_ij | Class) × P(Class) × usefulness(Class)     (3)
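Putting the pieces together, a sketch of the prediction step is given below (Equation 2, and Equation 3 when use_usefulness is True). It reuses the counts and the usefulness function from the sketches above; the Laplace smoothing is an assumption of this sketch and is not stated in the paper.

def posterior(example_attrs, c, all_values, alpha=1.0):
    # P(c): fraction of training examples labeled with c or with a descendant of c
    prob = class_counts[c] / n_examples
    for attr, value in example_attrs.items():
        # P(A_i = V_ij | c), estimated with Laplace smoothing (sketch assumption)
        num = attr_value_counts[(c, attr, value)] + alpha
        den = class_counts[c] + alpha * len(all_values[attr])
        prob *= num / den
    return prob

def classify(example_attrs, all_values, use_usefulness=False):
    # argmax over ALL classes in the hierarchy (the keys of children), not only the leaves
    def score(c):
        s = posterior(example_attrs, c, all_values)
        return s * usefulness(c) if use_usefulness else s
    return max(children, key=score)

# all_values[attr] is the domain of attribute attr, e.g. {"motif_A": [0, 1], "motif_B": [0, 1]}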

IV. EXPERIMENTAL DETAILS

A. Establishing a Baseline Method

An important issue when dealing with hierarchical classification is how to establish a meaningful baseline method. Since we are dealing with a problem where the classifier's most specific class prediction for an example can be at any level of the hierarchy (non-mandatory leaf node prediction, see Section II), it is fair to compare against a method whose most specific class prediction can also be at any level of the class hierarchy.

Therefore, in this work, as a baseline method, we use the same broad type of classifier (Naive Bayes), but with a conventional local-model approach with a top-down class prediction testing approach. More precisely, during the training phase, for every non-leaf class node, a naive Bayes multi-class classifier was trained to distinguish between the node's child classes. To implement the test phase, we used the top-down class prediction strategy (see Section II) in the context of a non-mandatory leaf-node class prediction problem. The criterion for deciding at which level to stop the classification during the top-down classification process is based on the usefulness measure (see Section III).

Table I
BIOINFORMATICS DATASETS DETAILS.

Protein Type   Signature Type   # of Attributes   # of Examples   # Classes/Level
Enzyme         Interpro         1,216             14,027          6/41/96/187
Enzyme         Pfam             708               13,987          6/41/96/190
Enzyme         Prints           382               14,025          6/45/92/208
Enzyme         Prosite          585               14,041          6/42/89/187
GPCR           Interpro         450               7,444           12/54/82/50
GPCR           Pfam             75                7,053           12/52/79/49
GPCR           Prints           283               5,404           8/46/76/49
GPCR           Prosite          129               6,246           9/50/79/49

Since we already have a measure of the usefulness of a predicted class, we decided to use the following stopping criterion: if p(c_i) × usefulness(c_i) > p(c_j) × usefulness(c_j) for all classes c_j that are children of the current class c_i, then stop the classification. In other words, if the posterior probability times the usefulness (given by Equation 1) computed by the classifier at the current class node is higher than the posterior probability times the usefulness computed for each of its child class nodes, then the classification stops at the current class node, i.e., that class becomes the most specific (deepest) class predicted for the current test example.
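A sketch of the baseline's top-down test phase with this stopping rule is given below. Here local_classifiers is a hypothetical dictionary mapping each non-leaf node to a function that returns the posterior probability of each of its child classes for the given example, children is a child-list map that also contains an entry for the root node, and usefulness is the function from the earlier sketch; none of these names come from the paper.

def top_down_predict(example, local_classifiers, children, root="root"):
    current, current_prob = root, 1.0
    while children.get(current):                           # non-leaf node: consult its local classifier
        child_probs = local_classifiers[current](example)   # {child class: posterior probability}
        # stop if the current class's probability x usefulness beats that of every child class
        if current != root and all(
            current_prob * usefulness(current) > p * usefulness(c)
            for c, p in child_probs.items()
        ):
            break
        current = max(child_probs, key=child_probs.get)
        current_prob = child_probs[current]
    return current                                          # most specific (deepest) predicted class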

B. Bioinformatics Datasets Used in the Experiments

In this work we have used datasets about two different protein families: Enzymes and GPCRs (G-Protein-Coupled Receptors). Enzymes are catalysts that accelerate chemical reactions, while GPCRs are proteins involved in signalling and are particularly important in medical applications, as it is believed that from 40% to 50% of current medical drugs target GPCR activity [13]. In each dataset, each example represents a protein.

Each dataset [14] has four different versions based on different kinds of predictor attributes, and in each dataset the classes to be predicted are hierarchical protein functions. Each type of binary predictor attribute indicates whether or not a "protein signature" (or motif) occurs in a protein. The motifs used in this work were: Interpro entries, FingerPrints from the Prints database, Prosite patterns and Pfam. Apart from the presence/absence of several motifs according to the signature method, each protein has two additional attributes: the molecular weight and the sequence length.

Before performing the experiments, the following pre-processing steps were applied to the datasets: (1) Every class with fewer than 10 examples was merged with its parent class. If after this merge the class still had fewer than 10 examples, the process was repeated recursively, possibly until the examples were labeled with the Root class. (2) All examples whose most specific class was the Root class were removed. (3) A class-blind discretization algorithm based on equal-frequency binning (using 20 bins) was applied to the molecular weight and sequence length attributes, which were the only two continuous attributes in each dataset. Table I presents the datasets' main characteristics after these pre-processing steps. The last column of Table I presents the number of classes at each level of the hierarchy (1st/2nd/3rd/4th levels). In all datasets, each protein (example) is assigned at most one class at each level of the hierarchy. The pre-processed version of the datasets (as they were used in the experiments) is available at: http://www.cs.kent.ac.uk/people/rpg/cns2/
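A sketch of these pre-processing steps is given below, assuming dot-separated hierarchical labels and using pandas' quantile-based binning for the equal-frequency discretization (the variable names and toy data are ours, not taken from the paper):

import numpy as np
import pandas as pd

def merge_rare_classes(labels, min_examples=10):
    """Relabel examples of classes with fewer than min_examples to their parent class,
    repeating until every remaining class has enough examples (or reaches the Root)."""
    labels = list(labels)
    changed = True
    while changed:
        changed = False
        counts = pd.Series(labels).value_counts()
        for i, lab in enumerate(labels):
            if lab != "root" and counts[lab] < min_examples:
                parts = lab.split(".")
                labels[i] = ".".join(parts[:-1]) if len(parts) > 1 else "root"
                changed = True
    return labels   # examples whose label becomes "root" are then removed (step 2)

# equal-frequency (20-bin) discretization of the two continuous attributes
rng = np.random.default_rng(0)
df = pd.DataFrame({"molecular_weight": rng.normal(50000, 10000, 1000),
                   "sequence_length": rng.integers(50, 2000, 1000)})
for col in ["molecular_weight", "sequence_length"]:
    df[col] = pd.qcut(df[col], q=20, labels=False, duplicates="drop")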

V. COMPUTATIONAL RESULTS

In this section, we are interested in answering the following questions by using controlled experiments: (a) How does the choice of a local (with the top-down class prediction approach) or global (with the proposed method) approach affect the performance of the algorithms? (b) How does the inclusion of the usefulness criterion (Equation 1) affect the global-model Naive Bayes algorithm? All the experiments reported in this section were obtained by using the datasets presented in Section IV-B, with stratified ten-fold cross-validation.
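As an illustration of this protocol, a minimal stratified ten-fold split can be set up with scikit-learn as sketched below; the paper does not say how the folds were stratified, so stratifying on each example's most specific class is an assumption, and the data here are toys:

import numpy as np
from sklearn.model_selection import StratifiedKFold

X = np.random.rand(100, 5)                          # toy feature matrix
y = np.array((["1.1", "1.2", "2.1"] * 34)[:100])    # most specific class of each example

skf = StratifiedKFold(n_splits=10, shuffle=True, random_state=42)
for fold, (train_idx, test_idx) in enumerate(skf.split(X, y)):
    # train the hierarchical classifier on (X[train_idx], y[train_idx]) and
    # evaluate hP/hR/hF on (X[test_idx], y[test_idx])
    pass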

In order to evaluate the algorithms we have used the metrics of hierarchical precision (hP), hierarchical recall (hR) and hierarchical f-measure (hF) proposed in [15]. These measures are extended versions of the well-known metrics of precision, recall and f-measure, tailored to the hierarchical classification scenario. They are defined as follows:

hP = Σ_i |P_i ∩ T_i| / Σ_i |P_i|,    hR = Σ_i |P_i ∩ T_i| / Σ_i |T_i|,    hF = (2 × hP × hR) / (hP + hR),

where P_i is the set consisting of the most specific class predicted for test example i and all its ancestor classes, and T_i is the set consisting of the true class of test example i and all its ancestor classes. The main advantage of using this particular metric is that it can be applied to any hierarchical classification scenario (i.e. single-label, multi-label, tree-structured, DAG-structured, mandatory leaf-node or non-mandatory leaf-node problems). In addition, this measure penalizes shallow predictions, because such predictions have relatively low recall values, therefore introducing some pressure for predictions to be as deep as possible (to increase recall) as long as precision is not too compromised. This way of coping with the trade-off between precision and recall is suitable for our non-mandatory leaf-node prediction problem.
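A small self-contained sketch of these metrics follows (the helper names are ours; the ancestor sets include the class itself but not the root):

def ancestor_set(label):
    # "2.1" -> {"2", "2.1"}; assumes dot-separated class labels
    parts = label.split(".")
    return {".".join(parts[:i + 1]) for i in range(len(parts))}

def hierarchical_prf(predicted, true):
    """predicted, true: most specific predicted/true class of each test example."""
    inter = pred_total = true_total = 0
    for p, t in zip(predicted, true):
        P, T = ancestor_set(p), ancestor_set(t)
        inter += len(P & T)
        pred_total += len(P)
        true_total += len(T)
    hP, hR = inter / pred_total, inter / true_total
    return hP, hR, 2 * hP * hR / (hP + hR)

# predicting the shallower class "2" when the true class is "2.1" keeps hP = 1.0 but halves hR
print(hierarchical_prf(["2"], ["2.1"]))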

Table II
HIERARCHICAL PRECISION (hP), RECALL (hR) AND F-MEASURE (hF) ON THE HIERARCHICAL PROTEIN FUNCTION DATASETS.

                     LMNBwU                  GMNB                    GMNBwU
Databases        hP     hR     hF        hP     hR     hF        hP     hR     hF
GPCR-Interpro   70.49  67.29  67.90     87.60  71.33  77.01     84.39  74.76  78.27
GPCR-Pfam       66.49  59.17  61.32     77.23  57.52  64.40     70.35  60.13  63.53
GPCR-Prints     70.13  66.32  66.99     87.06  69.42  75.38     83.04  73.00  76.51
GPCR-Prosite    63.45  55.95  58.11     75.64  53.73  61.14     66.38  56.61  59.89
EC-Interpro     74.85  80.23  76.64     94.96  89.58  90.53     94.07  92.84  92.65
EC-Pfam         74.94  79.73  76.47     95.15  86.94  88.72     93.69  92.25  92.13
EC-Prints       78.35  82.73  79.79     92.21  87.26  87.98     90.96  90.62  89.92
EC-Prosite      81.73  86.52  83.20     95.14  89.53  90.70     93.38  92.45  92.01

To measure whether there is any statistically significant difference between the hierarchical classification methods being compared, we have employed the Friedman test with the post-hoc Shaffer's static procedure for comparison of multiple classifiers over many datasets, as strongly recommended by [16]. Table III presents the results of this test using the values of hierarchical F-measure. The first column of Table III indicates which classifiers are being compared. The second column presents the p value of the statistical test, which needs to be lower than the corrected critical value shown in the third column in order for there to be a statistically significant difference between the performance of two classifiers at a confidence level of 95%.
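As an illustration, the omnibus Friedman test over the hF columns of Table II can be run with SciPy as sketched below; Shaffer's post-hoc static procedure is not part of SciPy and would be applied separately (e.g. following [16]), so this sketch does not reproduce the pairwise p values of Table III.

from scipy.stats import friedmanchisquare

# hierarchical F-measure (hF) per dataset, one list per method (rows of Table II)
lmnb_wu = [67.90, 61.32, 66.99, 58.11, 76.64, 76.47, 79.79, 83.20]
gmnb    = [77.01, 64.40, 75.38, 61.14, 90.53, 88.72, 87.98, 90.70]
gmnb_wu = [78.27, 63.53, 76.51, 59.89, 92.65, 92.13, 89.92, 92.01]

stat, p_value = friedmanchisquare(lmnb_wu, gmnb, gmnb_wu)
print(stat, p_value)    # a small p value indicates that at least one method differs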

A. Evaluating the Local vs. Global Model Approaches

We first evaluate the impact of the usefulness component in the different types of hierarchical classification algorithms. Table II presents the results comparing the baseline local-model naive Bayes (LMNBwU) described in Section IV-A with the proposed global-model naive Bayes (GMNBwU), both with usefulness.

For all eight datasets the proposed global model with usefulness obtained significantly better results than the local model with usefulness. The statistical significance of the detailed results shown in Table II is confirmed by the first row of Table III, where the p value is much smaller than the corrected critical value. The same result is achieved by the global-model naive Bayes without usefulness, as shown in Table II and confirmed by the second row of Table III. These results corroborate the ones reported in [9], where a global-model decision-tree approach was also better than a local-model one. Most previous studies comparing the local-model and global-model approaches have focused on mandatory leaf node prediction problems [8], [17], which is a simpler scenario, since there is no need to decide at which level the classification should be stopped for each example and there is no need to consider the trade-off between predictive accuracy and usefulness.

B. Evaluating the Impact of the Usefulness Measure in the Global-Model Naive Bayes

Let us now evaluate the impact of the optional usefulness criterion in the proposed global-model naive Bayes, which considers the trade-off between accuracy and usefulness when deciding what should be the most specific class predicted for a given test example. Table II shows the hierarchical measures of precision, recall and f-measure of the global-model Naive Bayes without (GMNB) and with the usefulness criterion (GMNBwU).

The analysis of the results corroborates our previous statements. That is, the GMNB has an overall higher hierarchical precision than the GMNBwU, while the GMNBwU has a higher overall hierarchical recall than the GMNB. This means that by adding the usefulness to the global-model naive Bayes, the classifier is indeed making deeper predictions at the cost of their precision. It should be noted, however, that there is no statistically significant difference between the hF values of the two classifiers, as shown in the third row of Table III.

The decision of which version of the classifier to use will depend on the type of protein being studied and the costs associated with the biological (laboratory) experiments needed to verify whether the predictions are correct.

Table III
RESULTS OF STATISTICAL TESTS FOR α = 0.05

algorithm               p           Shaffer
LMNBwU vs. GMNBwU       4.6525E-4   0.01666
LMNBwU vs. GMNB         0.0124      0.05
GMNB vs. GMNBwU         0.3173      0.05

VI. CONCLUSIONS AND FUTURE WORK

In this paper we have proposed a novel algorithm that extends the Naive Bayes algorithm to handle hierarchical classification problems by producing a single global classification model, rather than a number of local classification models as in the conventional local classifier approach with the top-down class prediction approach. Moreover, contrary to the usual scenario of hierarchical classification problems where the algorithm has to predict one of the leaf classes for each test example, in this work we dealt with the less conventional scenario where the algorithm can predict, as the most specific class for a test example, a class at any level of the hierarchy (also known as a non-mandatory leaf-node prediction problem). In this scenario, we have chosen to combine the posterior probability of each class with the notion of prediction usefulness based on class depth, since deeper classes tend to be more useful (more informative) to the user than shallower classes.

In order to perform the experiments, we employed hierarchical classification measures suitable for this non-mandatory leaf-node prediction scenario, and also established a meaningful baseline hierarchical classification method by modifying a local classifier approach with the top-down class prediction approach to take into account the same usefulness measure used by the proposed global-model algorithm. The proposed global-model and the baseline local-model algorithms were evaluated on eight protein datasets. The two versions of the proposed global-model algorithm achieved significantly better hierarchical classification accuracy (measured by hierarchical f-measure) than the local-model approach. We also presented results showing that the notion of usefulness allows the global-model algorithm to obtain a hierarchical f-measure similar to the one obtained without the use of usefulness, while making more specific predictions, which tend to be more useful to the user.

As future research, we intend to evaluate this method on a larger number of datasets and compare it against other global hierarchical classification approaches, like the ones proposed in [9], [18].

ACKNOWLEDGMENT

We want to thank Dr. Nick Holden for kindly providing us with the datasets used in these experiments. The first author is financially supported by CAPES, a Brazilian research-support agency (process number 4871-06-5).

REFERENCES

[1] D. Tikk, G. Biró, and J. D. Yang, "A hierarchical text categorization approach and its application to FRT expansion," Australian Journal of Intelligent Information Processing Systems, vol. 8, no. 3, pp. 123–131, 2004.

[2] D. W. Corne and G. B. Fogel, Evolutionary Computation in Bioinformatics. Morgan Kaufmann, 2002, ch. An Introduction to Bioinformatics for Computer Scientists, pp. 3–18.

[3] J. A. Gerlt and P. C. Babbitt, "Can sequence determine function?" Genome Biology, vol. 1, no. 5, 2000.

[4] A. A. Freitas and A. C. P. L. F. de Carvalho, Research and Trends in Data Mining Technologies and Applications. Idea Group, 2007, ch. A Tutorial on Hierarchical Classification with Applications in Bioinformatics, pp. 175–208.

[5] A. Sun and E.-P. Lim, "Hierarchical text classification and evaluation," in Proc. of the IEEE Int. Conf. on Data Mining, 2001, pp. 521–528.

[6] D. Koller and M. Sahami, "Hierarchically classifying documents using very few words," in Proc. of the 14th Int. Conf. on Machine Learning, 1997, pp. 170–178.

[7] S. D'Alessio, K. Murray, R. Schiaffino, and A. Kershenbaum, "The effect of using hierarchical classifiers in text categorization," in Proc. of the 6th Int. Conf. Recherche d'Information Assistée par Ordinateur, 2000, pp. 302–313.

[8] E. Costa, A. Lorena, A. Carvalho, A. A. Freitas, and N. Holden, "Comparing several approaches for hierarchical classification of proteins with decision trees," in Advances in Bioinformatics and Computational Biology, ser. Lecture Notes in Bioinformatics, vol. 4643. Springer, 2007, pp. 126–137.

[9] C. Vens, J. Struyf, L. Schietgat, S. Džeroski, and H. Blockeel, "Decision trees for hierarchical multi-label classification," Machine Learning, vol. 73, no. 2, pp. 185–214, 2008.

[10] D. Heckerman, "A tutorial on learning with Bayesian networks," Microsoft, Technical Report MSR-TR-95-06, 1995.

[11] T. M. Mitchell, Machine Learning. McGraw-Hill, 1997.

[12] A. Clare, "Machine learning and data mining for yeast functional genomics," Ph.D. dissertation, University of Wales Aberystwyth, 2004.

[13] D. Filmore, "It's a GPCR world," Modern Drug Discovery, vol. 7, no. 11, pp. 24–28, 2004.

[14] N. Holden and A. A. Freitas, "Hierarchical classification of protein function with ensembles of rules and particle swarm optimisation," Soft Computing Journal, vol. 13, pp. 259–272, 2009.

[15] S. Kiritchenko, S. Matwin, and A. F. Famili, "Functional annotation of genes using hierarchical text categorization," in Proc. of the ACL Workshop on Linking Biological Literature, Ontologies and Databases: Mining Biological Semantics, 2005.

[16] S. García and F. Herrera, "An extension on 'statistical comparisons of classifiers over multiple data sets' for all pairwise comparisons," Journal of Machine Learning Research, vol. 9, pp. 2677–2694, 2008.

[17] M. Ceci and D. Malerba, "Classifying web documents in a hierarchy of categories: A comprehensive study," Journal of Intelligent Information Systems, vol. 28, no. 1, pp. 1–41, 2007.

[18] J. Rousu, C. Saunders, S. Szedmak, and J. Shawe-Taylor, "Kernel-based learning of hierarchical multilabel classification models," Journal of Machine Learning Research, vol. 7, pp. 1601–1626, 2006.
