
ORIGINAL RESEARCH

A Bipartite Network-based Method for Prediction of Long Non-coding RNA–protein Interactions

Mengqu Ge 1,a, Ao Li 1,2,*,b, Minghui Wang 1,2,c

1 School of Information Science and Technology, University of Science and Technology of China, Hefei 230027, China
2 Centers for Biomedical Engineering, University of Science and Technology of China, Hefei 230027, China

Received 12 November 2015; revised 4 January 2016; accepted 6 January 2016
Available online 22 February 2016

Handled by Zhihua Zhang

KEYWORDS lncRNA; Protein; Interaction; Bipartite network; Propagation

Abstract As one large class of non-coding RNAs (ncRNAs), long ncRNAs (lncRNAs) have gained considerable attention in recent years. Mutations and dysfunction of lncRNAs have been implicated in human disorders. Many lncRNAs exert their effects through interactions with the corresponding RNA-binding proteins. Several computational approaches have been developed, but only a few predict these interactions from a network-based point of view. Here, we introduce a computational method named lncRNA–protein bipartite network inference (LPBNI). LPBNI aims to identify potential lncRNA–interacting proteins by making full use of the known lncRNA–protein interactions. A leave-one-out cross validation (LOOCV) test shows that LPBNI significantly outperforms other network-based methods, including random walk (RWR) and protein-based collaborative filtering (ProCF). Furthermore, a case study was performed to demonstrate the performance of LPBNI using real data in predicting potential lncRNA–interacting proteins.

*Corresponding author.
E-mail: aoli@https://www.wendangku.net/doc/e57949445.html, (Li A).
a ORCID: 0000-0001-5185-7045.
b ORCID: 0000-0001-9910-8967.
c ORCID: 0000-0002-5788-894X.

Peer review under responsibility of Beijing Institute of Genomics, Chinese Academy of Sciences and Genetics Society of China.

Genomics Proteomics Bioinformatics 14 (2016) 62–71

https://doi.org/10.1016/j.gpb.2016.01.004

1672-0229 © 2016 The Authors. Production and hosting by Elsevier B.V. on behalf of Beijing Institute of Genomics, Chinese Academy of Sciences and Genetics Society of China.

This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).

Introduction

An increasing number of studies show that approximately 2% of the whole mammalian genome represents protein-coding genes, whereas the majority of the genome consists of non-coding RNA (ncRNA) genes. ncRNAs had long been regarded as transcriptional noise, but recent investigations demonstrate that ncRNAs play an important role in the regulation of diverse biological processes [1–5]. Long ncRNAs (lncRNAs), which consist of more than 200 nucleotides, constitute a large class of ncRNAs [6,7]. In the past several years, the number of identified lncRNAs has increased sharply because of the development of both bioinformatics tools and experimental techniques. Functional studies of lncRNAs show that mutated and dysfunctional lncRNAs are implicated in a range of cellular processes [8–12] and human diseases, ranging from neurodegeneration to cancer [13–18]. Although some lncRNAs, e.g., Xist [19] and MALAT1 [20], have been well studied, the functions of most lncRNAs remain unknown. Usually, lncRNAs function through interacting with RNA-binding proteins (RBPs) [21–24]. Therefore, it is important to predict the potential lncRNA–protein interactions in order to study the complex function of lncRNAs.

Since the experimental identification of lncRNA–protein interactions remains costly, developing effective predictive approaches becomes essential. Recently, several computational methods have been reported for predicting potential lncRNA–protein interactions. For instance, Bellucci et al. developed catRAPID in 2011 [25] by taking into account secondary structure, hydrogen bonds, and van der Waals forces between lncRNAs and proteins. Next, Muppirala et al. [26] introduced a method named RPISeq, which uses only sequence information of lncRNAs and proteins. Support vector machine (SVM) classifiers [27] and random forest (RF) [28] are used to predict RBPs. In 2013, Lu et al. [29] developed a novel approach, named lncPro, which uses secondary structure, hydrogen bond, and van der Waals force features, and yields the prediction score using Fisher's linear discriminant analysis. Later on, an approach named RPI-Pred was developed by Suresh et al. [30], who trained an SVM-based model on sequence and high-order 3D structure features extracted from lncRNAs and proteins.

All the aforementioned methods are based on the biological characteristics of ncRNAs and proteins. catRAPID and lncPro combined sequence and structural features of lncRNAs and proteins. RPISeq was based on sequence features. RPI-Pred used the high-order structure features of lncRNAs and proteins. However, some studies show that lncRNAs generally exhibit low sequence conservation [1], which may make it difficult to predict interactions based on the intrinsic properties of lncRNAs. Biological network-based methods are widely used in many types of studies, such as disease gene prioritization [31] and drug–target interaction prediction [32]. The development of technologies such as cross-linking immunoprecipitation and CLIP-seq has enabled us to construct lncRNA–protein interaction networks. We introduce here a novel computational method, lncRNA–protein bipartite network inference (LPBNI), for the prediction of lncRNA–protein interactions. LPBNI identifies novel lncRNA–protein pairs by efficiently using the lncRNA–protein bipartite network. In order to evaluate the performance of the proposed method, we compared LPBNI with other network-based methods, including random walk (RWR) [31] and protein-based collaborative filtering (ProCF) [33]. RWR [31] has been used to prioritize candidate disease genes. ProCF is derived from recommendation algorithms and is similar to the item-based collaborative filtering method [33]. The performance evaluation is based on leave-one-out cross validation (LOOCV) of the known lncRNA–protein interactions extracted from NPInter [34]. To further demonstrate the effectiveness of the lncRNA–protein bipartite network, six lncRNAs were used to evaluate the performance of LPBNI in comparison with the existing methods lncPro [29] and RPISeq [26]. These evaluation tests demonstrated that LPBNI outperforms the other methods significantly. In a case study, several potential interactions between lncRNAs and proteins identified by LPBNI were well supported by starBase [35], indicating the superior predictive ability of LPBNI.

Results

Performance comparison with other network-based methods on lncRNA–protein interactions prediction

We compared the performance of LPBNI with RWR [31] and ProCF [33]. ProCF is based on the idea that if a protein interacts with an lncRNA, similar proteins will be recommended as interacting with this lncRNA. The linkage between $p_i$ and $l_j$ can be defined as:

$$score_{ij} = \frac{\sum_{k=1,\,k \neq i}^{m} S_P(p_i, p_k)\, a_{kj}}{\sum_{k=1,\,k \neq i}^{m} S_P(p_i, p_k)}$$

where $S_P(p_i, p_k)$ is the similarity between proteins $p_i$ and $p_k$. Here, we used cosine vector similarity to measure the similarity of proteins:

$$S_P(p_i, p_k) = \frac{|d(i) \cap d(k)|}{\sqrt{|d(i)|\,|d(k)|}}$$

where $d(i)$ and $d(k)$ denote the sets of lncRNAs linked to proteins $i$ and $k$, so that $|d(i)|$ and $|d(k)|$ are the degrees of proteins $i$ and $k$, respectively.
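The ProCF score can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `cosine_similarity` and `procf_score` and the toy adjacency matrix are assumptions made for illustration, with rows standing for proteins and columns for lncRNAs.

```python
from math import sqrt

def cosine_similarity(a, i, k):
    """Cosine similarity between proteins i and k, from binary adjacency rows."""
    shared = sum(x * y for x, y in zip(a[i], a[k]))  # size of the shared lncRNA set
    deg_i, deg_k = sum(a[i]), sum(a[k])              # degrees of the two proteins
    if deg_i == 0 or deg_k == 0:
        return 0.0
    return shared / sqrt(deg_i * deg_k)

def procf_score(a, i, j):
    """Similarity-weighted vote of all other proteins on the pair (protein i, lncRNA j)."""
    sims = [(k, cosine_similarity(a, i, k)) for k in range(len(a)) if k != i]
    denom = sum(s for _, s in sims)
    if denom == 0:
        return 0.0
    return sum(s * a[k][j] for k, s in sims) / denom

# Hypothetical toy network: 3 proteins (rows) x 4 lncRNAs (columns).
A = [
    [1, 1, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
]
score = procf_score(A, 0, 2)  # protein 0 versus lncRNA 2
```

In this toy case protein 0 resembles only protein 1, which is linked to lncRNA 2, so the whole similarity weight votes for the pair.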

We extracted 4870 lncRNA–protein interactions from NPInter 2.0 [34] (see "Data collection and preprocessing" for details). In LPBNI, a node needs at least two interactions for LOOCV to be performed. Nodes that have only one link are therefore not considered in the performance evaluation, which leaves 4796 lncRNA–protein interactions that meet this condition; this dataset is taken as the 'gold standard' in the LOOCV test. The receiver operating characteristic (ROC) curves and the area under the curve (AUC) obtained using these methods are shown in Figure 1. LPBNI shows the highest true positive rate (TPR) at each false positive rate (FPR). In addition, the AUC value of LPBNI is 0.878 (Table 1), which is higher than that obtained using RWR (0.765) and ProCF (0.738). These data suggest that LPBNI has a better predictive ability than RWR and ProCF. To validate the reliability of LPBNI, we compared the sensitivity, accuracy, precision, and Matthews correlation coefficient (MCC) of LPBNI, RWR, and ProCF at specificities of 99.0% and 95.0%. As shown in Table 1, at a specificity of 99.0%, the sensitivity, accuracy, precision, and MCC of LPBNI are all higher than those of RWR and ProCF. When specificity was reduced to 95.0%, sensitivity and MCC increased for all three methods, with decreased precision, although the accuracy remained comparable. However, LPBNI still showed a higher performance in terms of sensitivity, accuracy, precision, and MCC, compared to RWR and ProCF.
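For reference, the evaluation measures named above follow from the four confusion-matrix counts. The sketch below uses the standard textbook definitions; the function name `classification_metrics` and the example counts are hypothetical and are not taken from the study.

```python
from math import sqrt

def classification_metrics(tp, fp, tn, fn):
    """Return sensitivity, specificity, accuracy, precision, and MCC from counts."""
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "sensitivity": sensitivity,
        "specificity": specificity,
        "accuracy": accuracy,
        "precision": precision,
        "mcc": mcc,
    }

# Hypothetical counts chosen so that specificity equals 95%.
metrics = classification_metrics(tp=40, fp=5, tn=95, fn=60)
```

Fixing the specificity, as in Table 1, amounts to choosing the score threshold at which `tn / (tn + fp)` reaches the target value and then reading off the remaining measures.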

Fold enrichment is also used to evaluate the performance of the proposed method. It is defined as (N/2)/n [37], where N represents the number of candidate proteins and n is the rank of the tested protein among the candidate proteins. Based on this formula, the average fold enrichments are 4.007, 3.590, and 1.653 for LPBNI, RWR, and ProCF, respectively. These data suggest that LPBNI outperforms the other methods in identifying lncRNA–related proteins with a higher rank. Table 2 shows the number of lncRNA–protein interactions that were correctly retrieved at 5%, 10%, 15%, 20%, and 50% of all the prediction results. Among the 4796 true interactions between lncRNAs and proteins, LPBNI achieves a higher retrieval than RWR and ProCF at each of the investigated percentiles. The biggest difference was observed at 5%, where LPBNI recovered 579 interactions successfully, whereas only 410 and 116 interactions were retrieved using RWR and ProCF, respectively.
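A short worked example of the fold enrichment formula (N/2)/n, as a sketch with hypothetical inputs: a tested protein ranked in the middle of the candidate list scores 1, and higher ranks score proportionally more. The function names and the three example cases are assumptions made for illustration.

```python
def fold_enrichment(num_candidates, rank):
    """Fold enrichment (N/2)/n for a tested protein ranked n-th among N candidates."""
    if rank < 1 or rank > num_candidates:
        raise ValueError("rank must lie between 1 and the number of candidates")
    return (num_candidates / 2) / rank

def average_fold_enrichment(cases):
    """Mean fold enrichment over (num_candidates, rank) pairs."""
    return sum(fold_enrichment(n_cand, rank) for n_cand, rank in cases) / len(cases)

# Hypothetical held-out tests: ranked first, in the middle, and last of 26 candidates.
cases = [(26, 1), (26, 13), (26, 26)]
values = [fold_enrichment(n_cand, rank) for n_cand, rank in cases]
average = average_fold_enrichment(cases)
```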

Ge M et al/lncRNA–protein Interaction Prediction63

Furthermore, 10-fold cross validation was applied in order to conduct a comprehensive performance evaluation of LPBNI in predicting lncRNA–protein interactions. All lncRNA–protein interactions (4796) were randomly divided into 10 equal portions. Each portion in turn was left out as a test sample, while the remaining ones were treated as training sets. During cross validation, some nodes are separated into the test sample and the corresponding links cannot be predicted by LPBNI; therefore, those links were not considered in the process of evaluation. The ROC curves for LPBNI, RWR, and ProCF are shown in Figure S1, supporting the superior performance of LPBNI as well. Taken together, these analyses demonstrate the power of LPBNI in the prediction of lncRNA–protein interactions.

Comparison with the existing methods in predicting lncRNA–protein interactions

In order to further evaluate the performance of the proposed method in predicting lncRNA–protein interactions, we compared LPBNI with lncPro [29] and RPISeq [26]. RF or

Table 1 Performance comparison of different methods with specificities of 99.0% and 95.0% in predicting lncRNA–protein interactions

Specificity  Method  Sensitivity  Accuracy  Precision  MCC
99.0%        LPBNI   0.288        0.873     0.852      0.449
             RWR     0.062        0.835     0.556      0.282
             ProCF   0.118        0.844     0.703      0.334
95.0%        LPBNI   0.532        0.880     0.681      0.534
             RWR     0.156        0.817     0.384      0.480
             ProCF   0.317        0.844     0.560      0.528

Note: LPBNI, lncRNA–protein bipartite network inference; RWR, random walk; ProCF, protein-based collaborative filtering; MCC, Matthews correlation coefficient.

Figure 1 Performance comparison of different methods using ROC curves in predicting lncRNA–protein interactions
ROC curves for the whole dataset using LPBNI (blue, AUC: 0.878), RWR (green, AUC: 0.765), and ProCF (AUC: 0.738). LOOCV is implemented and 4796 known lncRNA–protein interactions are used as the gold standard. ROC, receiver operating characteristic; LPBNI, lncRNA–protein bipartite network inference; RWR, random walk; AUC, area under the curve; LOOCV, leave-one-out cross validation.

Table 2 Number of interactions that are correctly recovered from 4796 true interactions using different methods

Method  No. of interactions recovered at each correct recovery percentile
        5%    10%    15%    20%    50%
LPBNI   579   1031   1543   2566   4269
RWR     410   943    1409   2180   4092
ProCF   116   326    620    1180   2826

Note: Comparison was performed at different correct recovery percentiles including 5%, 10%, 15%, 20%, and 50%.

Figure 2 Performance comparison in terms of ROC curves for six lncRNAs tested
Each plot shows the ROC curves for one of the six lncRNAs (A–F) using LPBNI (blue), lncPro (cyan), RPISeq-RF (yellow), and RPISeq-SVM, respectively. For each lncRNA, LOOCV is performed on the corresponding lncRNA–protein interactions. The lncRNA IDs have been abbreviated without the "NONHSAT" prefix, e.g., "009703" represents "NONHSAT009703". ROC, receiver operating characteristic.

Table 3 AUC comparison of different methods for six lncRNAs

lncRNA ID      LPBNI  lncPro  RPISeq-RF  RPISeq-SVM
NONHSAT009703  0.944  0.344   0.663      0.638
NONHSAT023583  0.932  0.368   0.594      0.188
NONHSAT027070  0.975  0.594   0.506      0.606
NONHSAT090901  0.910  0.654   0.534      0.233
NONHSAT121712  0.887  0.699   0.677      0.301
NONHSAT138142  0.944  0.681   0.581      0.644

Note: AUC, area under the curve.

Figure 3 Number of interactions that are correctly recovered from 69 true interactions in different percentiles
The known lncRNA–protein interactions are taken as the gold standard dataset. The number of lncRNA–protein interactions that are correctly retrieved is shown at 10%, 15%, 20%, and 50% of all predictions. For lncRNA–related proteins, the higher the number in each percentile, the better the performance.

Table 4 Top 5 ranked candidate proteins for four selected lncRNAs

lncRNA ID (NONCODE 4.0 ID)      STRING ID (top 5 ranked proteins)  Name                                                 LPBNI score  Validated
NONHSAT037119 (RP11-349A22.5)   9606.ENSP00000254108               RNA-binding protein FUS                              0.528        starBase
                                9606.ENSP00000220592               Signal recognition particle 54 kDa protein           0.278
                                9606.ENSP00000240185               TAR DNA-binding protein 43                           0.275
                                9606.ENSP00000401371               Nucleolysin TIA-1 isoform p40                        0.247        starBase
                                9606.ENSP00000349428               Polypyrimidine tract-binding protein 1               0.192        starBase
NONHSAT010657 (HNRNPU-AS1)      9606.ENSP00000290341               Insulin-like growth factor 2 mRNA-binding protein 1  0.801        starBase
                                9606.ENSP00000240185               TAR DNA-binding protein 43                           0.483
                                9606.ENSP00000258962               Serine/arginine-rich splicing factor 1               0.315        starBase
                                9606.ENSP00000350028               Putative helicase MOV-10                             0.304        starBase
                                9606.ENSP00000338371               Trinucleotide repeat-containing gene 6B protein      0.144
NONHSAT016118 (RP11-18I14.10)   9606.ENSP00000385269               ELAV-like protein 1                                  0.396
                                9606.ENSP00000254108               RNA-binding protein FUS                              0.175        starBase
                                9606.ENSP00000220592               Signal recognition particle 54 kDa protein           0.121
                                9606.ENSP00000381031               RNA-binding protein EWS                              0.096
                                9606.ENSP00000401371               Nucleolysin TIA-1 isoform p40                        0.087
NONHSAT027801 (RP11-350F4.2)    9606.ENSP00000254108               RNA-binding protein FUS                              0.440        starBase
                                9606.ENSP00000240185               TAR DNA-binding protein 43                           0.286
                                9606.ENSP00000220592               Signal recognition particle 54 kDa protein           0.276
                                9606.ENSP00000381031               RNA-binding protein EWS                              0.268
                                9606.ENSP00000349428               Polypyrimidine tract-binding protein 1               0.187

Note: The corresponding score of each interaction is calculated by LPBNI. The higher the score, the higher the possibility that the protein interacts with the query lncRNA. For each lncRNA, the proteins were ranked in descending order based on the score.

numbers of lncRNA–protein interactions that were correctly recovered with respect to different percentiles [32] of all the prediction results. As shown in Figure 3, LPBNI had the best performance for every percentile tested. Top-ranked results are of great importance, due to the low occurrence of false positive results. When looking at the top 10% of the results, 12 of 69 lncRNA–protein interactions were correctly retrieved by LPBNI, whereas only five, eight, and six interactions were correctly retrieved by lncPro, RPISeq-RF, and RPISeq-SVM, respectively. As for the top 50%, LPBNI correctly recovers 57 lncRNA–protein interactions, which is 18, 17, and 25 interactions more than lncPro, RPISeq-RF, and RPISeq-SVM, respectively. These comparisons indicate that LPBNI outperforms lncPro and RPISeq in the prediction of lncRNA–protein interactions.

Prediction of novel lncRNA–protein interactions

Following the validation of the superior performance of LPBNI using LOOCV, we applied LPBNI to the 4796 known lncRNA–protein interactions downloaded from NPInter [34], which include 1113 lncRNAs and 26 proteins, to predict novel lncRNA–protein interactions. For each lncRNA, all the collected proteins were ranked according to the scores calculated by LPBNI, and the top five proteins are considered potential lncRNA–interacting proteins. We present here the results for four lncRNAs: NONHSAT037119 (RP11-349A22.5), NONHSAT010657 (HNRNPU-AS1), NONHSAT016118 (RP11-18I14.10), and NONHSAT027801 (RP11-350F4.2). The top five proteins and the corresponding scores for these lncRNAs are presented in Table 4. We searched other databases such as starBase [35] and lncRNome [36], and found that some top-ranked proteins that are predicted to interact with these lncRNAs are supported by starBase [35], which is designed to decipher miRNA–target interactions and protein–RNA interactions. As shown in Table 4, 9606.ENSP00000254108 (RNA-binding protein FUS), 9606.ENSP00000401371 (Nucleolysin TIA-1 isoform p40), and 9606.ENSP00000349428 (Polypyrimidine tract-binding protein 1) are predicted to interact with RP11-349A22.5. 9606.ENSP00000290341 (Insulin-like growth factor 2 mRNA-binding protein 1), 9606.ENSP00000258962 (Serine/arginine-rich splicing factor 1), and 9606.ENSP00000350028 (Putative helicase MOV-10) are predicted to interact with HNRNPU-AS1. FUS is predicted to interact with RP11-18I14.10 and RP11-350F4.2. These predictions were all confirmed by starBase [35].

Furthermore, we extracted the predictions confirmed by starBase and compared their ranks predicted by LPBNI, lncPro, RPISeq-RF, and RPISeq-SVM (Table 5). The results showed that for these lncRNAs, there are large differences in the ranks of most of the candidate proteins predicted by LPBNI, lncPro, RPISeq-RF, and RPISeq-SVM. Despite these great variations, candidate proteins are consistently ranked higher by LPBNI than by the other three methods. For instance, for lncRNA RP11-350F4.2, FUS is ranked first by LPBNI, but it is ranked 12th, 15th, and 24th by lncPro, RPISeq-RF, and RPISeq-SVM, respectively. The results above show that LPBNI can identify potential lncRNA–interacting proteins as top candidates, implying that the use of LPBNI is a very effective way to predict novel lncRNA–protein interactions.

Table 5 Top candidate proteins predicted by LPBNI with support by starBase and their ranks predicted using different methods

lncRNA ID (NONCODE 4.0 ID)      STRING ID              Name                                                 LPBNI  lncPro  RPISeq-RF  RPISeq-SVM
NONHSAT037119 (RP11-349A22.5)   9606.ENSP00000254108   RNA-binding protein FUS                              1      9224
                                9606.ENSP00000401371   Nucleolysin TIA-1 isoform p40                        4      4       16         6
                                9606.ENSP00000349428   Polypyrimidine tract-binding protein 1               5      23      11         10
NONHSAT010657 (HNRNPU-AS1)      9606.ENSP00000290341   Insulin-like growth factor 2 mRNA-binding protein 1  1      3       21         21
                                9606.ENSP00000258962   Serine/arginine-rich splicing factor 1               3      1****
                                9606.ENSP00000350028   Putative helicase MOV-10                             4      4       15         15
NONHSAT016118 (RP11-18I14.10)   9606.ENSP00000254108   RNA-binding protein FUS                              2      15      8          26
NONHSAT027801 (RP11-350F4.2)    9606.ENSP00000254108   RNA-binding protein FUS                              1      12      15         24

Note: In two rows, the ranks for lncPro, RPISeq-RF, and RPISeq-SVM survive only as an unsplit digit string (9224 and 1****) and are reproduced as such.

Discussion

In this study, we proposed and tested a novel computational method, LPBNI, for the prediction of potential lncRNA–protein interactions. We constructed an lncRNA–protein bipartite network using the information about lncRNA–protein interactions: an lncRNA and a protein are connected if they are known to interact with each other. Following this, two-step propagation was carried out in the bipartite network to score and rank candidate proteins for each lncRNA. The proposed method has some important features. Firstly, LPBNI uses only the network constructed from the known lncRNA–protein interactions to perform this prediction. Secondly, the higher the degree of a node, the less information is assigned to each of its direct neighbors. Finally, the propagation matrix is not symmetrical. The results of comparisons between LPBNI and other network-based methods show that LPBNI has a higher AUC than RWR and ProCF. In order to further evaluate the performance of the proposed method, we compared LPBNI with the existing methods for lncRNA–protein pair prediction and obtained consistently higher ROC curves using LPBNI relative to lncPro, RPISeq-RF, and RPISeq-SVM for the six lncRNAs tested. All the comparisons show that our method can effectively predict interactions between lncRNAs and proteins, largely by taking advantage of the lncRNA–protein interaction network. The case study further shows that LPBNI is powerful not only for the recovery of known lncRNA–protein interactions, but also for the prediction of potential candidate proteins. This suggests that LPBNI may be a useful tool for predicting candidate lncRNA–interacting proteins that could be subjected to further experimental investigation for potential functional studies.

Despite the efficiency of LPBNI in the … candidate proteins for interacting with … limitations exist. Firstly, LPBNI can only be … bipartite network, in which each node has at … Since LPBNI only uses the prior information … lncRNA–protein interactions, LPBNI … candidate proteins if there is no information in the training set. This limitation may be … ing the bipartite network to a bipartite … on lncRNA/protein functional domains or expression profile of lncRNAs [38]. … interact with a lot of lncRNAs, which may … information during the procedure of … and in consequence have higher scores in … Finally, the shortage of known … limits the further analysis of lncRNA … network, which may be addressed by a … lncRNA datasets.

Conclusion

The prediction of lncRNA–protein … important for the studies of the complex … Existing methods are using the sequence … lncRNAs and proteins, but in this study, … network-based method, LPBNI, which takes full advantage of the information about the known lncRNA–protein interactions. We performed the evaluation and case study of this method, which further demonstrate its superior performance.

Materials and methods

Data collection and preprocessing

A total of 7576 ncRNA–protein interactions were downloaded from the NPInter 2.0 database [34] (https://www.wendangku.net/doc/e57949445.html,/NPInter/) in November 2013, with the restriction of type "NONCODE" and organism "Homo sapiens". Furthermore, we extracted 2380 lncRNAs from a human lncRNA dataset downloaded from the NONCODE 4.0 database [39], and converted the IDs of lncRNAs and proteins into NONCODE 4.0 IDs and STRING IDs, respectively. Finally, we obtained 4870 lncRNA–protein interactions, including 2380 lncRNAs and 106 proteins.

The lncRNA–protein bipartite network

The lncRNA–protein interaction network can be described as a graph $G(L, P, E)$, in which $L = \{l_1, l_2, \ldots, l_n\}$ is defined as the lncRNA set, $P = \{p_1, p_2, \ldots, p_m\}$ is defined as the protein set, and $E = \{e_{ij} \mid p_i \in P, l_j \in L\}$ is the edge set, where $e_{ij}$ represents the edge connecting the nodes $p_i$ and $l_j$. $A = \{a_{ij} \mid p_i \in P, l_j \in L\}$ represents the adjacency matrix, where $a_{ij} = 1$ if $p_i$ interacts with $l_j$, otherwise $a_{ij} = 0$. For lncRNA $l_j$, positive samples refer to the proteins that are known to interact with $l_j$, and the remaining proteins are considered negative samples. A simple illustration of lncRNA–protein bipartite network construction is shown in Figure 4. Finally, the network was constructed and a propagation method was applied to compute the interaction score.
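The construction of the adjacency matrix can be sketched in a few lines of Python. This is only an illustration: the function name `build_adjacency` and the placeholder identifiers "p1", "l1", and so on are hypothetical, and the input is assumed to be a list of (protein, lncRNA) pairs.

```python
def build_adjacency(interactions):
    """Build a binary protein-by-lncRNA adjacency matrix from (protein, lncRNA) pairs."""
    proteins = sorted({p for p, _ in interactions})
    lncrnas = sorted({l for _, l in interactions})
    p_index = {p: i for i, p in enumerate(proteins)}
    l_index = {l: j for j, l in enumerate(lncrnas)}
    a = [[0] * len(lncrnas) for _ in proteins]
    for p, l in interactions:
        a[p_index[p]][l_index[l]] = 1  # a_ij = 1 if p_i interacts with l_j
    return proteins, lncrnas, a

# Hypothetical toy interaction list.
pairs = [("p1", "l1"), ("p1", "l2"), ("p2", "l2"), ("p3", "l3")]
proteins, lncrnas, A = build_adjacency(pairs)
protein_degrees = [sum(row) for row in A]
lncrna_degrees = [sum(row[j] for row in A) for j in range(len(lncrnas))]
```

Row sums give the protein degrees and column sums give the lncRNA degrees, which are the quantities the propagation step divides by.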


the known lncRNA–protein interactions, and scores candidate proteins for each lncRNA. We classified the nodes of the lncRNA–protein interaction network into two different sets, named P and L as aforementioned, and only connections between different sets were allowed. The LPBNI procedure is illustrated in Figure 5. For example, if the initial information of three proteins was 1, 1, and 0, we first propagated information from proteins to the corresponding lncRNAs. Afterward, the information was allocated from lncRNAs back to proteins. Since the network is unweighted, the information in a protein is equally propagated to its direct neighbors in the lncRNA set, and vice versa. The propagation of information after each step is shown in Figure 5B and C, respectively. This two-step propagation can be represented as follows. $S_0$ denotes the initial information of the proteins in $P$: $s_{ij} = 1$ if $p_i$ interacts with $l_j$, otherwise $s_{ij} = 0$. $S_L(l_j)$, $j \in \{1, 2, \ldots, n\}$, represents the score on $l_j$ after the first step of information propagation, which can be calculated as:

$$S_L(l_j) = \sum_{i=1}^{m} \frac{a_{ij}\, S_0(i)}{d(p_i)} \qquad (2)$$

where $d(p_i) = \sum_{j=1}^{n} a_{ij}$ is the number of lncRNAs that interact with $p_i$.

In the second step, all the information in L propagates back to P. $S_F(p_i)$ is defined as the final information of protein $p_i$, representing the interaction score of protein $p_i$ with $l_j$. $S_F$ can be defined as

$$S_F(p_i) = \sum_{j=1}^{n} \frac{a_{ij}\, S_L(l_j)}{d(l_j)} \qquad (3)$$

where $d(l_j) = \sum_{i=1}^{m} a_{ij}$ is the number of proteins that interact with $l_j$ in the lncRNA–protein interaction network.

Figure 5 Illustration of the LPBNI in bipartite network
A. Initial information is propagated from proteins to their direct neighbor lncRNAs. B. The score on the red circles is the information each lncRNA received. C. Information is propagated from lncRNAs back to proteins; the score on the blue hexagons is the result of the two-step propagation. The red circles represent lncRNAs and the blue hexagons represent proteins.

Substituting Eq. (2) into Eq. (3), Eq. (3) can be represented as:

$$S_F(i) = \sum_{k=1}^{m} w_{ik}\, S_0(k) \qquad (5)$$

where

$$w_{ik} = \frac{1}{d(p_k)} \sum_{j=1}^{n} \frac{a_{ij}\, a_{kj}}{d(l_j)} \qquad (6)$$
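The two-step propagation can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' released code: the function name `two_step_scores`, the toy matrix, and the choice to take the initial information as the column of the query lncRNA are assumptions. Each protein splits its information equally among its lncRNA neighbors, and each lncRNA then splits what it received equally among its protein neighbors.

```python
def two_step_scores(a, query):
    """Score every protein for the lncRNA in column `query` of adjacency matrix `a`."""
    m, n = len(a), len(a[0])
    d_p = [sum(row) for row in a]                               # protein degrees
    d_l = [sum(a[i][j] for i in range(m)) for j in range(n)]    # lncRNA degrees
    s0 = [a[i][query] for i in range(m)]                        # initial information
    # Step 1: proteins -> lncRNAs, each protein splitting its information equally.
    s_l = [
        sum(a[i][j] * s0[i] / d_p[i] for i in range(m) if d_p[i] > 0)
        for j in range(n)
    ]
    # Step 2: lncRNAs -> proteins, each lncRNA splitting its information equally.
    s_f = [
        sum(a[i][j] * s_l[j] / d_l[j] for j in range(n) if d_l[j] > 0)
        for i in range(m)
    ]
    return s_f

# Hypothetical toy network: 3 proteins (rows) x 4 lncRNAs (columns).
A = [
    [1, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 0, 1, 1],
]
scores = two_step_scores(A, 0)  # initial information of the proteins is 1, 1, 0
ranking = sorted(range(len(A)), key=lambda i: scores[i], reverse=True)
```

The total amount of information is conserved across both steps, which gives a quick sanity check. In this toy case the third protein receives a nonzero score for the query lncRNA even though no direct link exists, which is how a candidate interaction is proposed.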

Following the calculations, the proteins were ranked for $l_j$ by the final score $S_F$. All of the candidate proteins are listed in descending order, and highly-ranked proteins are considered to interact with lncRNA $l_j$. The data and source code are freely available at https://https://www.wendangku.net/doc/e57949445.html,/USTC-HILAB.

Experimental design

LOOCV was performed on the lncRNA–protein interaction network for performance evaluation of the proposed method. In this process, each lncRNA–protein pair was left out in turn as a test sample, by setting the corresponding value in the adjacency matrix A to 0. The performance of LPBNI was estimated by the success rate it achieves in recovering the known lncRNA–protein interactions. In order to assess the performance of LPBNI, we plotted the ROC curves and compared the AUC values obtained using LPBNI, RWR, and ProCF. Additionally, we computed specificity (Sp), sensitivity (Sn), accuracy (Acc), precision (Pre), and MCC values. The propagation matrix W presented in this paper relies on the adjacency matrix A of the bipartite network. Because A changes in each step of LOOCV, W changes as well. Therefore, W was recalculated for each lncRNA–protein pair that was left out as a test sample. Furthermore, during the LOOCV process, no information was propagated on the nodes with fewer than two links, and these nodes were not considered during the performance evaluation.
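The LOOCV loop described above can be sketched as follows. This is an illustrative sketch, not the authors' evaluation script: the helper names `propagate` and `loocv_ranks`, the toy matrix, and the tie-handling rule (a rank counts only strictly higher-scoring candidates) are assumptions. Scores are recomputed from the modified matrix for every held-out pair, which mirrors the recalculation of W.

```python
def propagate(a, query):
    """Two-step protein -> lncRNA -> protein propagation for one query lncRNA."""
    m, n = len(a), len(a[0])
    d_p = [sum(row) for row in a]
    d_l = [sum(a[i][j] for i in range(m)) for j in range(n)]
    s_l = [
        sum(a[i][j] * a[i][query] / d_p[i] for i in range(m) if d_p[i] > 0)
        for j in range(n)
    ]
    return [
        sum(a[i][j] * s_l[j] / d_l[j] for j in range(n) if d_l[j] > 0)
        for i in range(m)
    ]

def loocv_ranks(a):
    """Hide each known pair in turn and record the rank of the hidden protein."""
    m = len(a)
    results = []
    for i in range(m):
        for j in range(len(a[0])):
            if a[i][j] != 1:
                continue
            # Nodes with fewer than two links are excluded from the evaluation.
            if sum(a[i]) < 2 or sum(row[j] for row in a) < 2:
                continue
            held = [row[:] for row in a]
            held[i][j] = 0                      # leave the pair out
            scores = propagate(held, j)         # recomputed for every held-out pair
            candidates = [k for k in range(m) if held[k][j] == 0]
            rank = 1 + sum(1 for k in candidates if scores[k] > scores[i])
            results.append(((i, j), rank, len(candidates)))
    return results

# Hypothetical toy network: 3 proteins (rows) x 3 lncRNAs (columns).
A = [
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
]
results = loocv_ranks(A)
```

The collected ranks are the raw material for the ROC curves, the AUC values, and the fold enrichment reported in the Results.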

Authors’contributions

MG participated in the downloading and preprocessing of the datasets, and carried out the design and performance evaluation of LPBNI in predicting lncRNA–protein interactions. AL conceived of the project and helped with the study design. MW was involved in data analysis. MG drafted the manuscript with the help of AL and MW. All authors read and approved the final manuscript.

Competing interests

The authors have declared no competing interests.

Acknowledgments

This work is supported by the National Natural Science Foundation of China (Grant Nos. 61571414 and 61471331).

Supplementary material

Supplementary material associated with this article can be found, in the online version, at https://doi.org/10.1016/j.gpb.2016.01.004.

References

[1] Pang KC, Frith MC, Mattick JS. Rapid evolution of noncoding RNAs: lack of conservation does not mean lack of function. Trends Genet 2006;22:1–5.

[2] Koerner MV, Pauler FM, Huang R, Barlow DP. The function of non-coding RNAs in genomic imprinting. Development 2009;136:1771–83.

[3] Laurent GS, Wahlestedt C. Noncoding RNAs: couplers of analog and digital information in nervous system function? Trends Neurosci 2007;30:612–21.

[4] Qu Z, Adelson DL. Evolutionary conservation and functional roles of ncRNA. Front Genet 2012;3:205.

[5] Guttman M, Rinn JL. Modular regulatory principles of large non-coding RNAs. Nature 2012;482:339–46.

[6] Volders P-J, Helsens K, Wang X, Menten B, Martens L, Gevaert K, et al. LNCipedia: a database for annotated human lncRNA transcript sequences and structures. Nucleic Acids Res 2013;41:D246–51.

[7] Prensner JR, Chinnaiyan AM. The emergence of lncRNAs in cancer biology. Cancer Discov 2011;1:391–407.

[8] Mercer TR, Dinger ME, Mattick JS. Long non-coding RNAs: insights into functions. Nat Rev Genet 2009;10:155–9.

[9] Wang KC, Chang HY. Molecular mechanisms of long noncoding RNAs. Mol Cell 2011;43:904–14.

[10] Ørom UA, Derrien T, Beringer M, Gumireddy K, Gardini A, Bussotti G, et al. Long noncoding RNAs with enhancer-like function in human cells. Cell 2010;143:46–58.

[11] Wang KC, Yang YW, Liu B, Sanyal A, Corces-Zimmerman R, Chen Y, et al. A long noncoding RNA maintains active chromatin to coordinate homeotic gene expression. Nature 2011;472:120–4.

[12] Ponting CP, Oliver PL, Reik W. Evolution and functions of long noncoding RNAs. Cell 2009;136:629–41.

[13] Calin GA, Liu C-G, Ferracin M, Hyslop T, Spizzo R, Sevignani C, et al. Ultraconserved regions encoding ncRNAs are altered in human leukemias and carcinomas. Cancer Cell 2007;12:215–29.

[14] Esteller M. Non-coding RNAs in human disease. Nat Rev Genet 2011;12:861–74.

[15] Gupta RA, Shah N, Wang KC, Kim J, Horlings HM, Wong DJ, et al. Long non-coding RNA HOTAIR reprograms chromatin state to promote cancer metastasis. Nature 2010;464:1071–6.

[16] Shi X, Sun M, Liu H, Yao Y, Song Y. Long non-coding RNAs: a new frontier in the study of human diseases. Cancer Lett 2013;339:159–66.

[17] Taft RJ, Pang KC, Mercer TR, Dinger M, Mattick JS. Non-coding RNAs: regulators of disease. J Pathol 2010;220:126–39.

[18] Wapinski O, Chang HY. Long noncoding RNAs and human disease. Trends Cell Biol 2011;21:354–61.

[19] Kohlmaier A, Savarese F, Lachner M, Martens J, Jenuwein T, Wutz A. A chromosomal memory triggered by Xist regulates histone methylation in X inactivation. PLoS Biol 2004;2:E171.

[20] Tripathi V, Shen Z, Chakraborty A, Giri S, Freier SM, Wu X, et al. Long noncoding RNA MALAT1 controls cell cycle progression by regulating the expression of oncogenic transcription factor B-MYB. PLoS Genet 2013;9:e1003368.

[21] Zhu J, Fu H, Wu Y, Zheng X. Function of lncRNAs and approaches to lncRNA–protein interactions. Sci China Life Sci 2013;56:876–85.

[22] Liao Q, Liu C, Yuan X, Kang S, Miao R, Xiao H, et al. Large-scale prediction of long non-coding RNA functions in a coding–non-coding gene co-expression network. Nucleic Acids Res 2011;39:3864–78.

[23] Khalil AM, Rinn JL. RNA–protein interactions in human health and disease. Semin Cell Dev Biol 2011;22:359–65.

[24] Sacco LD, Baldassarre A, Masotti A. Bioinformatics tools and novel challenges in long non-coding RNAs (lncRNAs) functional analysis. Int J Mol Sci 2011;13:97–114.

[25] Bellucci M, Agostini F, Masin M, Tartaglia GG. Predicting protein associations with long noncoding RNAs. Nat Methods 2011;8:444–5.

[26] Muppirala UK, Honavar VG, Dobbs D. Predicting RNA–protein interactions using only sequence information. BMC Bioinformatics 2011;12:489.

[27] Hearst MA, Dumais ST, Osman E, Platt J, Scholkopf B. Support vector machines. IEEE 1998;13:18–28.

[28] Liaw A, Wiener M. Classification and regression by randomForest. R News 2002;2:18–22.

[29] Lu Q, Ren S, Lu M, Zhang Y, Zhu D, Zhang X, et al. Computational prediction of associations between long non-coding RNAs and proteins. BMC Genomics 2013;14:651.

[30] Suresh V, Liu L, Adjeroh D, Zhou X. RPI-Pred: predicting ncRNA-protein interaction using sequence and structural information. Nucleic Acids Res 2015;43:1370–9.

[31] Köhler S, Bauer S, Horn D, Robinson PN. Walking the interactome for prioritization of candidate disease genes. Am J Hum Genet 2008;82:949–58.

[32] Wang W, Yang S, Zhang X, Li J. Drug repositioning by integrating target information through a heterogeneous network model. Bioinformatics 2014;30:2923–30.

[33] Sarwar B, Karypis G, Konstan J, Riedl J. Item-based collaborative filtering recommendation algorithms. Proceedings of the 10th International Conference on World Wide Web, Hong Kong, May 1–5, 2001; ACM 1-58113-348-0/01/0005.

[34] Yuan J, Wu W, Xie C, Zhao G, Zhao Y, Chen R. NPInter v2.0: an updated database of ncRNA interactions. Nucleic Acids Res 2014;42:D104–8.

[35] Li JH, Liu S, Zhou H, Qu LH, Yang JH. starBase v2.0: decoding miRNA-ceRNA, miRNA-ncRNA and protein–RNA interaction networks from large-scale CLIP-Seq data. Nucleic Acids Res 2013;42:D92–7.

[36] Bhartiya D, Pal K, Ghosh S, Kapoor S, Jalali S, Panwar B, et al. LncRNome: a comprehensive knowledgebase of human long noncoding RNAs. Database 2013;2013:bat034.

[37] Chen X, Yan GY. Novel human lncRNA–disease association inference based on lncRNA expression profiles. Bioinformatics 2013;29:2617–24.

[38] Chen X, Liu MX, Yan GY. Drug–target interaction prediction by random walk on the heterogeneous network. Mol BioSyst 2012;8:1970–8.

[39] Xie C, Yuan J, Li H, Li M, Zhao G, Bu D, et al. NONCODEv4: exploring the world of long non-coding RNA genes. Nucleic Acids Res 2014;42:D98–103.

[40] Zhou T, Ren J, Medo M, Zhang YC. Bipartite network projection and personal recommendation. Phys Rev E: Stat, Nonlin, Soft Matter Phys 2007;76:046115.
