iRSpot-EL: identify recombination spots with an ensemble learning approach



Description of the iRSpot-EL web server

Recombination, the exchange of genetic information that takes place in each generation of diploid organisms, plays an important role in genetic evolution. It produces many new combinations of genetic variation and is an important source of biodiversity, which can accelerate biological evolution. Knowledge of recombination spots may also provide useful information for an in-depth understanding of the reproduction and growth of cells. Computational methods for predicting recombination spots are therefore in high demand.

    Many efforts have been made in this regard. For instance, Jiang et al. (1) developed a predictor called RF-DYMHC based on gapped dinucleotide composition features. Liu et al. (2) developed the IDQD predictor for the same purpose, using the k-mer approach and the increment of diversity combined with quadratic discriminant analysis. Both predictors, however, use only local DNA sequence information, so their prediction quality may be limited. To improve on this, two newer predictors were developed: iRSpot-PseDNC (3) and iRSpot-TNCPseAAC (4). The former is based on DNA local structural properties and pseudo dinucleotide composition, while the latter is based on DNA trinucleotide composition (5) and the corresponding pseudo amino acid components (6).
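To make the k-mer idea concrete, the following is a minimal sketch of how a DNA sequence can be turned into a normalized k-mer composition vector. It is an illustration only, not the feature extraction code of any of the predictors above; the function name `kmer_composition` and the choice to skip windows containing ambiguous bases are assumptions made for this example.

```python
from itertools import product


def kmer_composition(seq, k=2):
    """Return normalized k-mer frequencies of a DNA sequence.

    The vector has 4**k entries, ordered lexicographically over A, C, G, T.
    This is a hypothetical helper for illustration, not part of iRSpot-EL.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    # Slide a window of length k over the sequence, one base at a time.
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:  # assumption: skip windows with ambiguous bases (e.g. N)
            counts[word] += 1
            total += 1
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[m] / total for m in kmers]


if __name__ == "__main__":
    # Dinucleotide (k=2) composition of a short toy sequence.
    vector = kmer_composition("ACGTACGT", k=2)
    print(len(vector), vector)
```

With k=2 this yields a 16-dimensional dinucleotide composition and with k=3 a 64-dimensional trinucleotide composition. Such vectors capture only local sequence information, which is the limitation noted above.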

    Each of the aforementioned methods has its own advantages and has helped stimulate the development of this important area. They also have disadvantages, as reflected by the following facts. 1) None of these methods allows users to set the desired parameters for prediction, so it is difficult for users to optimize the predictor for their own needs. 2) Except for RF-DYMHC (1), none of the predictors can be directly used for genome-wide analysis. Even for RF-DYMHC, the approach is not accurate because its window size is arbitrary.

    To address these problems, a new web-server predictor, iRSpot-EL, has been proposed for identifying recombination spots. It is an ensemble classifier formed by fusing (7) various modes of pseudo nucleotide components (8-11) with dinucleotide-based auto-cross covariance (12). The new predictor not only allows users to select their desired parameters but is also better suited to genome-wide analysis thanks to its built-in flexible sliding window approach (7). Compared with the existing predictors in this area, iRSpot-EL is more accurate and more stable.
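The combination of a sliding window with an ensemble of scorers can be sketched as follows. This is a toy illustration under stated assumptions, not the iRSpot-EL implementation: the fusion rule (a plain average of the individual scores), the parameters `window`, `step`, and `threshold`, and the two toy scorers are all hypothetical stand-ins for the trained classifiers and settings of the real server.

```python
from typing import Callable, List, Sequence, Tuple

# Hypothetical scorer type: maps a DNA window to a score in [0, 1].
Scorer = Callable[[str], float]


def ensemble_score(window_seq: str, scorers: Sequence[Scorer]) -> float:
    """Fuse individual scores by simple averaging (an assumed fusion rule)."""
    return sum(scorer(window_seq) for scorer in scorers) / len(scorers)


def scan(sequence: str, scorers: Sequence[Scorer], window: int = 8,
         step: int = 4, threshold: float = 0.5) -> List[Tuple[int, int, float, bool]]:
    """Slide a window along the sequence and score each segment.

    Returns (start, end, score, is_hotspot) tuples; window, step, and
    threshold are user-adjustable parameters chosen for this sketch.
    """
    results = []
    for start in range(0, len(sequence) - window + 1, step):
        segment = sequence[start:start + window]
        score = ensemble_score(segment, scorers)
        results.append((start, start + window, score, score >= threshold))
    return results


# Toy scorers standing in for trained classifiers (illustration only).
def gc_content(segment: str) -> float:
    return (segment.count("G") + segment.count("C")) / len(segment)


def has_gc_step(segment: str) -> float:
    return 1.0 if "GC" in segment else 0.0


if __name__ == "__main__":
    for row in scan("GCGCGCGCATATATAT", [gc_content, has_gc_step], window=8, step=8):
        print(row)
```

Exposing the window size and step as parameters, rather than fixing them, is what lets a user adapt the scan to a given genome region.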



REFERENCES

1. Jiang, P., Wu, H., Wei, J., Sang, F., Sun, X. and Lu, Z. (2007) RF-DYMHC: detecting the yeast meiotic recombination hotspots and coldspots by random forest model using gapped dinucleotide composition features. Nucleic Acids Research, 35, W47-W51.

2. Liu, G., Liu, J., Cui, X. and Cai, L. (2012) Sequence-dependent prediction of recombination hotspots in Saccharomyces cerevisiae. Journal of Theoretical Biology, 293, 49-54.

3. Chen, W., Feng, P.-M., Lin, H. and Chou, K.-C. (2013) iRSpot-PseDNC: identify recombination spots with pseudo dinucleotide composition. Nucleic Acids Research, 41, e68. (cited by 245 papers)

4. Qiu, W.-R., Xiao, X. and Chou, K.-C. (2014) iRSpot-TNCPseAAC: Identify Recombination Spots with Trinucleotide Composition and Pseudo Amino Acid Components. International Journal of Molecular Sciences, 15, 1746-1766. (cited by 116 papers)

5. Chen, W., Lei, T.Y., Jin, D.C., Lin, H. and Chou, K.C. (2014) PseKNC: a flexible web-server for generating pseudo K-tuple nucleotide composition. Anal. Biochem, 456, 53-60. (cited by 87 papers)

6. Chou, K.C. (2001) Prediction of protein cellular attributes using pseudo amino acid composition. PROTEINS: Structure, Function, and Genetics, 43, 246-255. (cited by 1208 papers)

7. Chou, K.C. and Shen, H.B. (2007) Signal-CF: a subsite-coupled and window-fusing approach for predicting signal peptides. Biochem Biophys Res Comm (BBRC), 357, 633-640. (cited by 332 papers)

8. Liu, B., Liu, F., Fang, L., Wang, X. and Chou, K.C. (2015) repDNA: a Python package to generate various modes of feature vectors for DNA sequences by incorporating user-defined physicochemical properties and sequence-order effects. Bioinformatics, 31, 1307-1309. (cited by 85 papers)

9. Liu, B., Liu, F., Wang, X., Chen, J., Fang, L. and Chou, K.-C. (2015) Pse-in-One: a web server for generating various modes of pseudo components of DNA, RNA, and protein sequences. Nucleic Acids Research, 43(W1), W65-W71. (cited by 100 papers)

10. Guo, S.-H., Deng, E.-Z., Xu, L.-Q., Ding, H., Lin, H., Chen, W. and Chou, K.-C. (2014) iNuc-PseKNC: a sequence-based predictor for predicting nucleosome positioning in genomes with pseudo k-tuple nucleotide composition. Bioinformatics, 30, 1522-1529. (cited by 161 papers)

11. Lin, H., Deng, E.-Z., Ding, H., Chen, W. and Chou, K.-C. (2014) iPro54-PseKNC: a sequence-based predictor for identifying sigma-54 promoters in prokaryote with pseudo k-tuple nucleotide composition. Nucleic Acids Research, 42, 12961-12972. (cited by 126 papers)

12. Liu, B., Wang, X., Lin, L., Dong, Q. and Wang, X. (2009) Exploiting three kinds of interface propensities to identify protein binding sites. Computational Biology and Chemistry, 33, 303-311. (cited by 28 papers)


Copyright © Liu Lab, Harbin Institute of Technology, Shenzhen.