DNA Sequence Data Mining Technique∗
ZHU Yang-Yong1,2+, XIONG Yun1
1(Department of Computer and Information Technology, Fudan University, Shanghai 200433, China)
2(Shanghai Center for Bioinformation Technology, Shanghai 201203, China)
+ Corresponding author: Phn: +86-21-65642831, Fax: +86-21-65642219, E-mail: yunx@fudan.edu.cn, http://www.dmgroup.org.cn
Zhu YY, Xiong Y. DNA sequence data mining technique. Journal of Software, 2007,18(11):2766−2781. http://www.jos.org.cn/1000-9825/18/2766.htm
Abstract: DNA sequences are among the most basic and important kinds of biological data. Studying DNA sequence data in order to understand the essence of life is a central task of the post-genomic era. Data mining is currently one of the most effective means of data analysis for uncovering information hidden in data, and it has become a main analysis technique in bioinformatics. Its application to DNA sequence analysis has attracted wide attention, developed rapidly, and produced considerable research results. This paper provides an overview of research progress in DNA sequence data mining. It identifies three research phases: the application of statistics-based data mining methods, the application of general data mining methods, and the design of specialized DNA sequence-oriented data mining methods. It then explains that sequence similarity is the foundation of DNA sequence data mining. Drawing on the biological background, it also analyzes and comments on key techniques in this field, such as DNA sequential pattern mining, association mining, clustering, classification, and outlier mining. Finally, it presents future work and open issues, including research on novel storage models and index methods and the design of data mining algorithms based on biological domain knowledge.
Key words: DNA sequence; data mining; bioinformatics; sequential pattern; sequence similarity
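Abstract (Chinese version): DNA sequence data is an important class of biological data. Studying DNA sequence data to interpret its meaning is the main research task of the post-genomic era. Data mining is currently one of the most effective means of data analysis; it is used to discover the regularities hidden in large volumes of data, and it is also the main data analysis technique adopted in bioinformatics. Applying data mining to DNA sequence data analysis has received wide attention, developed rapidly, and yielded many research results. This paper surveys the state and progress of research in DNA sequence data mining and proposes three research phases: the application of statistics-based mining methods, the application of general mining methods, and the design of specialized DNA sequence data mining methods. It explains that sequence similarity is the foundation of DNA sequence data mining and reviews the key techniques used in the field, including DNA sequential pattern mining, association mining, clustering, classification, and outlier mining, discussing their biological application background and significance. Finally, it presents hot topics for further research in DNA sequence data mining, including DNA
∗ Supported by the National Natural Science Foundation of China under Grant No.60573093 (国家自然科学基金); the National High-Tech Research and Development Plan of China under Grant No.2006AA02Z329 (国家高技术研究发展计划(863))
Received 2007-01-23; Accepted 2007-04-25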