Zipf's Law Probably Existing in Protein Sequences
Abstract: To test whether Zipf's law, known from linguistics, also holds in protein sequences, this paper analyzes 17,357 protein sequences annotated with secondary structure, selected from the DSSP database. A word is defined as a segment of consecutive amino acid residues that share the same secondary-structure code. The results show that the frequency distribution of these words approximately follows Zipf's law with an exponent of 0.981.
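The word definition and the rank-frequency fit described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names `split_into_words` and `zipf_exponent` are hypothetical, the toy sequence and the use of `-` for unstructured residues are assumptions, and the exponent is estimated by ordinary least squares on the log-log rank-frequency data, which may differ from the fitting procedure used in the paper. The sketch also assumes that a word is identified by its residue string alone, without the structure code.

```python
import math
from collections import Counter
from itertools import groupby


def split_into_words(residues, ss_codes):
    """Cut a sequence into maximal runs of residues sharing one structure code."""
    if len(residues) != len(ss_codes):
        raise ValueError("residues and ss_codes must have the same length")
    words = []
    pos = 0
    for _, run in groupby(ss_codes):
        n = len(list(run))  # length of the current run of identical codes
        words.append(residues[pos:pos + n])
        pos += n
    return words


def zipf_exponent(counts):
    """Estimate the Zipf exponent as minus the least-squares slope of
    log(frequency) against log(rank)."""
    freqs = sorted(counts.values(), reverse=True)
    if len(freqs) < 2:
        raise ValueError("need at least two distinct words to fit a slope")
    xs = [math.log(rank) for rank in range(1, len(freqs) + 1)]
    ys = [math.log(f) for f in freqs]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


if __name__ == "__main__":
    # Toy input (assumed, not taken from DSSP): residues and per-residue codes.
    words = split_into_words("MKVLAAGIVGK", "HHHHEEEE---")
    print(words)

    # Word counts would normally be accumulated over all sequences.
    counts = Counter(words)
    counts.update(["MKVL", "MKVL", "AAGI"])
    print(zipf_exponent(counts))
```

In a real analysis the `Counter` would be updated with the words from every sequence before calling `zipf_exponent`, and a fitted value close to 1 would correspond to the kind of Zipf-like behavior the abstract reports.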