As much as 97 percent of the DNA in mammalian genomes apparently does not code for protein amino acid sequences. Some of the noncoding DNA is known to function in various gene regulatory roles. The remainder of the noncoding DNA consists mainly of introns, the functions of which are largely unknown. In this study, large (72,000 base pair) concatenated sequences of human coding and intronic DNA were analyzed by means of information-theoretic and linguistic FORTRAN algorithms on a Sun Sparc 1000 system. The aim was to determine the statistical and linguistic "textures" of the two categories of DNA as a means of developing a new line of evidence that might provide a basis for an empirical distinction between an intelligent-design origin and an evolutionary origin of genomes. Calculations were run on both the natural DNA sequences and their randomized counterparts. Similar analyses were performed on the sequenced genome of Mycoplasma genitalium, which consists of 88% coding DNA and does not contain introns.

The hierarchical information content of the human (concatenated) coding sequences examined in this study was 1.948-1.951 bits/nucleotide up to the dinucleotide level and 1.912-1.916 bits/nucleotide up to the pentanucleotide level. For intronic DNA the corresponding values were 1.905-1.947 and 1.876-1.901 bits/nucleotide. The Shannon redundancies for the coding DNA sequences are 1.34-1.44% at the dinucleotide level and 2.70-2.83% at the pentanucleotide level. The corresponding values for intronic DNA are 1.36-3.68% and 3.04-4.84%. The linguistic vocabularies of coding and noncoding DNA sequences of comparable lengths show significant differences in preferred (standard deviate ≥ 3.0) oligomers and avoided (standard deviate ≤ -3.0) oligomers. Intronic sequences exhibit marked modulo 2 periodicities in the spacing of pairs of mirror-symmetric oligomers, whereas coding sequences do not show this periodicity. Mirror-complementary oligomers are less abundant than mirror-symmetric and tandemly repeating oligomers in both the coding and noncoding DNAs. Relative to their randomized counterparts, mirror-complementary oligomers occur with higher frequencies in intronic sequences than in codonic sequences. Coding sequences show marked modulo 3 periodicities in the spacing of tandemly repeating oligomers, whereas the intronic sequences examined in this study do not show this periodicity. The pattern of frequencies of protein-binding sequences in introns differs from that of coding DNA.
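The abstract does not state the exact formulas behind the reported information content and redundancy values, so the following Python sketch is only one common way such quantities might be estimated, not the authors' FORTRAN method. It assumes that the information content at level k is the Shannon entropy of overlapping k-mers divided by k, and that redundancy is measured against the 2 bits/nucleotide maximum of a four-letter alphabet. The function names and the shuffling step used to build a randomized counterpart are illustrative assumptions, and the values produced need not match the figures reported above.

```python
import math
import random
from collections import Counter


def block_entropy_per_nucleotide(seq, k):
    """Entropy of overlapping k-mers in seq, divided by k (bits/nucleotide).

    Assumption: this block-entropy estimate stands in for the abstract's
    "hierarchical information content"; the authors' definition may differ.
    """
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
    return entropy / k


def redundancy(seq, k):
    """Redundancy relative to the 2 bits/nucleotide maximum for A, C, G, T."""
    return 1.0 - block_entropy_per_nucleotide(seq, k) / 2.0


def shuffled(seq, seed=0):
    """Randomized counterpart: same base composition, order destroyed."""
    bases = list(seq)
    random.Random(seed).shuffle(bases)
    return "".join(bases)


if __name__ == "__main__":
    # Toy sequence for illustration only; not data from the study.
    natural = "ATGGCCAAGCTGACCGAGGCTCTGAAGGTG" * 20
    control = shuffled(natural)
    for k in (2, 5):
        print(k, redundancy(natural, k), redundancy(control, k))
```

Comparing the natural sequence with its shuffled counterpart separates structure that comes from base order from structure that comes from base composition alone, which is the role the randomized sequences play in the study.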

It is concluded that significant statistical and linguistic differences exist between the coding and intronic DNA of the human genome. These results are consistent with the hypothesis that intronic DNA may play a variety of vital roles in the cell biology of development in multicellular organisms. It is plausible that these roles were present from the very beginning of the existence of organisms on the earth.


Keywords: Coding DNA, noncoding DNA, introns, hierarchical information, redundancy, linguistic analysis, Fourier analysis, protein-binding sequences




