LINGUIST List 12.438

Fri Feb 16 2001

Disc: Parallelism Between Lang & Genome

Editor for this issue: Karen Milligan


  1. Zylogy, Re: 12.396, Disc: Parallelism Between Lang & Genome

Message 1: Re: 12.396, Disc: Parallelism Between Lang & Genome

Date: Thu, 15 Feb 2001 17:41:12 EST
From: Zylogy
Subject: Re: 12.396, Disc: Parallelism Between Lang & Genome

One of the unfortunate miscorrelations in the identification of parallelisms
has been the coining of the idea of the DNA code "word". In actuality, this
three-letter unit (the codon) is better looked at as a "phoneme" analogue.
You therefore get 64 possible "genemes". There are languages with this many
phonemes (though most have fewer). The 20 or so amino acids actually encoded
would correspond to 20 "genemes", and there are lots of languages with 20
phonemes.

All the shape, charge, size, solubility properties of the amino acid side 
chains, should one utilize the maximally expanded idealized code (64 
different members) could be handled by extra "distinctive features". The 
internal structure of the representational cube diagram would then represent 
instantiation of combinatorics of these features, within the constraints of 
the 3 "letter" code "word".
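
As a rough illustration of the counts above, here is a minimal Python sketch. It assumes the standard genetic code (NCBI translation table 1) written on the DNA alphabet; the variable names are illustrative only and are not part of the argument.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), DNA alphabet.
# Codons are enumerated with the first base varying slowest, in the order T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# All 4 x 4 x 4 three-letter "words" over the four-letter alphabet.
codons = ["".join(triplet) for triplet in product(BASES, repeat=3)]
CODON_TABLE = dict(zip(codons, AMINO))

num_codons = len(CODON_TABLE)            # size of the idealized 64-member code
amino_acids = sorted(set(AMINO) - {"*"}) # "*" marks a stop signal, not an amino acid
degeneracy = Counter(AMINO)              # how many codons share each meaning

if __name__ == "__main__":
    print(num_codons, "codons;", len(amino_acids), "amino acids")
    for symbol, count in sorted(degeneracy.items(), key=lambda kv: -kv[1]):
        print(symbol, count)
```

On this view the 64 codons are the idealized inventory, and the 20 amino acids plus the stop signal are what that inventory collapses to in practice.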

Given the expense of maintenance and copying of nucleic acids, one wonders 
whether the code, far from being a sloppy expansion of some two-letter 
original (as has been proposed, where the third letter is "wobbly", leading 
to redundancy and degeneracy of the code), might be better thought of as a 
more resource-efficient reduction of a proto-code with a higher number of 
letters. Such a code would be able to capture finer details, but its 
physical carrier would be much bulkier. I think that there is a hint here 
about language structure and evolution too, if the signal inversion 
hypothesis is correct. 
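
The trade-off between code size and carrier bulk can be put in numbers. The sketch below again assumes the standard genetic code (NCBI translation table 1); the helper `capacity` and the other names are hypothetical, chosen only for illustration. It shows how many distinct "words" a four-letter alphabet yields at word lengths 2, 3 and 4, and which two-letter prefixes already fix the amino acid regardless of the "wobbly" third letter. It says nothing about which evolutionary direction actually occurred.

```python
from itertools import product

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
TABLE = dict(zip(("".join(t) for t in product(BASES, repeat=3)), AMINO))

def capacity(word_length, alphabet_size=4):
    """Number of distinct words of a given length over the alphabet."""
    return alphabet_size ** word_length

# A 2-letter code cannot cover 20 amino acids; a 4-letter code could
# distinguish far more, at the cost of one extra base per residue.
capacities = {n: capacity(n) for n in (2, 3, 4)}

# Two-letter prefixes whose meaning does not depend on the third base.
fourfold = [
    a + b
    for a, b in product(BASES, repeat=2)
    if len({TABLE[a + b + c] for c in BASES}) == 1
]

if __name__ == "__main__":
    print(capacities)
    print(len(fourfold), "of 16 prefixes ignore the third letter:", fourfold)
```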

The next major structural level within actual genes coding for proteins is 
that corresponding to the "domain"- various geometrically defined structural 
motifs found in protein structure- sheets, rods, folds, etc. This reminds me 
a great deal of the Bolingerian "phonaestheme"- recurrent partials which 
often have vaguely or strongly sensed meaning but which are smaller than the 
synchronic "root" of a word. Diachronically, such partials, at least in 
language, correspond to ancestral roots and root/affix combinations. If the 
genetic analogy is ok, then perhaps such domain structure also represented 
standard compositional elements utilized to build up larger structures (and 
in fact some leading geneticists have speculated that this is indeed the 
case, and eucaryotic intervening sequences exist to aid such recombination).

Higher up, full genes would represent complete stems by analogy, with various 
secondary modifications and transcription instructions tacked alongside 
and/or within as "inflections". Rhythmic/metrical structure is also evident 
in both DNA and its protein products, both at the level of gross mass 
distribution and actual vibratory behavior. Repeated "junk" sequences, skews 
even within genes away from even averaged mixes of the four code letters, all 
sorts of things. Rates of transcription, tensions on the actual molecular 

We don't know yet definitively whether there is any overarching ground plan 
within the genome structurally definable as "the book of life" we hear so 
much about. At a lower level, though, there are definite hints. In bacteria, 
genes are often in serial order (just as is the case with serial verbs in 
language)- not only are the protein products physically produced one after 
the other (keeping their relative numbers in a one to one to one, etc. 
ratio), but activity-wise the chemical product of one is the raw material of 
the next in line etc. And there is often physical chaining, keeping each 
"reactor" attached to the next one. Major efficiency.

In eucaryotes, such as us, gene families, such as the hemoglobins or 
immunoglobulins, are found "together". In the former, order of genes in the 
string may reflect not only the order of phylogenetic copying, but also the 
order of activation ontogenetically. And the genes which control the 
development of the underlying segmental body plan in all higher organisms 
(from worms to humans), keep the head-to-tail arrangement tightly linearized. 
So we have mapping on the genome to both mostly temporal (hemoglobins) and 
mostly spatial (body plan) effects.

Linguistic texts also have such ordering within them, at various hierarchical 
levels- luckily I don't have to expound upon that here.

So the question is, just how far does it go, this parallelism? At a deeper 
level, what can the structural dynamicity of both genetic and linguistic 
systems tell us about the origins and histories of both? Can understanding of 
the one enrich that of the other?

It seems clear, for instance, that the rise of complex morphological 
structure makes a really large base lexicon less necessary- indeed the most 
polysynthetic languages have the fewest base morphemes of any language. 
Could analogues to such hierarchy within the 
genome have been the impetus leading to virus structure, with overlapping 
genes? Could there be, over vastly longer stretches of time, something like a 
"typological cycle" for living organisms at the level of the genome? There 
are already known higher organisms and even unicellular species, all living 
parasitically, moving towards large scale jettisoning of their own genes. And 
higher viruses seem to be accumulating genes. Where does it end?

As to origins, I've speculated that a nonsyntactic precursor capable of 
producing very large numbers of temporally short, maximally featured signals 
gave way, through inversion of signal structure, to language as we know it. 
Could a similar process have occurred in the origins of the genome? In this 
scenario, instead of genes-as-we-know-them, all nicely strung together in 
ever larger functional units, we would have the analogue of a 
high-feature-number lexical matrix- all possible combinations. So we would 
have an ideophone-like continuum. Such a multidimensional continuum possibly 
would have to be constructed multidimensionally, so instead of a nice long 
linear sequence we have a literal matrix of short strings.

Jess Tauber

The appositional structure of bacteria poses a similar problem of 
bottom-up/top-down developmental perspective: some scientists believe it 
represents the original form, with eucaryotic split genes (us) representing 
an innovative advance, while others believe that it is a streamlining of 
form, starting with split genes and editing out the intervening noncoding 
sequences.
