Editor for this issue: Karen Milligan <karen linguistlist.org>
One of the unfortunate miscorrelations in the identification of parallelisms has been the coining of the idea of the DNA code "word". In actuality, this is better looked at as a "phoneme" analogue. You therefore get 64 possible "genemes". There are languages with this many phonemes (though most have fewer). The 20 or so amino acids actually encoded would correspond to 20 "genemes". Lots of languages have 20 phonemes. Should one utilize the maximally expanded idealized code (64 different members), all the shape, charge, size, and solubility properties of the amino acid side chains could be handled by extra "distinctive features". The internal structure of the representational cube diagram would then represent the instantiation of combinatorics of these features, within the constraints of the 3 "letter" code "word".

Given the expense of maintenance and copying of nucleic acids, one wonders whether the code, far from being a sloppy expansion of some two-letter original (as has been proposed, where the third letter is "wobbly", leading to redundancy and degeneracy of the code), might be better thought of as a more resource-efficient reduction of a proto-code with a higher number of letters. Such a code would be able to capture finer details, but its physical carrier would be much bulkier. I think that there is a hint here about language structure and evolution too, if the signal inversion hypothesis is correct.

The next major structural level within actual genes coding for proteins is that corresponding to the "domain": various geometrically defined structural motifs found in protein structure (sheets, rods, folds, etc.). This reminds me a great deal of the Bolingerian "phonaestheme": recurrent partials which often have vaguely or strongly sensed meaning but which are smaller than the synchronic "root" of a word.
Diachronically, such partials, at least in language, correspond to ancestral roots and root/affix combinations. If the genetic analogy is ok, then perhaps such domain structure also represents standard compositional elements utilized to build up larger structures (and in fact some leading geneticists have speculated that this is indeed the case, and that eucaryotic intervening sequences exist to aid such recombination). Higher up, full genes would represent complete stems by analogy, with various secondary modifications and transcription instructions tacked alongside and/or within as "inflections".

Rhythmic/metrical structure is also evident in both DNA and its protein products, both at the level of gross mass distribution and in actual vibratory behavior: repeated "junk" sequences, skews even within genes away from evenly averaged mixes of the four code letters, rates of transcription, tensions on the actual molecular chains, all sorts of things.

We don't know yet definitively whether there is any overarching ground plan within the genome structurally definable as "the book of life" we hear so much about. At a lower level, though, there are definite hints. In bacteria, genes are often in serial order (just as is the case with serial verbs in language). Not only are the protein products physically produced one after the other (keeping their relative numbers in a one to one to one, etc. ratio), but activity-wise the chemical product of one is the raw material of the next in line, and so on. And there is often physical chaining, keeping each "reactor" attached to the next one. Major efficiency. In eucaryotes, such as us, gene families, such as the hemoglobins or immunoglobulins, are found "together". In the former, the order of genes in the string may reflect not only the order of phylogenetic copying, but also the order of activation ontogenetically.
And the genes which control the development of the underlying segmental body plan in all higher organisms (from worms to humans) keep the head-to-tail arrangement tightly linearized. So we have mapping on the genome to both mostly temporal (hemoglobins) and mostly spatial (body plan) effects. Linguistic texts also have such ordering within them, at various hierarchical levels; luckily I don't have to expound upon that here.

So the question is, just how far does it go, this parallelism? At a deeper level, what can the structural dynamicity of both genetic and linguistic systems tell us about the origins and histories of both? Can understanding of the one enrich that of the other? It seems clear, for instance, that the rise of complex morphological structure makes a really large base lexicon less necessary; indeed the most polysynthetic structures contain the fewest base morphemes of any language. Could analogues to such hierarchy within the genome have been the impetus leading to virus structure, with overlapping genes? Could there be, over vastly longer stretches of time, something like a "typological cycle" for living organisms at the level of the genome? There are already known higher organisms and even unicellular species, all living parasitically, moving towards large-scale jettisoning of their own genes. And higher viruses seem to be accumulating genes. Where does it end?

As to origins, I've speculated that a nonsyntactic precursor capable of producing very large numbers of temporally short, maximally featured signals gave way, through inversion of signal structure, to language as we know it. Could a similar process have occurred in the origins of the genome? In this scenario, instead of genes-as-we-know-them, all nicely strung together in ever larger functional units, we would have the analogue of a high-feature-number lexical matrix: all possible combinations. So we would have an ideophone-like continuum.
Such a multidimensional continuum would possibly have to be constructed multidimensionally, so instead of a nice long linear sequence we would have a literal matrix of short strings.

Jess Tauber
zylogy aol.com

The appositional structure of bacteria poses a similar problem of bottom-up/top-down developmental perspective: some scientists believe it represents the original form, with eucaryotic split genes (us) representing an innovative advance; others believe that it is a streamlining of form that started with split genes and saw the intervening noncoding sequences edited out.
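
For readers who want to check the combinatorics behind the "geneme" discussion at the top of this post, here is a minimal sketch in Python. It only counts code words of a given length over the four code letters and compares that count with the "20 or so" amino acids mentioned above. The names `BASES`, `AMINO_ACIDS`, `code_size` and `code_words` are made up for this illustration, the letters are written in their RNA form, and stop signals are ignored, so the ratio printed is a rough figure rather than a biological claim.

```python
from itertools import product

BASES = "ACGU"    # the four code letters (RNA form); illustrative constant
AMINO_ACIDS = 20  # "20 or so" in the post; an approximation, stop signals ignored


def code_size(word_length, alphabet=BASES):
    """Count the distinct code words of a given length over the alphabet."""
    return len(alphabet) ** word_length


def code_words(word_length, alphabet=BASES):
    """List every code word of a given length, e.g. 'AAA', 'AAC', ..."""
    return ["".join(letters) for letters in product(alphabet, repeat=word_length)]


if __name__ == "__main__":
    # Compare a two-letter code, the actual three-letter code,
    # and a hypothetical four-letter proto-code.
    for length in (2, 3, 4):
        size = code_size(length)
        enough = size >= AMINO_ACIDS
        print(f"{length}-letter code: {size} words, "
              f"covers {AMINO_ACIDS} amino acids: {enough}, "
              f"words per amino acid: {size / AMINO_ACIDS:.1f}")
```

A two-letter code yields 16 words, which falls short of 20; the three-letter code yields the 64 "genemes" discussed above; a four-letter proto-code would yield 256, at the cost of a carrier one third longer per word.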