Editor for this issue: Brett Churchill <brett@linguistlist.org>

The comparison of linguistic segmentation with molecular chemistry is not valid IMO, precisely because the rules for segmentation differ for different languages. Consider:

brot - her
broth - er

The first example follows the rules for segmentation used for Finnish, the second the rules for English, at least for written text. There is never going to be any agreement on these rules like the agreement for defining atoms in chemistry, because the symbol/sound systems are so different. Finnish, BTW, does not leave 'space' between the segments (words?), so that the post-positions are glued on to the main segment (head), which is the case in all agglutinative languages.

So all we have left is the statistical-distribution-based well-formed theory which is in the textbooks. Unless you all can come up with something better, which is what I assume this discussion is all about.

Cheers,
DKR
-
Deborah D. Kela Ruuskanen
Leankuja 1, FIN-01420 Vantaa
druuskan@cc.helsinki.fi

There are aspects of this discussion which are reminiscent of the post-Bloomfield period in linguistics, when supposedly one could only go strictly bottom-up and no 'cheating' was allowed. Everything had to be determined purely on distributional criteria. In fact, of course, everyone used their common sense in deciding what a word was, or a morpheme, etc. While there may theoretically be a large number of possible POS systems for a given language, there are such factors as theoretical elegance, common sense and seeing what works. We all know that language has never fit our neat theoretical containers very well, but that is why we search for better theoretical accounts.

I would like in addition to ask whether anyone has applied language modelling methods such as those described by Brown et al. 1992, 'Class-based n-gram models of natural language', or McMahon & Smith 1996, 'Improving Statistical Language Model Performance with Automatically Generated Word Hierarchies', to a language like Chinese. The approaches described in these papers result in very significant POS-type classification structures for languages like English, while the criteria are quite reasonable.

Christopher Brewster
University of Patras
brewster@upatras.gr