LINGUIST List 9.1241

Tue Sep 8 1998

Disc: POS is well-formed or not well-formed?

Editor for this issue: Brett Churchill <brett@linguistlist.org>


Directory

  1. Deborah D K Ruuskanen, Re: 9.1235, Disc: POS is well-formed or not well-formed?
  2. Christopher A. Brewster, Re: 9.1235, Disc: POS is well-formed or not well-formed?

Message 1: Re: 9.1235, Disc: POS is well-formed or not well-formed?

Date: Tue, 8 Sep 1998 09:23:34 +0300 (EET DST)
From: Deborah D K Ruuskanen <druuskan@cc.helsinki.fi>
Subject: Re: 9.1235, Disc: POS is well-formed or not well-formed?

The comparison of linguistic segmentation with molecular chemistry is
not valid IMO, precisely because the rules for segmentation differ for
different languages. Consider:

	brot - her
	
	broth - er

The first example follows the rules for segmentation used for Finnish,
the second the rules for English - at least for written text. There is
never going to be any agreement on these rules like the agreement for
defining atoms in chemistry, because the symbol/sound systems are so
different. Finnish, BTW, does not leave 'space' between the segments
(words?) so that the post-positions are glued on to the main segment
(head), which is the case in all agglutinative languages. So all we have
left is the well-formedness theory based on statistical distribution, which
is in the textbooks. Unless you all can come up with something better,
which is what I assume this discussion is all about.
Cheers, DKR 
- 
Deborah D. Kela Ruuskanen 
Leankuja 1, FIN-01420 Vantaa 
druuskan@cc.helsinki.fi

Message 2: Re: 9.1235, Disc: POS is well-formed or not well-formed?

Date: Tue, 08 Sep 1998 02:16:20 +0300
From: Christopher A. Brewster <brewster@upatras.gr>
Subject: Re: 9.1235, Disc: POS is well-formed or not well-formed?

There are aspects of this discussion which are reminiscent of the post-Bloomfield
period in linguistics, where supposedly one could only go strictly bottom-up and no
'cheating' was allowed. Everything had to be determined purely on distributional
criteria. In fact, of course, everyone used their common sense in deciding what a
word was, or a morpheme, etc.

While there may theoretically be a large number of possible POS systems for a given
language, the choice among them is guided by such factors as theoretical elegance,
common sense and seeing what works. We all know that language has never fit our neat
theoretical containers very well, but that is why we search for better theoretical
accounts.

In addition, I would like to ask whether anyone has applied language modelling
methods such as those described by Brown et al. 1992 'Class-based n-gram models of
natural language' or McMahon & Smith 1996 'Improving Statistical Language Model
Performance with Automatically Generated Word Hierarchies' to a language like
Chinese. The approaches described in these papers yield quite meaningful POS-type
classification structures for languages like English, and the criteria they use are
quite reasonable.

Christopher Brewster
University of Patras
brewster@upatras.gr