Editor for this issue: Martin Jacobsen <marty@linguistlist.org>
Some time ago, I posed two queries (section 1 of the following summary) about parts of speech based on syntactic distribution. I am very grateful to the researchers listed in section 2, who replied to the queries. Their typical answers are summarised in section 3, and some of the references they mentioned are listed in section 4. In addition, I present my personal conclusion about the problem in section 5, just for your information. To help researchers who are not familiar with Chinese understand why I posed the queries, I include an open question (the first question in section 6). The other question in section 6 may also be of interest.

Thank you very much.

With best regards,
Ji Donghong
------------------------------------------
Kent Ridge Digital Labs
21 Heng Mui Keng Terrace
Singapore, 119613
Email: dhji@krdl.org.sg
Tel: 65-8746380
Fax: 65-7744998
------------------------------------------

SUM: WHAT'S BEHIND PART-OF-SPEECH?

1. QUERIES

Query A: In Chinese, there are relatively few affixes to help us classify words into categories such as nouns, verbs and adjectives, so even now the best-known Chinese dictionary, the Modern Chinese Dictionary, gives no part-of-speech (POS) information for Chinese words. Some linguists have proposed that Chinese words be classified as nouns, verbs, adjectives, etc. entirely on the basis of their grammatical distribution, which they define as a word's ability to combine with other words. My questions are:
1) Can such grammatical distribution alone be used to determine the POS of words?
2) Are there similar problems in other languages? How are they solved there?

Query B: Several days ago, I posed the query "What's behind part-of-speech?", and so far more than 10 researchers have replied to me.
Now I would like to pose another query on the topic before presenting a summary:
Q: Is part-of-speech based on syntactic distribution a WELL-FORMED concept?
Any comments or information will be highly appreciated.

2. ACKNOWLEDGEMENTS

Adam Kilgarriff, Geoffrey Sampson, Marcia Haag, Philip Resnik, Sun Honglin, Joseph Davis, Christopher Hogan, Frantisek Cermak, Waruno Mahdi, Atro Voutilainen, Rob Freeman, Víctor Vázquez Martínez, Bingfu Lu, Alex Murzaku, Alexis Manaster Ramer, Lua Kim Teng, Earl Herrick, Xu Jie, Guo Jin, Dan Maxwell, Elaine Jones, Anne-Line Graedler, Steven Schaufele, Robin Sackmann

3. ANSWERS

1) Some doubted whether categories such as N, V, ADJ, etc. are good analytic categories for Chinese, suggesting that they may be inappropriate imports from the West.
2) Some pointed out that grammatical distribution or function is the standard, or primary, way to classify POS. The reasons mentioned include that such a definition is clear and useful, or at least more so than the alternatives. Others proposed that, of all syntactic means, syntactic valency be used to define POS.
3) Some argued that grammatical distribution should not be used to determine lexical categories. The reasons mentioned include the existence of predicate nouns, attributive verbs, sentential subjects, etc.
4) Some pointed out that it is hardly surprising that grammarians have had trouble classifying Chinese words into parts of speech, since the notion of "part of speech" is fraught with difficulties in linguistics, to the extent that many Western linguists since 1900 have abandoned it altogether (though Chomsky explicitly reintroduced the ancient notion in 1957 in his generative grammar).
5) Some replied to the queries indirectly, pointing out that the fact that POS disambiguation can be done on the basis of linguistically motivated contextual rules suggests that parts of speech are syntactically motivated or syntactically definable.
6) Some pointed out that POS is not a particularly well-formed concept, at least not in the sense that one can define universally accepted, unambiguous classes: no labelling will be objective and absolute, and even the classical interpretations are uncertain. The reason mentioned is that when you assign POS, you are partitioning a continuum of associative behaviour. Further, they held that for language processing systems POS is a misleading concept, and that we are better off thinking about the continuous reality of syntactic associativity rather than trying to label it and pretending that it is discrete.
7) Some pointed out that the ultimate criterion for POS should be meaning. The reasons mentioned include that, although syntactic features themselves are very limited in number, their possible combinations are enormously numerous, if not infinite.
8) Some pointed out that, outside of phonetics perhaps, there seems to be no concept in linguistics which is well-defined enough that, given a language, we can mechanically identify instances of that concept. They also pointed out that linguistic concepts, whether part of speech, subject, or anything else, come into existence when someone describes one or a small number of languages, producing a term which refers to a fairly (though not always precisely) well-defined set of entities in that language or those languages; then the same person, or more likely others, try to use the same term for entities in other languages which SEEM to have something in common with those in the original language(s).
9) Some pointed out that POS may be taken somewhat for granted by the linguistics community: linguists come to the task of defining POS with a preconceived notion of what it is they want to define, and then seek criteria that support that notion. They also pointed out that, for a given set of parts of speech, it is quite possible to find distributional evidence that picks out that set and nothing else, but that this set is by no means unique, and many other possible sets may be supported by the data.
10) Some pointed out that even today the parts of speech, as they are taught in schools to English-speaking children, are an illogical, messy list: two of them, the noun and the verb, have semantic definitions masquerading as grammatical/syntactic definitions, and the others have more or less syntactic definitions in terms of the noun and the verb.
11) Some pointed out that there is nothing wrong with defining POS on the basis of grammatical function; rather, the problem is that we always start with a predefined POS system, and distribution is then invoked merely to justify that system, which is very subjective.

4. REFERENCES

Zhao Yuanren, _A Grammar of Spoken Chinese_.
Saussure, Ferdinand de, _Cours de linguistique générale_.
Jespersen, Otto, _The Philosophy of Grammar_.
Sapir, Edward, _Language_.
Contini-Morava, Ellen, "Introduction", and Diver, William, "Theory", in Contini-Morava and Goldberg (eds.), 1995, _Meaning as Explanation: Advances in Linguistic Sign Theory_. Mouton de Gruyter.
Schachter, Paul, 1992, "Parts-of-speech Systems", in Timothy Shopen (ed.), _Language Typology and Syntactic Description_, pp. 3-61. Cambridge: Cambridge University Press.
Radford, Andrew, 1992, _Transformational Grammar: A First Course_. Cambridge: Cambridge University Press.
Gabelentz, Georg von der, 1886, "Zur chinesischen Sprache und zur allgemeinen Grammatik", _Internationale Zeitschrift für allgemeine Sprachwissenschaft_ 3: 92-109 (see p. 100).
Le van Ly, 48, _Le parler vietnamien. Esquisse d'une grammaire vietnamienne_. Paris: Huong Anh.
Martini, François, 1950, "L'opposition nom et verbe en vietnamien et en siamois", _Bulletin de la Société de Linguistique de Paris_ 46: 183-196.
Trnka, Bohumil, 1966, "On the Basic Categories of Syntagmatic Morphology", _Travaux Linguistiques de Prague_ 2: 165-169.
Mahdi, Waruno, 1993, "Distinguishing Homonymic Word Forms in Indonesian", in Ger P. Reesink (ed.), _Topics in Descriptive Austronesian Linguistics_, pp. 181-218. Semaian 11. Leiden: Vakgroep Talen en Culturen van ZO Asien en Oceanie.
Rygaloff, A., 1958, "La classe nominale en chinois: déterminé/indéterminé", _Bulletin de la Société de Linguistique de Paris_ 53: 306-315.
Schütze, Hinrich, "Dimensions of Meaning".
Chu, Fa-Kao, "Word classes in classical Chinese", in _Proceedings of the IXth Congress of Linguistics_, The Hague, 196, p. 594.
Hagège, Claude, 1975, _Le problème linguistique des prépositions et la solution chinoise_. Louvain: Peeters.
Sasse, Hans-Jürgen, 1994, "Syntactic categories and subcategories", in J. Jacobs et al. (eds.), _Syntax. Ein internationales Handbuch zeitgenössischer Forschung_. Berlin: Walter de Gruyter.
1995, "On the subject of Malagasy imperatives", _Oceanic Linguistics_ 34: 203-210.
1994, "On the origin of the term 'ergative'", _Sprachtypologie und Universalienforschung_ 47(3): 207-210.
1993, "Malagasy and the subject/topic issue", _Oceanic Linguistics_ 31: 267-279.
1992, "On intensional vs. extensional grammatical categories", in Karen L. Adams and Thomas John Hudak (eds.), _Papers from the Second Annual Meeting of the Southeast Asian Linguistics Society_, pp. 201-212. Tempe, AZ: Arizona State University Program for Southeast Asian Studies.
"What's a topic in the Philippines?", in Martha Ratliff and Eric Schiller (eds.), _Papers from the First Annual Meeting of the Southeast Asian Linguistics Society_, pp. 271-291. Arizona State University Program for Southeast Asian Studies Monograph Series.
1988, "What about Lisu?", _Languages of the Tibeto-Burman Area_ 11(2): 133-143.
Karen L. Adams and AMR, "Some questions of topic/focus choice in Tagalog", _Oceanic Linguistics_ 27: 79-101.
McCawley, James D., 1992, "Justifying Part-of-Speech Assignments in Mandarin Chinese", _Journal of Chinese Linguistics_ 20(2): 211-245.
Sadock, 1990, "Parts of speech in Autolexical Syntax".
McCawley, 1988, _The Syntactic Phenomena of English_.
Vonen, Arnfinn Muruvik, 1997, _Parts of Speech and Linguistic Typology: Open Classes and Conversion in Russian and Tokelau_ (Acta Humaniora No. 22). Oslo: Universitetsforlaget. ISBN 82-00-12685-4.
Sackmann, Robin, 1996, "The problem of 'adjectives' in Mandarin Chinese", in Robin Sackmann (ed.), _Theoretical Linguistics and Grammatical Description_, pp. 257-275. Amsterdam: John Benjamins Publishing Co.

5. PERSONAL CONCLUSION

My personal conclusion is that POS based on syntactic distribution is not a well-formed concept, for the following reasons:
1) Non-operable: For a word of a given language, what is its syntactic distribution? There seems to be no clear definition. The most natural way to model the syntactic distribution of a word may be the set of contexts in which the word can occur, but we cannot list all of those contexts in any meaningful sense.
2) Non-deterministic: Even if we select, for whatever reasons, a definite set of distributional evidence (e.g., contexts, functions or co-occurrences) as the criteria for defining the POS system of a language, there would be a great many possible classes, and a great many possible classifications of the whole vocabulary. We seem to have no principled reason to choose one particular classification among them as the POS system for that language.
3) Non-provable or non-justifiable: Even if we select, for whatever reasons, a particular classification as the POS system, there seems to be no sense in which we can say that the selected POS system is correct or incorrect.
The deeper reason for this problem may be that distributional theories of POS do not care about WHAT (what are the parts of speech, e.g., nouns, verbs, etc., of a language?), only about HOW (how do we construct a POS system for a language?), or at least that they equate WHAT and HOW and ignore the distinction between them. Thus it may be difficult to justify a POS system for a language, or to compare different POS systems for a language in any meaningful way.

6. OPEN QUESTIONS

1) Suppose we are given a language which is just like English but has no affixes such as -ment, -ing, -ed, -tion, -sion, etc. Then the following are all possible phrases in the language: "make develop", "develop country", "develop product", etc. The problem is: how do we determine the distribution-based POS system for this language? (The situation is roughly like that in Chinese.)
2) If POS based on distribution is not well-formed, what influence might this non-well-formedness have on syntactic theories built on POS?
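To make the non-determinism argument (point 2 of section 5) concrete for the affixless language of open question 1, here is a minimal Python sketch. It is only an illustration under stated assumptions, not a method proposed by the author or any respondent. It uses just the three phrases given in open question 1. It also assumes, purely hypothetically, that a word's distribution is reduced to two binary features: whether the word occurs as the first word or as the second word of a two-word phrase. The helper names `features` and `classify` are likewise hypothetical. Grouping words that agree on whichever features we choose shows that the same data support different classifications.

```python
from collections import defaultdict

# Toy corpus: only the three phrases given in open question 1.
PHRASES = ["make develop", "develop country", "develop product"]

def features(phrases):
    """Hypothetical distributional features for each word:
    (occurs as first word of a phrase, occurs as second word of a phrase)."""
    first, second = set(), set()
    for phrase in phrases:
        w1, w2 = phrase.split()
        first.add(w1)
        second.add(w2)
    return {w: (w in first, w in second) for w in sorted(first | second)}

def classify(phrases, criterion):
    """Partition the vocabulary: words whose selected features agree share a class.
    `criterion` is a tuple of feature indices (0 = first position, 1 = second position)."""
    classes = defaultdict(set)
    for word, feats in features(phrases).items():
        key = tuple(feats[i] for i in criterion)
        classes[key].add(word)
    # Sorted tuples give a stable, comparable representation of the partition.
    return sorted(tuple(sorted(c)) for c in classes.values())

if __name__ == "__main__":
    for name, criterion in [("first position", (0,)),
                            ("second position", (1,)),
                            ("both", (0, 1))]:
        print(name, classify(PHRASES, criterion))
```

With the first criterion, "develop" patterns with "make" (verb-like); with the second, it patterns with "country" and "product" (noun-like, as in "make develop"); with both, it forms a class of its own. The data alone do not say which of the three partitions is "the" POS system, which is the point of the non-determinism argument.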