LINGUIST List 16.1580

Tue May 17 2005

Disc: Re: A Challenge to the Minimalist Community

Editor for this issue: Michael Appleby <michael@linguistlist.org>

To post to LINGUIST, use our convenient web form at http://linguistlist.org/LL/posttolinguist.html.
        1.    Christopher Manning, Re: A Challenge to the Minimalist Community
        2.    Martha McGinnis, Re: A Challenge to the Minimalist Community

Message 1: Re: A Challenge to the Minimalist Community
Date: 16-May-2005
From: Christopher Manning <manning@cs.stanford.edu>
Subject: Re: A Challenge to the Minimalist Community

I personally tend to feel that these sorts of meta-discussions aren't
very useful, and so I have tried to stay out of them. But let me inject
some facts into the discussion, since subsequent mentions of the Klein
and Manning work have been incorrect in major respects (one case looks
like a genuine misunderstanding; in the others, the authors can't have
read all of either paper...).

This isn't to say that there wasn't a core of value in the original
posting: I do think it is high time that linguists look more at the success
that is being achieved by empirical and machine learning methods and
use it to question some of the assumptions of the dominant theories,
assumptions that were adopted in the 1950s before there were
successful empirical and machine learning methods for any domain.

On 6 May 2005, Sean Fulop in Linguist 16.1364 wrote:
> They go on to reference work by Klein and Manning which induces
> grammars in an "unsupervised" fashion from text. Well first of all, it
> is still debatable whether anything can ever actually do this (see the
> algorithmic learning theory literature, summarized in Jain et al.
> 1999), and Sproat and Lappin also note that Klein and Manning's
> scheme uses part-of-speech tagged text, which is a far cry from
> text. This is a huge annotation, and could be taken as a
> component of Universal Grammar. It is a component that is argued
> for in P&P, as well.

and later referring to our work,

On 11 May 2005, Carson Schutze in Linguist 16.1505 wrote:
> No one in P&P ever claimed that inducing the ability to parse a
> representative subset of a corpus of everyday speech to a certain
> approximation (given POS tags) required innate linguistic machinery.

It just isn't the case that the Klein and Manning results require
part-of-speech tagged text. Both of the papers cited in the original post
(and below), Klein and Manning 2002 and 2004, show results working
from simply a sequence of words, with automatic distributional
induction of word classes. See also Dan Klein's thesis (Klein 2005) for
the most recent and complete exposition of the work. It is true that a lot
of the results we present are from pre-tagged text, and furthermore the
word class induction method that we use is rather simple: it's not as
good as methods already proposed in Schuetze 1995, let alone other
recent promising work, of which the best is perhaps Clark (2003). But
that's just because word classes weren't our focus, precisely because
there was already quite successful previous work on learning them. I
would conjecture that our fully unsupervised results would improve
considerably if one simply welded Clark's word class induction onto the
rest of our system. (And if one doesn't want to assume a list of words
as input, there is other unsupervised work that has looked at word
segmentation and phoneme recognition. Start welding it together.)
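
A toy illustration of distributional word class induction from a plain sequence of words may help here: each word is represented by counts of the words immediately to its left and right, and words with similar context profiles are grouped together. This Python sketch uses cosine similarity and greedy average-link agglomerative clustering, chosen only for brevity; it is not the method of Klein and Manning, Schuetze (1995), or Clark (2003), and the function names and toy corpus are hypothetical.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations


def context_vectors(sentences):
    """Map each word to counts of its immediate left and right neighbours."""
    vecs = defaultdict(Counter)
    for sent in sentences:
        padded = ["<s>"] + sent + ["</s>"]  # sentence-boundary markers
        for i in range(1, len(padded) - 1):
            word = padded[i]
            vecs[word]["L:" + padded[i - 1]] += 1
            vecs[word]["R:" + padded[i + 1]] += 1
    return vecs


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(c * v[f] for f, c in u.items())
    norm = math.sqrt(sum(c * c for c in u.values())) * math.sqrt(
        sum(c * c for c in v.values())
    )
    return dot / norm if norm else 0.0


def induce_classes(sentences, k):
    """Greedily merge the most similar clusters until k word classes remain."""
    vecs = context_vectors(sentences)
    clusters = [[w] for w in sorted(vecs)]

    def sim(a, b):  # average-link similarity between two clusters
        return sum(cosine(vecs[x], vecs[y]) for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > k:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda p: sim(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


if __name__ == "__main__":
    corpus = [s.split() for s in ["the dog runs", "the cat runs", "a dog sleeps", "a cat sleeps"]]
    for word_class in induce_classes(corpus, 3):
        print(sorted(word_class))
```

On this four-sentence corpus, asking for three classes groups {the, a}, {dog, cat}, and {runs, sleeps}, using nothing but each word's left and right neighbours.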

[Clark's work is also relevant to the algorithmic learning theory
comment. It's not clear to me how relevant work on the learnability of
general classes like the regular or context-free languages is to human
language learnability, since the latter may very well depend on
data-dependent features of the rather restricted class of languages that
are human languages (something Chomsky might even agree with). But to
the extent that one examines such work, again work such as Clark and
Thollard (2004) shows that probabilistic formal languages have better
learnability properties: they show that a rather broad class of PDFAs
(Probabilistic Deterministic Finite State Automata) is PAC-learnable
from positive data alone.]
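
For readers unfamiliar with these automata: a PDFA is a deterministic finite automaton in which each state assigns probabilities to its outgoing transitions and to stopping, summing to one, so that (under mild conditions) it defines a probability distribution over strings. The minimal Python sketch below, with a hypothetical class name and toy automaton, only shows how such an automaton assigns a probability to a string; it does not implement Clark and Thollard's learning algorithm.

```python
import math
from dataclasses import dataclass


@dataclass
class PDFA:
    # transitions[state][symbol] = (next_state, probability); keying by symbol
    # allows at most one transition per symbol from a state (determinism).
    transitions: dict
    # stop[state] = probability of ending the string in that state
    stop: dict
    start: int = 0

    def is_well_formed(self) -> bool:
        """Each state's outgoing and stopping probabilities must sum to 1."""
        states = set(self.transitions) | set(self.stop)
        return all(
            math.isclose(
                sum(p for _, p in self.transitions.get(q, {}).values())
                + self.stop.get(q, 0.0),
                1.0,
            )
            for q in states
        )

    def string_probability(self, symbols) -> float:
        """Probability that the automaton generates exactly this string."""
        state, prob = self.start, 1.0
        for sym in symbols:
            arcs = self.transitions.get(state, {})
            if sym not in arcs:
                return 0.0  # no transition: the string is impossible
            state, p = arcs[sym]
            prob *= p
        return prob * self.stop.get(state, 0.0)


# Toy automaton over {a, b} that generates strings of the form (ab)^n.
toy = PDFA(
    transitions={0: {"a": (1, 0.5)}, 1: {"b": (0, 1.0)}},
    stop={0: 0.5, 1: 0.0},
)
```

Here P(ab) = 0.5 × 1.0 × 0.5 = 0.25, and each additional "ab" halves the probability again.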

On 11 May 2005, Carson Schutze in Linguist 16.1505 wrote:
> > What is particularly notable about the Klein-Manning grammar
> > induction procedures is that they do what Chomsky and others
> > have argued is impossible: They induce a grammar using general
> > statistical methods which have few, if any, built-in assumptions
> > that are specific to language.
> To even debate this, we would have to establish a definition
> for "grammar"; earlier in the paragraph this system is described as
> inferring a "parser", which, as has been discussed, is crucially not
> the same thing under usual interpretations of these terms.
> The important point is the suggestion that some 'alternative(s)' to
> P&P can supposedly do "what Chomsky and others have argued is
> impossible ... induce a grammar". Here we have a comparison
> based on a false premise, it seems to me. What is the evidence that
> the Klein/Manning algorithms induce a grammar that has the
> properties Chomsky argued required innate structure to learn? All
> we've been told about it is that it parses some corpora at some rate
> less than 80% but is "quickly converging" on that level of accuracy.

To be precise, what Klein and Manning show is that, given a
reasonable amount of text (by no means a huge amount: less than 100,000
words), we can learn the constituent units and the dependencies and
headedness of that text with a reasonable degree of success. The
model that is built from the data could reasonably be called a grammar
(though certainly not one that knows about things like binding theory
or long-distance dependencies), but we don't actually build a parser at
all. That would be an obvious extension, since a treebank parser could
be built on the results by supervised learning methods in the usual
way. While a human language grammar is clearly much more than
knowledge of constituency, constituency is such an important and basic
part of knowledge of language that I do feel it is a very reasonable
first target, and one on which a P&P/Minimalist language learner should
reasonably be expected to do better, precisely because a large part of
the principles and parameters that have concretely been proposed deal
with issues of phrase structure.

Later, Carson writes:

> What are we to make of "with this in mind" as a connective between
> the upper (and preceding) paragraphs and the lower? The former
> talks about learning a grammar of a natural language. The latter
> talks about correctly parsing 90% of examples sampled from some
> corpus the system was trained on. Accomplishing the very narrow
> parsing task in S&L's challenge hardly tells us anything about
> whether some system is or is not able to learn a natural language
> grammar, so if our goal is really studying how humans acquire
> grammars, the challenge is virtually irrelevant to that goal.

I agree with this. The more appropriate goal seems to be to show that
a language learner equipped with some version of the innate knowledge
assumed by P&P/Minimalism outperforms a language learner without that
knowledge on a grammar induction task. However, it doesn't seem
unreasonable to me to focus on constituency learning as the first such
task: it's one of the more basic and better understood areas of

On 11 May 2005, Charles Yang in Linguist 16.1505 wrote:
> The recent, and remarkable, work of Klein and Manning (2002)
> takes this a step further. So far as I can tell, in the induction of a
> grammatical constituent, Klein & Manning's model not only keeps
> track of the constituent itself, but also its aunts and sibling(s) in the
> tree structure. These additional structures is what they refer to
> as ''context''; those with a more traditional linguistics training may
> recall ''specifier'', ''complement'', ''c-command'', and ''government''.

I take this to be the genuine misunderstanding, but it isn't right: while
we define "context" as a general notion, the "context" that we
concretely use is nothing more or less than the word class immediately
to the left and the word class immediately to the right of a putative
constituent. This model was adopted precisely because using the word
classes to the left and right had proven so successful in distributional
word class induction.
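
To make the notion concrete, here is a minimal Python sketch of the yield/context pairs just described: for a putative constituent spanning positions i to j in a sequence of word classes, the yield is the class sequence inside the span, and the context is only the pair of classes immediately to its left and right. The function names, the example tags, and the "#" sentence-boundary symbol are illustrative assumptions; the sketch extracts the pairs and does not implement the probabilistic model or its training.

```python
def yield_and_context(tags, i, j, boundary="#"):
    """Return (yield, context) for the putative constituent tags[i:j].

    The context is just the word class immediately to the left and the
    word class immediately to the right, with a boundary symbol at the
    sentence edges. No tree-structural relations are consulted.
    """
    left = tags[i - 1] if i > 0 else boundary
    right = tags[j] if j < len(tags) else boundary
    return tuple(tags[i:j]), (left, right)


def all_spans(tags):
    """Yield/context pairs for every span (i, j) with 0 <= i < j <= n."""
    n = len(tags)
    return {
        (i, j): yield_and_context(tags, i, j)
        for i in range(n)
        for j in range(i + 1, n + 1)
    }


if __name__ == "__main__":
    tags = ["DT", "NN", "VBD", "DT", "NN"]
    for span, (yld, ctx) in sorted(all_spans(tags).items()):
        print(span, yld, ctx)
```

In this example the yield DT NN occurs twice, once with context (#, VBD) and once with context (VBD, #). In Klein and Manning (2002) pairs like these are scored by a generative model trained with EM; aunts, siblings, and other tree configurations play no part in the context itself.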



Dan Klein and Christopher D. Manning. 2002. A Generative
Constituent-Context Model for Improved Grammar Induction.
Proceedings of the 40th Annual Meeting of the Association for
Computational Linguistics (ACL 2002), pp. 128-135.

Dan Klein and Christopher D. Manning. 2004. Corpus-Based Induction
of Syntactic Structure: Models of Dependency and Constituency.
Proceedings of the 42nd Annual Meeting of the Association for
Computational Linguistics (ACL 2004).

Dan Klein. 2005. The Unsupervised Learning of Natural Language
Structure. Ph.D. thesis, Stanford University.

Hinrich Schuetze. 1995. Distributional Part-of-Speech Tagging.
Proceedings of the 7th Conference of the European Chapter of the
Association for Computational Linguistics (EACL 1995), pp. 141-148.
http://arxiv.org/abs/cmp-lg/9503009

Alexander Clark. 2003. Combining Distributional and Morphological
Information for Part of Speech Induction. Proceedings of EACL 2003.

Alexander Clark and Franck Thollard. 2004. PAC-learnability of
Probabilistic Deterministic Finite State Automata. Journal of Machine
Learning Research 5 (May): 473-497.

Linguistic Field(s): Computational Linguistics
Discipline of Linguistics
Language Acquisition
Linguistic Theories

Message 2: Re: A Challenge to the Minimalist Community
Date: 13-May-2005
From: Martha McGinnis <mcginnis@ucalgary.ca>
Subject: Re: A Challenge to the Minimalist Community

>Unfortunately though, a great many responses to our challenge
>from the P&P/MP community have involved various attempts to
>argue that our challenge is irrelevant to the goals of that
>version of theoretical syntax. This does not suggest to us
>that there is any widespread desire to play together.

On the contrary, it is Charles Yang and I who have noted the
desirability of collaboration, and Sproat and Lappin who have
sidestepped the suggestion. Most syntacticians who have posted
responses have made it clear that in practical terms it would be
impossible for us to undertake such a project without collaboration.

Remember, Sproat and Lappin began the discussion, not by offering
to play together, but by declaring that Minimalism/P&P can't be taken
seriously until it produces a large-scale trainable parser. It's not
surprising that many people's first reaction was to refute this
assertion. If the suggestion had been to play together, the response
would have been very different. And indeed, if there are any
computational linguists who are actually interested in pursuing Sproat
and Lappin's project (as Sproat and Lappin themselves apparently are
not), I imagine there are many Minimalist syntacticians who would be
willing to help out.


Dr. Martha McGinnis, Assistant Professor
Linguistics Department, University of Calgary

Linguistic Field(s): Computational Linguistics
Discipline of Linguistics
Linguistic Theories
