LINGUIST List 3.304

Mon 30 Mar 1992

Disc: LFG

Editor for this issue: <>


Directory

  1. , LFG

Message 1: LFG

Date: Tue, 24 Mar 1992 17:16:18LFG
From: <Ronald_Kaplan.PARCxerox.com>
Subject: LFG

In Linguist 3.128 (the beginning of February) Bruce Litow asks:

"I would like to know if LFG formalism is still being used. The
result of Berwick-Nishino that LFG grammaticality is NP-complete
is not so negative as it might first appear. Any information on LFG
research would be appreciated."

This is a somewhat delayed and somewhat lengthy response to that short
question.

I think that LFG is alive and well in various places around the world, where it
is being used in theoretical, descriptive, formal, and computational
investigations. As you might expect, there is quite a bit of activity here at
Xerox and Stanford, and we can send collections of recent publications to any
requestors. I know of LFG-oriented projects elsewhere in the U.S. and in
Europe, Asia, and Australia, but it is perhaps not appropriate for me to speak
for those researchers.

Note also that theoretical concepts know no boundaries, and many of the notions
and mechanisms of LFG have been incorporated into a number of other frameworks
that don't carry the LFG label (for example, the use of hierarchical
attribute-value matrices, now quite widespread, first emerged in the early LFG
tradition). In the computational arena, LFG principles and mechanisms are
often cited to explicate the intended effects of various implementation
notations and techniques.

Berwick, Peters, and recently Nishino have shown that the LFG parsing problem
is NP-complete. For non-computerists, this means that it can be solved on a
(N)ondeterministic Turing machine in (P)olynomial time, and it is no harder
than a large number of other such problems (Boolean satisfiability, traveling
salesman...). In practical terms, this means that there is no known algorithm
for solving the worst cases of this problem on a (d)eterministic Turing machine
(e.g. a real computer) in less than exponential time--for example, an 11 word
sentence could take twice as long to parse as a 10 word sentence, and a 12 word
sentence could double the time again. Typical algorithms solve these problems
according to a brute-force generate-and-test strategy--there are exponentially
many candidate solutions to test, but each one can be tested relatively quickly
(in polynomial time). Such problems are generally regarded as being
intractable; certainly one would never design a programming language whose
parsing problem was exponential.

But I don't think the NP-completeness result is necessarily negative for LFG as
a formalism for natural language grammars, and I don't think this result has
had much impact one way or the other on who does or does not work within the
LFG framework. This is a worst-case result for the formalism: it is possible
to construct artificial grammars within the formalism that would require this
kind of exponential brute-force backtracking search. But these worst-case
examples involve ambiguities or disjunctive alternations that can only be
resolved over arbitrarily long stretches of the input string. Although such
interactions may be possible in natural languages, they seem not to be common
or typical (or "natural"), so that the average-case complexity may turn out to
be far more reasonable.

Indeed, research over the last few years within the LFG framework has led to
methods for solving disjunctive feature systems that are tuned for what do seem
to be typical as opposed to worst-case situations. Presumably, more detailed
complexity results and the reduction methods by which they are proved will
serve as important tools for understanding the worst-case conspiracies of
information dependencies and will lead to further refinements in natural
language processing strategies. They will help in discovering how to
structure algorithms whose typical, average-case behavior for natural grammars
and sentences more closely approximates our pretheoretic expectations of
relative processing complexity.

The fact that an NP-completeness result exists for LFG may actually be
considered a point in LFG's favor. Many other frameworks are so poorly
formalized, unstable, and chaotic that the question of computational complexity
can't even be approached in a meaningful way (GB stands out in this regard,
although some adherents might debate this point). And other theories (most of
the "unification-based" theories--HPSG, FUG, categorial variants of these) are
provably undecidable--which is worse than NP-complete, because in the worst
case they may not terminate at all.

And there may also be some psycholinguistic value in having a formalism in the
NP-complete class. As indicated, we can naturally partition the parsing
computation into a "generate-candidates" component and a "test-candidates"
component. This nicely supports a psychological processing model in which the
generation process is governed by loosely operating perceptual strategies and
heuristics. That component need not be terribly concerned about guessing the
"right" candidate, as in proposals for "deterministic grammars", since the cost
of testing (verifying consistency) a wrong guess is (P)olynomially bounded and
alternative candidates are still heuristically available.

Finally, there is an argument (due to Barton, Berwick, and Ristad) that any
formalism powerful enough to properly characterize agreement and lexical
ambiguity in natural language will necessarily be NP-hard, that is, at least as
hard as all the NP-complete problems. Berwick seems to have backed away from
his earlier suggestion that NP-complete formalisms were somehow not suitable or
desirable for natural language grammars.

Thus, the LFG formalism may represent a good compromise between the conflicting
requirements of efficient parsability (either for practical purposes or for
psychological modeling) and descriptive power (for succintly expressing
linguistic generalizations).

At another level, the NP-completeness result hasn't affected who does or does
not use the formalism because in the current state of the discipline, whether
or not researchers adhere to any particular theory or framework often seems to
be a function of linguistic politics and sociology and accidents of personal
history or geography, rather than being based on deep and accurate comparisons
between approaches. (And for computational projects, source-code availability
can sometimes be the deciding factor--many hotly debated issues in theoretical
circles are distinctions without a difference from a computational point of
view). Some day issues of computational complexity may (and, with deep
understanding, should) figure significantly in accepting or rejecting a
framework, but that day is still a ways off.
Mail to author|Respond to list|Read more issues|LINGUIST home page|Top of issue