A week or two ago, I sent the following query to the folks on Linguist:

> My interest is in identifying nonreferential NP's in written English so
> that a computer natural lg processing system would know not to set up
> referents for them to serve as antecedents for subsequent anaphora
> resolution. Compounds (e.g. duck-shooting season) can I suppose be
> treated superficially as single words, but what about things like 'lose
> faith in', 'catch sight of'. Of course, some criteria will involve
> larger discourse issues, but it may be that it is possible to identify
> at least some nonreferential NP's "cheaply", i.e. just by looking
> within a clause and/or considering inherent lexical semantics.

My interest is also in examples like 'John is a teacher', where 'a
teacher' is not used to introduce a new discourse referent, but rather
to characterize one which has already been introduced.

My thanks to Gregory Ward, Marion Kee, and Louise McNally for their
replies and comments.

To summarize my investigations to date and the comments of these people:

Ward, Gregory, Richard Sproat, and Gail McKoon. 1991. ``A Pragmatic
Analysis of So-Called Anaphoric Islands,'' Language 67:439-474.
Contrary to what I was assuming about compounds, this paper shows that
subparts of compounds may serve as antecedents for pronominal anaphora.

Louise McNally noted:
'First of all, it appears that languages with article systems (like
English and other Germanic lgs., and the Romance lgs.) mark
"nonreferentiality" by the absence of an article (i.e. via bare
singular or, more commonly, bare plural NPs). Although such NPs may
function syntactically just like NPs with articles (in contrast to
incorporated nominals), semantically and pragmatically they are quite
distinct... In English things are complicated by the fact that bare
plurals also appear to denote natural kinds, but this appears to be the
[exception rather] than the rule (bare nominals in the other languages
I've looked at do not have this interpretation).

Thus, what is shared by incorporated and non-incorporated,
nonreferential NPs is (1) the absence of an article; and (2) their
semantics/pragmatics. -- I would treat bare nominals as property
denoting (alternatively, as contributing only descriptive content),
whereas I treat NPs with articles as entity denoting (alternatively, as
contributing both descriptive content and, crucially, a discourse
referent). Quantificational NPs are another matter altogether.

There appear to be differences between the discourse anaphoric
properties of nouns in compounds (like "baby-sitter") and bare plurals
that occur as independent elements in sentences. Specifically, bare
nominals are much more likely to felicitously license discourse
anaphora to "token" entities (as opposed to kinds) than are nouns
inside compounds. I have not investigated this in detail, but I
suspect that the differences involve the sorts of existential
inferences you get via the use of the compounds vs. full sentences (for
example, I can truthfully describe someone as a "tomato grower" at time
_t_ without there being any tomatoes that the person is growing at _t_;
in contrast, if it is true that Fred grew tomatoes at _t_, there must
have been tomatoes at _t_ that he grew. -- this pair doesn't do justice
to the complexity of the problem, but I hope it gives you an idea of
the differences one finds between the conditions on the applicability
of nouns as descriptions and the truth of sentences.)'
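
As a rough illustration of how the article-based distinction above
might feed a referent-tracking system, here is a minimal sketch. It is
only an approximation under stated assumptions: the input is a
tokenized common-noun NP, the word lists are hypothetical and
incomplete, and the function name np_status is invented for this
example. It does not handle the complications noted above (English
bare plurals denoting kinds, bare nominals licensing anaphora to
token entities) or predicate nominals such as 'a teacher' in 'John is
a teacher'.

```python
# Hypothetical, incomplete word lists; a real system would need far more.
ARTICLES = {"a", "an", "the"}
QUANTIFIERS = {"every", "each", "no", "most", "all", "few", "many"}


def np_status(tokens):
    """Classify a common-noun NP by its first token.

    Returns "entity" (has an article: set up a discourse referent),
    "property" (bare nominal: descriptive content only), or
    "undecided" (quantificational NPs are another matter altogether).
    """
    if not tokens:
        raise ValueError("empty NP")
    first = tokens[0].lower()
    if first in ARTICLES:
        return "entity"
    if first in QUANTIFIERS:
        return "undecided"
    return "property"
```

For example, np_status(["the", "taro"]) classifies the NP as entity
denoting, while np_status(["tomatoes"]) classifies it as property
denoting.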

Marion Kee suggested marking phrasal verbs in the lexicon, for example
'catch sight of', where 'sight' is non-referential. While this is an
eminently practical solution to a thorny problem, my mandate is to
explore methods for automatically identifying such non-referential
uses, this being (presumably) more general, and computationally less
expensive than searching the lexicon for given collocations.
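
For comparison, the lexicon-marking approach could be sketched as
follows. This is an illustrative sketch only: the table contains just
the two collocations mentioned in my query, and the names
NONREFERENTIAL_COLLOCATIONS and is_nonreferential_object are
hypothetical. It also assumes that the verb has already been
lemmatized and that the verb, noun and preposition have been picked
out by some earlier parsing step.

```python
# Hypothetical mini-lexicon of (verb lemma, noun, preposition) collocations
# in which the noun is marked as non-referential.
NONREFERENTIAL_COLLOCATIONS = {
    ("catch", "sight", "of"),
    ("lose", "faith", "in"),
}


def is_nonreferential_object(verb_lemma, noun, prep):
    """Return True if the noun is listed as non-referential in this collocation."""
    key = (verb_lemma.lower(), noun.lower(), prep.lower())
    return key in NONREFERENTIAL_COLLOCATIONS
```

A system using such a table would skip setting up a referent for
'sight' in 'catch sight of', but it would have nothing to say about
any collocation not listed.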

I am still musing on structural cues which might be used, perhaps in
combination with semantic information.

Finally, I have attached a brief summary of the 'backgrounded object
construction' in Roviana, a Western Oceanic language spoken in the
Solomon Islands. I am currently working on a sketch grammar for an
upcoming volume on Oceanic languages.

The construction I am referring to would be called an antipassive by
some (Roviana has morpho-syntactic ergativity with really unusual splits).

'I cooked the taro' comes out as a transitive, with 'I' having
ergative 'marking' (actually, zero for ergative, which is one of the
unusual things), and there is transitive morphology on the verb.
The constituent order is VAO.

'I cooked taro' / 'I did taro-cooking' comes out as intransitive, with
'I' marked as absolutive, and constituent order VOA. No transitive
morphology on the verb, and you could say that the O has 'moved into
the verb phrase' if you were inclined to use such dynamic metaphors,
and could make a case for what a verb phrase was in Roviana.
I call this the 'backgrounded object construction'. It is used in
subordinate clauses, which do not have morpho-syntactic ergativity, and
it doesn't involve the marking of A as oblique, so I am not prepared to
call it an antipassive.

Now: the backgrounded object construction is used (i) optionally, to
pragmatically background the undergoer in discourse (because it is not
important), or (ii) obligatorily, if the undergoer is non-specific.

By non-specific I mean that the speaker doesn't have a particular
referent in mind, even if one might be said to exist; e.g.
'I did taro-cooking' implies that taro exists, but you are not focusing
on any particular taro.

Exception: If you are asserting the non-existence of an undergoer by
using a prenominal modifier 'none/nothing', you can use the normal
transitive construction.
'I didn't kill ANYONE' (there does not exist a person such that I
killed them) = transitive
'I didn't kill anyone' (denying the action, not asserting the
non-existence of the referent) = backgrounded object construction.

Thus: the transitive construction is used if the undergoer is (a)
asserted to not exist or (b) specific and not pragmatically backgrounded.
The backgrounded object construction is used if the undergoer is (a)
non-specific and not being asserted to not exist or (b) specific and
pragmatically backgrounded.
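
The choice between the two constructions can be written out as a small
decision procedure. This is a minimal sketch of the summary just given,
not a claim about how Roviana speakers compute anything; the class
Undergoer, its field names and the function choose_construction are
hypothetical labels introduced for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Undergoer:
    """Hypothetical feature bundle for the undergoer of a clause."""
    specific: bool
    backgrounded: bool = False          # pragmatically backgrounded in discourse
    asserted_nonexistent: bool = False  # e.g. 'I didn't kill ANYONE'


def choose_construction(u):
    """Return "transitive" or "backgrounded-object" for the undergoer u."""
    # Asserting non-existence takes the transitive construction.
    if u.asserted_nonexistent:
        return "transitive"
    # A non-specific undergoer requires the backgrounded object construction.
    if not u.specific:
        return "backgrounded-object"
    # A specific undergoer is backgrounded only if pragmatically backgrounded.
    return "backgrounded-object" if u.backgrounded else "transitive"
```

So 'I cooked the taro' corresponds to Undergoer(specific=True), and
'I did taro-cooking' to Undergoer(specific=False).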

Roviana has articles that mark information statuses like definite. The
NP in a backgrounded object construction, however, can only be a bare
noun.

My thanks again to those who replied. Any further thoughts/comments
much appreciated.

Simon Corston
