|Title:||Re: 15.2594, Disc: Re: 15.2577, FYI:Using Google Script|
|Submitter:||Damon Allen Davison|
|Description:||Re: Linguist 15.2577, Linguist 15.2594
John Atkinson writes:
... Google shows the first as 16 times less common than the second.
Of course, it's no use entering 'take the liberty', because three quarters
of the returns are things like 'Take the Liberty Bridge Exit'. Also, a type
of automobile called a Liberty seems to turn up in a high proportion of the
hits on both sides.
This is a very important point and bears elaboration. You might even
say that this is the fundamental problem with simply googling
linguistic queries. Google does not allow truly literal searches
because it strips quite a bit of metainformation from queries. The
information Google does account for belongs to two basic classes:
lexical and syntactic data. The problem on the lexical side is that
only literal lexemes are accounted for. Google has some support for
synonymic searches using the '~' operator, but its weakness for our
purposes is that its morphological features are currently limited to
accounting for plural variation. Verb morphology is not accounted for
at all. Searches for 'took a liberty' and 'takes a liberty', along
with their the-equivalents, return similar results to 'take a
liberty', but seemingly with less punctuation noise.
(In engineering terms, the internet already has a low signal-to-noise
ratio. When Google strips the signal, the query, of metainformation,
it lowers this ratio even further. Google is simply not designed
to handle linguistic queries.)
Services like the University of Liverpool's WebCorp
(http://www.webcorp.org.uk , but refusing connections at the moment)
try to eliminate the static by using Google to do an initial search,
and then filtering those results using verb morphology,
capitalization, and punctuation.
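To make the idea of post-filtering concrete, here is a rough sketch in
Python. It is not WebCorp's actual code, only my guess at the general
approach: take the text snippets a search engine returns and keep only
those where an inflected form of 'take' is followed directly by 'a' or
'the' and a lowercase 'liberty', with nothing but whitespace in
between. The names TAKE_FORMS and filter_snippets are made up for this
example, and the hand-written list of verb forms stands in for a real
morphological lexicon.

```python
import re

# Hypothetical, hand-written inflection list for 'take'; a real tool
# would draw these forms from a morphological lexicon.
TAKE_FORMS = ("take", "takes", "took", "taken", "taking")

# Allow a sentence-initial capital on the verb only.
VERB = "|".join(f"[{f[0].upper()}{f[0]}]{f[1:]}" for f in TAKE_FORMS)

# The match is case-sensitive, so 'Liberty' (bridge, automobile) is
# rejected, and only whitespace may separate the words, so hits that
# straddle punctuation are rejected as well.
PATTERN = re.compile(rf"\b(?:{VERB})\s+(a|the)\s+liberty\b")


def filter_snippets(snippets):
    """Return (determiner, snippet) pairs for snippets that pass the filter."""
    hits = []
    for snippet in snippets:
        match = PATTERN.search(snippet)
        if match:
            hits.append((match.group(1), snippet))
    return hits


if __name__ == "__main__":
    sample = [
        "Take the Liberty Bridge Exit",
        "Test drive the new Liberty today",
        "I took a liberty with the translation.",
        "We take the liberty to inform you",
        "What did he take? A liberty, perhaps.",
    ]
    for determiner, snippet in filter_snippets(sample):
        print(determiner, "|", snippet)
```

Of the five sample strings, only the third and fourth survive. A filter
like this still says nothing about register or genre, so the counts it
produces remain frequency counts and nothing more.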
Perhaps the preponderance of 'take the liberty to' in web-pages is because
it's common in officialese, while 'take a liberty' is a rather more literary
term. Nothing to do with their relative well-formedness.
Yes, I think it's safe to say that Google searches are nice
frequency-of-use indicators, nothing more. It is very difficult to define a
threshold above which things are grammatical.
Damon Allen Davison