
Discussion Details

Title: Re: 15.2594, Disc: Re: 15.2577, FYI:Using Google Script
Submitter: Damon Allen Davison
Description: Re: Linguist 15.2577, Linguist 15.2594
Dear List,
John Atkinson writes:
  ... Google shows the first as 16 times less common than the second.
  Of course, it's no use entering 'take the liberty', because three quarters
  of the returns are things like 'Take the Liberty Bridge Exit'. Also, a type
  of automobile called a Liberty seems to turn up in a high proportion of the
  hits on both sides.
This is a very important point and bears elaboration. You might even
say that this is the fundamental problem with simply googling
linguistic queries. Google does not allow truly literal searches
because it strips quite a bit of metainformation from queries. The
information Google does account for belongs to two basic classes:
lexical and syntactic data. The problem on the lexical side is that
only literal lexemes are accounted for. Google has some support for
synonymic searches using the '~' operator, but its weakness for our
purposes is that its morphological features are currently limited to
accounting for plural variation. Verb morphology is not accounted for
at all. Searches for 'took a liberty' and 'takes a liberty', along
with their the-equivalents, return similar results to 'take a
liberty', but seemingly with less punctuation noise.
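
Since verb morphology has to be supplied by hand, one workaround is to write out every inflected form as its own quoted phrase and run each query separately. The following is only a minimal sketch of that idea: the form list is a hand-written assumption rather than the output of any real lexicon, and the function name phrase_queries is hypothetical.

```python
from itertools import product

# Hypothetical hand-written inflection table; a real tool would draw
# these forms from a morphological lexicon instead.
VERB_FORMS = ["take", "takes", "took", "taken", "taking"]
DETERMINERS = ["a", "the"]

def phrase_queries(noun="liberty"):
    """Return one quoted exact-phrase query per verb form and determiner."""
    return [
        f'"{verb} {det} {noun}"'
        for verb, det in product(VERB_FORMS, DETERMINERS)
    ]

for query in phrase_queries():
    print(query)
```

The hit counts for the a-variants and the the-variants could then be summed separately, although capitalization and punctuation noise would still be present in each count.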
(In engineering terms, the internet already has a low signal-to-noise
ratio. When Google strips the signal, the query, of metainformation,
it lowers this ratio even further. Google is simply not designed
to handle linguistic queries.)
Services like the University of Liverpool's WebCorp
(refusing connections at the moment)
try to eliminate the static by using Google to do an initial search,
and then filtering those results using verb morphology,
capitalization, and punctuation.
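
The filtering step can be sketched roughly as follows. This is not WebCorp's actual implementation, which I have not seen; it only illustrates the general idea of post-filtering hits by verb morphology, capitalization, and punctuation. The pattern, the function name filter_hits, and the sample snippets are all assumptions made up for the illustration.

```python
import re

# Inflected forms of "take", then "a" or "the", then "liberty"/"liberties".
# Only the verb may be capitalized (sentence-initial), so the lowercase
# noun rules out names such as "Liberty Bridge" or the Liberty automobile.
# Requiring single spaces between the words rejects punctuation noise.
PATTERN = re.compile(
    r"\b(?:[Tt]ake[sn]?|[Tt]ook|[Tt]aking) (a|the) libert(?:y|ies)\b"
)

def filter_hits(snippets):
    """Keep (determiner, snippet) pairs for snippets containing the idiom."""
    kept = []
    for snippet in snippets:
        match = PATTERN.search(snippet)
        if match:
            kept.append((match.group(1), snippet))
    return kept

# Invented sample snippets standing in for raw search results.
hits = [
    "Take the Liberty Bridge Exit and turn left.",
    "I took the liberty of forwarding your message.",
    "Never take a liberty with the facts.",
    "Take. The. Liberty. Tour starts at noon.",
]
for determiner, snippet in filter_hits(hits):
    print(determiner, "|", snippet)
```

Only the second and third snippets survive the filter; the bridge exit and the punctuated slogan are dropped.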
  Perhaps the preponderance of 'take the liberty to' in web-pages is because
  it's common in officialese, while 'take a liberty' is a rather more literary
  term. Nothing to do with their relative well-formedness.
Yes, I think it's safe to say that Google searches are nice
frequency-of-use indicators, nothing more. It is very difficult to define a
threshold above which things are grammatical.
Warm Regards,
Damon Allen Davison
Date Posted: 20-Sep-2004
LL Issue: 15.2606
