Discussion Details
| Title: | Re: 15.2594, Disc: Re: 15.2577, FYI: Using Google Script |
| Submitter: | Damon Allen Davison |
| Description: | Re: Linguist 15.2577, Linguist 15.2594 Dear List, John Atkinson writes: "... Google shows the first as 16 times less common than the second. Of course, it's no use entering 'take the liberty', because three quarters of the returns are things like 'Take the Liberty Bridge Exit'. Also, a type of automobile called a Liberty seems to turn up in a high proportion of the hits on both sides." This is a very important point and bears elaboration. You might even say that this is the fundamental problem with simply googling linguistic queries. Google does not allow truly literal searches because it strips quite a bit of metainformation from queries. The information Google does take into account belongs to two basic classes: lexical and syntactic data. The problem on the lexical side is that only literal lexemes are accounted for. Google has some support for synonym searches using the '~' operator, but its weakness for our purposes is that its morphological features are currently limited to plural variation. Verb morphology is not accounted for at all. Searches for 'took a liberty' and 'takes a liberty', along with their equivalents with 'the' ('took the liberty', 'takes the liberty'), return results similar to those for 'take a liberty', but seemingly with less punctuation noise. (In engineering terms, the internet already has a low signal-to-noise ratio. When Google strips the signal, the query, of its metainformation, it lowers this ratio even further. Google is simply not designed to handle linguistic queries.) Services like the University of Liverpool's WebCorp (http://www.webcorp.org.uk, though it is refusing connections at the moment) try to eliminate the static by using Google to do an initial search and then filtering those results by verb morphology, capitalization, and punctuation. John continues: "Perhaps the preponderance of 'take the liberty to' in web-pages is because it's common in officialese, while 'take a liberty' is a rather more literary term. Nothing to do with their relative well-formedness."
Yes, I think it's safe to say that Google searches are useful indicators of frequency of use, nothing more. It is very difficult to define a frequency threshold above which things are grammatical. Warm Regards, Damon Damon Allen Davison http://www.allolex.net |
| Date Posted: | 20-Sep-2004 |
| LL Issue: | 15.2606 |
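The kind of post-filtering attributed to WebCorp in the post (verb morphology plus capitalization) can be sketched roughly as follows. This is a minimal illustration, not WebCorp's actual implementation: it assumes the search-engine snippets have already been retrieved into a list of strings, the inflected forms of TAKE are listed by hand rather than generated by a morphological analyser, punctuation filtering is omitted, and the name `count_determiners` is hypothetical.

```python
import re
from collections import Counter

# Hypothetical filter. The verb group is case-insensitive (so sentence-initial
# "Take"/"Took" still match) and covers the hand-listed forms of TAKE.
# The determiner and "liberty" are matched case-sensitively, so proper-noun
# hits like "Take the Liberty Bridge Exit" or a car called a "Liberty" are dropped.
PATTERN = re.compile(
    r"\b(?i:take|takes|took|taking|taken)\s+(?P<det>a|the)\s+liberty\b"
)


def count_determiners(snippets):
    """Count 'a liberty' vs 'the liberty' after any listed form of TAKE."""
    counts = Counter()
    for text in snippets:
        for match in PATTERN.finditer(text):
            counts[match.group("det")] += 1
    return counts


if __name__ == "__main__":
    snippets = [
        "Take the Liberty Bridge Exit",        # proper noun: filtered out
        "I took the liberty of calling.",      # counted under 'the'
        "She takes a liberty with the text.",  # counted under 'a'
        "We take the liberty to inform you.",  # counted under 'the'
        "Taking a Liberty for a test drive",   # car name: filtered out
    ]
    print(count_determiners(snippets))  # Counter({'the': 2, 'a': 1})
```

Because every inflected form is pooled before counting, the resulting 'a' versus 'the' ratio is less sensitive to the missing verb morphology noted in the post. As the post stresses, though, such counts remain frequency-of-use evidence, not a grammaticality judgement.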

