Editor for this issue: Naomi Fox <fox linguistlist.org>
Re: Linguist 15.2577, Linguist 15.2594

Dear List,

John Atkinson writes:

> ... Google shows the first as 16 times less common than the second.
> Of course, it's no use entering "take the liberty", because three quarters
> of the returns are things like "Take the Liberty Bridge Exit". Also, a type
> of automobile called a Liberty seems to turn up in a high proportion of the
> hits on both sides.

This is a very important point and bears elaboration. You might even say that this is the fundamental problem with simply googling linguistic queries. Google does not allow truly literal searches because it strips quite a bit of metainformation from queries.

The information Google does account for belongs to two basic classes: lexical and syntactic data. The problem on the lexical side is that only literal lexemes are accounted for. Google has some support for synonymic searches using the ' ~ ' operator, but its weakness for our purposes is that its morphological features are currently limited to accounting for plural variation. Verb morphology is not accounted for at all. Searches for "took a liberty" and "takes a liberty", along with their the-equivalents, return similar results to "take a liberty", but seemingly with less punctuation noise.

(In engineering terms, the internet already has a low signal-to-noise ratio. When Google strips the signal, the query, of metainformation, it lowers this ratio even further. Google is simply not designed to handle linguistic queries.)

Services like the University of Liverpool's WebCorp (http://www.webcorp.org.uk, but refusing connections at the moment) try to eliminate the static by using Google to do an initial search, and then filtering those results using verb morphology, capitalization, and punctuation.

> Perhaps the preponderance of "take the liberty to" in web-pages is because
> it's common in officialese, while "take a liberty" is a rather more literary
> term. Nothing to do with their relative well-formedness.
Yes, I think it's safe to say that Google searches are nice frequency-of-use indicators, nothing more. It is very difficult to define a threshold above which things are grammatical.

Warm Regards,
Damon

- Damon Allen Davison
http://www.allolex.net
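The kind of post-filtering described above (verb morphology, capitalization, punctuation) can be sketched roughly in Python. This is not WebCorp's actual implementation; the pattern, the function name `tally_idiom_hits`, and the sample snippets are all made up for illustration, and a real filter would need far more care.

```python
import re
from collections import Counter

# Any inflected form of TAKE, then "a" or "the", then "liberty".
# Requiring plain whitespace between the words means a match cannot
# straddle punctuation (e.g. "take a. Liberty" is not matched).
PHRASE = re.compile(
    r"\b(take|takes|taking|took|taken)\s+(a|the)\s+(liberty)\b",
    re.IGNORECASE,
)

def tally_idiom_hits(snippets):
    """Count (verb form, article) pairs, skipping capitalized 'Liberty'.

    Assumption for this sketch: a capitalized noun signals a proper name
    (a bridge, a car model), so such matches are discarded.
    """
    counts = Counter()
    for snippet in snippets:
        for match in PHRASE.finditer(snippet):
            verb, article, noun = match.groups()
            if noun[0].isupper():
                continue  # likely "Liberty Bridge", "Jeep Liberty", etc.
            counts[(verb.lower(), article.lower())] += 1
    return counts

# Hypothetical snippets standing in for search-engine results.
sample_snippets = [
    "Take the Liberty Bridge Exit",
    "I took the liberty of writing to you.",
    "He takes a liberty with the text.",
    "Take a Liberty for a test drive today",
]

sample_counts = tally_idiom_hits(sample_snippets)
```

On these four made-up snippets only the second and third survive the filter, which is the behaviour the capitalization check is meant to show.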
In Linguist 15.2577, Danko Sipka <danko.sipka asu.edu> wrote:

> Dear Linguists,
>
> I frequently use Google to determine lexical and morphosyntactic
> well-formedness of two options in various languages. I advise my
> students to do the same. In order to save time required to go to
> Google two times for one inquiry, I have created a simple script at:
>
> http://cli.la.asu.edu/togoogleornot.htm
>
> which lets you enter two options, choose the target language and then
> get hits for both options in one window. For example, if a student of
> English enters take the liberty as the first option and take a liberty
> as the second, it will be possible to determine that the first option
> is well-formed while the other is not.

I queried the two constructions and the Google results show that 'take the liberty' is much more frequent in web pages than 'take a liberty'. However, I don't think that low frequency entails that a construction is ill-formed or unacceptable. Also, looking at only one form of the lemma TAKE (the form 'take') may hide some interesting variations in the relative frequency of the combinations with 'the liberty' and 'a liberty'. Google queries of the different forms of TAKE returned the following results:

take the/a liberty:    39,300 / 1,520
takes the/a liberty:    3,720 /   313
taking the/a liberty:   9,770 /   810
took the/a liberty:    50,900 /   687
taken the/a liberty:   56,300 /   680

Although the 'the' construction is more frequent for all forms of TAKE, the frequency difference is much less marked in the cases of 'takes' and 'taking' (both about 12 times more frequent) than in 'take' (x26), 'took' (x74), and 'taken' (x83).
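The multipliers quoted above can be rechecked from the hit counts with a few lines of Python. This is only an illustrative sketch: the counts are copied from the message, while the names `google_hits` and `the_to_a_ratio` are invented here.

```python
# Google hit counts quoted in the message: form -> ('the' hits, 'a' hits)
google_hits = {
    "take": (39300, 1520),
    "takes": (3720, 313),
    "taking": (9770, 810),
    "took": (50900, 687),
    "taken": (56300, 680),
}

def the_to_a_ratio(counts):
    """How many times more frequent the 'the' variant is than the 'a' variant."""
    the_hits, a_hits = counts
    return the_hits / a_hits

ratios = {form: the_to_a_ratio(counts) for form, counts in google_hits.items()}

for form, ratio in ratios.items():
    print(f"{form:<7} x{ratio:.0f}")
```

Rounded to whole numbers, the ratios agree with the figures given in the message (x26, x12, x12, x74, x83).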
I also queried 'TAKE the liberty' and 'TAKE a liberty' in BNCweb:

take the/a liberty:     6 / --
takes the/a liberty:   -- / --
taking the/a liberty:  -- /  1
took the/a liberty:    13 /  1
taken the/a liberty:   10 / --

There don't seem to be enough instances of the two constructions to draw any conclusions. However, there are three interesting points to consider. First, the 'a' construction is found in a representative corpus, which indicates that it is acceptable. Second, although in the Google query the 'the' construction is about 40 times more frequent (considering all the forms of TAKE), in the BNC it is only 15 times more frequent, which points towards exercising caution when using the web as a corpus, as John Atkinson has already mentioned (Linguist 15.2594). Finally, in the BNC, one of the two instances of the 'a' construction is with the form 'taking', although there are no instances of 'taking' in the 'the' construction.

Perhaps Google queries can be more useful in helping learners become aware of the different contexts that the different forms of two constructions are used in.

Costas Gabrielatos
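The pooled figures in the message above ("about 40 times" for Google, "only 15 times" for the BNC) come from summing the counts over all forms of TAKE. A small Python sketch of that arithmetic follows; the counts are taken from the two messages, "--" is read as zero, and the names `google_hits`, `bnc_hits` and `pooled_ratio` are invented for this example.

```python
# form -> ('the' count, 'a' count); "--" in the BNC table is treated as 0
google_hits = {
    "take": (39300, 1520),
    "takes": (3720, 313),
    "taking": (9770, 810),
    "took": (50900, 687),
    "taken": (56300, 680),
}
bnc_hits = {
    "take": (6, 0),
    "takes": (0, 0),
    "taking": (0, 1),
    "took": (13, 1),
    "taken": (10, 0),
}

def pooled_ratio(hits):
    """Sum counts over all forms and return (the_total, a_total, ratio)."""
    the_total = sum(the_count for the_count, _ in hits.values())
    a_total = sum(a_count for _, a_count in hits.values())
    return the_total, a_total, the_total / a_total

google_the, google_a, google_ratio = pooled_ratio(google_hits)
bnc_the, bnc_a, bnc_ratio = pooled_ratio(bnc_hits)

print(f"Google: {google_the} / {google_a} = x{google_ratio:.1f}")
print(f"BNC:    {bnc_the} / {bnc_a} = x{bnc_ratio:.1f}")
```

With only 2 instances of the 'a' construction in the BNC, the pooled BNC ratio shifts a great deal if a single hit is added or removed, which fits the caution about drawing conclusions from so few instances.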