LINGUIST List 16.2137
Tue Jul 12 2005
Qs: Lexical Bundles; German Wordlist with Hyphenation
Editor for this issue: Jessica Boynton
We'd like to remind readers that the responses to queries are usually best posted to the individual asking the question. That individual is then strongly encouraged to post a summary to the list. This policy was instituted to help control the huge volume of mail on LINGUIST; so we would appreciate your cooperating with it whenever it seems appropriate. In addition to posting a summary, we'd like to remind people that it is usually a good idea to personally thank those individuals who have taken the trouble to respond to the query. To post to LINGUIST, use our convenient web form at http://linguistlist.org/LL/posttolinguist.html.
Message 1: Lexical Bundles
From: Jennifer Eagleton <jennyasian-emphasis.com>
Subject: Lexical Bundles
Editor's note: Apologies for the delay in posting.
I notice that all of the studies I have read on this topic have focussed on
4-word bundles and that they have all used what I would call large corpora,
i.e., many millions of words. The rationale seems to be that 5-word bundles
yield too few items to analyse, while 3-word bundles probably yield too many
to handle.
I want to do a study of bundles on a specific corpus I have, but it contains
only 600,000 words. To be able to work with a large number of bundles, it would
therefore make sense to focus on 3-word bundles. I could do a study on 4-word
bundles, but the sample would be smaller.
So my question is: do people see any disadvantages in focusing on 3-word
bundles and, if so, what might they be?
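The trade-off between 3-word and 4-word bundles can be checked directly on a corpus before committing to either. The following is a minimal sketch in Python, not a description of any published method: the function names and the cut-off of 20 occurrences per million words are assumptions chosen for illustration, and tokenisation is reduced to splitting on whitespace.

```python
from collections import Counter


def count_bundles(tokens, n):
    """Count every contiguous n-word sequence in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def raw_cutoff(corpus_size, per_million):
    """Convert a per-million-words cut-off into a raw count for this corpus."""
    return per_million * corpus_size / 1_000_000


def frequent_bundles(tokens, n, per_million=20):
    """Keep the n-word bundles that meet the (assumed) per-million cut-off."""
    minimum = raw_cutoff(len(tokens), per_million)
    counts = count_bundles(tokens, n)
    return {bundle: c for bundle, c in counts.items() if c >= minimum}


# Toy input only; a real study would read the corpus files instead.
sample = "on the other hand it is on the other hand that we see".split()
three = count_bundles(sample, 3)
four = count_bundles(sample, 4)
```

Under the assumed cut-off of 20 per million, a bundle in a 600,000-word corpus would need at least 12 raw occurrences to qualify. Running `frequent_bundles` with `n=3` and `n=4` on the same corpus shows how many bundles survive at each length.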
Looking forward to hearing your responses.
ON BEHALF OF PROF. JOHN FLOWERDEW
DEPARTMENT OF ENGLISH AND COMMUNICATION
CITY UNIVERSITY OF HONG KONG
Linguistic Field(s): Text/Corpus Linguistics
Message 2: German Wordlist with Hyphenation - New Spelling
From: Gregor Sieber <gregor.sieberstudent.uni-tuebingen.de>
Subject: German Wordlist with Hyphenation - New Spelling
I am a BA student in computational linguistics at the University of
Tübingen. For my BA thesis I am working on finite-state patterns for German,
following the work of Gosse Bouma (for Dutch). I want to use machine
learning algorithms to improve the results of the FS approach. For this
reason I am looking for a word list in the new German orthography that
contains hyphenation points and could be used as training data for the
algorithm (TBL). The CELEX list, which would have been a good resource, is
still in the old orthography.
Thank you in advance for any pointers to such a word list.
Linguistic Field(s): Computational Linguistics