LINGUIST List 16.2137

Tue Jul 12 2005

Qs: Lexical Bundles; German Wordlist with Hyphenation

Editor for this issue: Jessica Boynton <jessicalinguistlist.org>

We'd like to remind readers that the responses to queries are usually best posted to the individual asking the question. That individual is then strongly encouraged to post a summary to the list. This policy was instituted to help control the huge volume of mail on LINGUIST; so we would appreciate your cooperating with it whenever it seems appropriate. In addition to posting a summary, we'd like to remind people that it is usually a good idea to personally thank those individuals who have taken the trouble to respond to the query. To post to LINGUIST, use our convenient web form at http://linguistlist.org/LL/posttolinguist.html.
        1.    Jennifer Eagleton, Lexical Bundles
        2.    Gregor Sieber, German Wordlist with Hyphenation - New Spelling

Message 1: Lexical Bundles
Date: 12-Jul-2005
From: Jennifer Eagleton <jennyasian-emphasis.com>
Subject: Lexical Bundles

Editor's note: Apologies for the delay in posting.

I notice that all of the studies I have read on this topic have focussed on
4-word bundles and that they have all used what I would call large corpora,
i.e. many millions of words. The rationale seems to be that with 5-word bundles
you do not get enough to analyse and that with 3-word bundles there are
probably too many to handle.

I want to do a study of bundles on a specific corpus I have, which has only
600,000 words. To be able to work with large numbers of bundles, it would
therefore make sense to focus on 3-word bundles. I could do a study on 4-word
bundles, but the sample would be smaller.
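
To make the trade-off concrete, here is a minimal sketch of how 3-word and 4-word bundles could be counted and compared. It is not taken from any of the studies mentioned: the function names, the toy text, and the default cutoff of 40 occurrences per million words are all assumptions for illustration. On a 600,000-word corpus that cutoff would correspond to 24 raw occurrences (40 x 0.6).

```python
from collections import Counter

def count_bundles(tokens, n):
    """Count every contiguous n-word sequence in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def frequent_bundles(tokens, n, per_million=40.0):
    """Keep n-word sequences that meet a per-million-word cutoff.

    The default cutoff is an assumed, illustrative value.
    """
    min_raw = per_million * len(tokens) / 1_000_000
    counts = count_bundles(tokens, n)
    return {gram: c for gram, c in counts.items() if c >= min_raw}

# Toy text only; a real study would load the tokenised corpus here.
tokens = "on the other hand it is on the other hand".split()
three = count_bundles(tokens, 3)
four = count_bundles(tokens, 4)
```

Running both lengths over the same corpus with the same normalised cutoff would show directly how many more 3-word than 4-word bundles survive.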

So my question is, do people see any disadvantages in focusing on 3-word bundles
and, if so, what might they be?

Looking forward to hearing your responses.


Linguistic Field(s): Text/Corpus Linguistics

Message 2: German Wordlist with Hyphenation - New Spelling
Date: 11-Jul-2005
From: Gregor Sieber <gregor.sieberstudent.uni-tuebingen.de>
Subject: German Wordlist with Hyphenation - New Spelling

I am a BA student in computational linguistics at the University of
Tübingen. For my BA thesis I am working on finite state patterns for German
following the work of Gosse Bouma (for Dutch). I want to use machine
learning algorithms to improve the results of the FS approach. For this
reason I am looking for a word list in the new German orthography that
contains hyphenation points and could be used as training data for the
algorithm (TBL). The CELEX list, which would have been a good resource, is
still in the old orthography.
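
As an illustration of what such training data might look like, the sketch below turns one word-list entry into a plain word plus one label per gap between letters. The entry format (hyphenation points written as "-" inside the word) and the function name are assumptions, not the format of any particular resource; words containing a genuine hyphen would need a different marker.

```python
def hyphenation_labels(entry, marker="-"):
    """Return the unmarked word and a 0/1 label for each gap between letters.

    labels[i] is 1 if a hyphenation point follows letter i, else 0.
    The marker-in-word entry format is an assumption for this sketch.
    """
    letters, labels = [], []
    for ch in entry:
        if ch == marker:
            if labels:
                labels[-1] = 1  # mark the gap after the previous letter
        else:
            letters.append(ch)
            labels.append(0)
    # Drop the label after the final letter: there is no gap there.
    return "".join(letters), labels[:-1]

word, labels = hyphenation_labels("Sil-ben-tren-nung")
```

Each (word, labels) pair could then serve as one training instance, with the learner predicting the label for every gap.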

Thank you in advance for any hints about such a wordlist.

Best regards

Gregor Sieber

Linguistic Field(s): Computational Linguistics

