LINGUIST List 21.4537
Thu Nov 11 2010
Disc: Term Frequency Weighting Choices
Editor for this issue: Elyssa Winzeler
<elyssa linguistlist.org>
Directory
1. Leslie Barrett, Term Frequency Weighting Choices
Message 1: Term Frequency Weighting Choices
Date: 09-Nov-2010
From: Leslie Barrett <lbarrett29 hotmail.com>
Subject: Term Frequency Weighting Choices
I am trying to decide between a square-root-based term-frequency weight and a variable weight based on the maximum term frequency in the document. (Log-based weighting won't work for me because it isn't sensitive enough to changes at the small end of the scale.) My corpus consists of non-thematic documents that vary widely in length, though none exceeds 10K words. Has anyone tried both weights on a similar corpus and obtained results they could share, or does anyone know of research comparing the different weights on sample data? I would very much appreciate any advice, and I will post a summary of the answers if appropriate.
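For concreteness, here is a minimal Python sketch of the weighting schemes in question. It is only an illustration under stated assumptions. The function names are hypothetical. The maximum-frequency weight is assumed to be the common "augmented" form k + (1 - k) * tf / max_tf with k = 0.5, and the log weight is assumed to be 1 + ln(tf). The variants actually under consideration may differ.

```python
import math
from collections import Counter


def sqrt_tf(count):
    """Square-root term-frequency weight: sqrt(tf)."""
    return math.sqrt(count)


def log_tf(count):
    """Log term-frequency weight, assumed here to be 1 + ln(tf); 0 for absent terms."""
    return 1.0 + math.log(count) if count > 0 else 0.0


def max_norm_tf(count, max_count, k=0.5):
    """Weight based on the document's maximum term frequency.

    Assumed form: k + (1 - k) * tf / max_tf, with k = 0.5 as an
    illustrative default; absent terms get 0.
    """
    if count == 0:
        return 0.0
    return k + (1.0 - k) * count / max_count


def weigh_document(tokens):
    """Return every weight for each distinct term in a tokenized document."""
    counts = Counter(tokens)
    if not counts:
        return {}
    max_count = max(counts.values())
    return {
        term: {
            "sqrt": sqrt_tf(c),
            "log": log_tf(c),
            "max_norm": max_norm_tf(c, max_count),
        }
        for term, c in counts.items()
    }
```

Calling `weigh_document` on a few short and long tokenized documents, and comparing the weights assigned to raw counts of 1 through 5, is one way to see how each scheme behaves at the small end of the scale and how the maximum-frequency weight shifts with document length.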
Linguistic Field(s):
Computational Linguistics
Page Updated: 11-Nov-2010