LINGUIST List, Eastern Michigan University and Wayne State University
LINGUIST List 21.4537

Thu Nov 11 2010

Disc: Term Frequency Weighting Choices

Editor for this issue: Elyssa Winzeler <elyssa@linguistlist.org>


To post to LINGUIST, use our convenient web form at http://linguistlist.org/LL/posttolinguist.cfm.
Directory
        1.     Leslie Barrett, Term Frequency Weighting Choices

Message 1: Term Frequency Weighting Choices
Date: 09-Nov-2010
From: Leslie Barrett <lbarrett29@hotmail.com>
Subject: Term Frequency Weighting Choices

I am trying to decide between a square-root-based term-frequency weight
and a variable weight based on the maximum term frequency in the document.
(Log-based weighting won't work for me because it isn't sensitive enough
to changes at the small end of the scale.) I am using a corpus of
non-thematic documents that vary widely in length, none exceeding 10K
words. Has anyone tried both weights on a similar corpus and could share
the results? Alternatively, does anyone know of research comparing these
weights on sample data? I would very much appreciate any advice, and I
will post a summary of the answers if appropriate.
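For readers comparing the options, here is a minimal sketch of the weights under discussion. The post does not give exact formulas, so the forms below are assumptions based on common textbook variants: sqrt(tf) for square-root weighting; the "augmented" max-tf normalization a + (1 - a) * tf / max_tf, with an assumed smoothing constant a = 0.5; and 1 + ln(tf) as the log-based baseline. The function names are hypothetical.

```python
import math
from collections import Counter


def sqrt_tf(tf: int) -> float:
    """Square-root damping of a raw term count (0 for absent terms)."""
    return math.sqrt(tf)


def log_tf(tf: int) -> float:
    """Log damping: 1 + ln(tf) for tf > 0, else 0 (assumed common variant)."""
    return 1.0 + math.log(tf) if tf > 0 else 0.0


def max_tf_weight(tf: int, max_tf: int, a: float = 0.5) -> float:
    """Augmented max-tf normalization: a + (1 - a) * tf / max_tf.

    The smoothing constant a = 0.5 is an assumed default, not from the post.
    Absent terms (tf == 0) get weight 0.
    """
    if tf == 0:
        return 0.0
    return a + (1.0 - a) * tf / max_tf


def weigh_document(tokens):
    """Return {term: (sqrt weight, max-tf weight, log weight)} for one document."""
    counts = Counter(tokens)
    max_tf = max(counts.values())  # the document's own most frequent term
    return {
        term: (sqrt_tf(tf), max_tf_weight(tf, max_tf), log_tf(tf))
        for term, tf in counts.items()
    }


if __name__ == "__main__":
    # Toy document for illustration only; not data from the post.
    doc = "the cat sat on the mat the end".split()
    for term, (s, m, l) in sorted(weigh_document(doc).items()):
        print(f"{term:>4}  sqrt={s:.3f}  max_tf={m:.3f}  log={l:.3f}")
```

One design difference is relevant to a corpus of highly variable length. The sqrt and log weights depend only on a term's own count. The max-tf weight, by contrast, is relative to the most frequent term in the same document, so the same raw count can receive different weights in different documents.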


Linguistic Field(s): Computational Linguistics







Page Updated: 11-Nov-2010

Supported in part by the National Science Foundation
While the LINGUIST List makes every effort to ensure the linguistic relevance of sites listed on its pages, it cannot vouch for their contents.