LINGUIST List 25.2125

Tue May 13 2014

Software: Computational Linguistics, Syntax: SentiLecto

Editor for this issue: Andrew Lamont <alamont@linguistlist.org>


Date: 13-May-2014
From: Fernando Balbachan <fernando_balbachan@yahoo.com.ar>
Subject: Computational Linguistics, Syntax: SentiLecto

Spanish is particularly challenging for syntactic analysis because its constituent order is relatively free. Full parsing of Spanish is difficult, as it is not obvious where to find high-level syntactic functions such as subject, direct object, etc. Moreover, Spanish has a rich morphological system (pronouns, agreement, etc.), which makes the task harder.

SentiLecto is our Spanish Sentiment Analysis solution at Tecnolecto: http://tecnolecto.com/sentilecto
This solution yields a fine-grained representation of the entities involved in each opinion. Unlike other approaches, it can handle polarity shifts within a single sentence ('I like chocolate but I hate strawberry ice cream') or even within embedded clauses ('Norwegians, who are an aggressive people, export the exquisite herring'). SentiLecto maps the entities involved in an opinion onto SVO (subject-verb-object) slots and assigns sentiment per slot: in 'Mary hates John', there are two entities but only the object is presented negatively, whereas in 'Mary harasses John', the same two entities appear but only the subject is presented negatively.
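The slot-based assignment described above can be illustrated with a minimal Python sketch. It is not SentiLecto's implementation: the `VERB_SCRIPTS` table, the `Clause` type, the `assign_sentiment` function and the polarity values are all hypothetical, and the clause is assumed to be already parsed into subject, verb and object.

```python
from dataclasses import dataclass

# Hypothetical verb scripts (illustrative values, not SentiLecto's lexicon):
# the polarity each verb projects onto its subject and object slots,
# with -1 = negative, 0 = neutral, +1 = positive.
VERB_SCRIPTS = {
    "hates": {"subject": 0, "object": -1},
    "harasses": {"subject": -1, "object": 0},
}

NEUTRAL_SCRIPT = {"subject": 0, "object": 0}


@dataclass(frozen=True)
class Clause:
    """One clause, assumed to be already parsed into SVO slots."""
    subject: str
    verb: str
    obj: str


def assign_sentiment(clause):
    """Return {entity: polarity} by reading the verb's script per slot."""
    script = VERB_SCRIPTS.get(clause.verb, NEUTRAL_SCRIPT)
    return {
        clause.subject: script["subject"],
        clause.obj: script["object"],
    }


if __name__ == "__main__":
    print(assign_sentiment(Clause("Mary", "hates", "John")))
    print(assign_sentiment(Clause("Mary", "harasses", "John")))
```

Because each clause is scored on its own, a sentence whose polarity shifts between clauses could be handled by running the same lookup once per extracted clause.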

SentiLecto relies on linguistic features such as passive/active voice transformation, anaphora resolution and co-reference chains, modality treatment, and accurate verb scripts for all Spanish verbs, including their pronominal variants (for example, 'destacar' 'to highlight something' vs. 'destacarSE' 'to stand out from the rest').
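As a rough illustration of two of these features, the Python sketch below normalizes a passive clause to active voice and keeps pronominal verbs under a separate script key. The `Clause` fields, `to_active` and `script_key` are hypothetical names chosen for this example, the Spanish passive sentence is made up, and nothing here reflects SentiLecto's actual code.

```python
from typing import NamedTuple, Optional


class Clause(NamedTuple):
    """Hypothetical parsed clause; field names are assumptions."""
    subject: Optional[str]        # grammatical subject
    verb: str                     # verb lemma
    obj: Optional[str]            # direct object, if any
    agent: Optional[str] = None   # 'por' agent phrase of a passive clause
    passive: bool = False
    pronominal: bool = False      # verb carries the clitic 'se'


def to_active(clause):
    """Rewrite a passive clause as active so one verb script covers both voices.

    The passive agent becomes the subject and the passive subject becomes
    the object; active clauses are returned unchanged.
    """
    if not clause.passive:
        return clause
    return Clause(subject=clause.agent, verb=clause.verb, obj=clause.subject)


def script_key(clause):
    """Pick the lexicon key, keeping pronominal verbs apart from plain ones."""
    return clause.verb + "se" if clause.pronominal else clause.verb


if __name__ == "__main__":
    # 'Juan es odiado por María' (illustrative sentence, not from the source)
    passive = Clause("Juan", "odiar", None, agent="María", passive=True)
    print(to_active(passive))
    print(script_key(Clause("Ana", "destacar", None, pronominal=True)))
```

Normalizing voice first means a slot-based verb script only has to be written once per verb, while the separate key lets 'destacar' and 'destacarse' carry different scripts.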

Try it out and behold the results! http://tecnolecto.com/sentilecto

Fernando Balbachan
Grupo Tecnolecto
Senior Computational Linguist
fernando_balbachan@yahoo.com.ar

Disclaimer: So far, we have released the syntax analysis component, including passive/active voice transformation, clause extraction, and anaphora resolution; we are still working on modality and sentiment analysis (proof of concept at http://tecnolecto.com/sentitext). We also plan to develop a fact extraction and checking module (through DBpedia) and to adapt this approach to English in the near future.

Linguistic Field(s): Computational Linguistics
                            Syntax

Subject Language(s): Spanish (spa)

Page Updated: 13-May-2014