LINGUIST List 22.812

Fri Feb 18 2011

Jobs: Computational Linguistics: Programmer, Tioga Lake Consulting

Editor for this issue: Erin Smith <erin@linguistlist.org>

The LINGUIST List strongly encourages employers to engage in non-discriminatory hiring practices. We urge employers not to discriminate on the grounds of race, ethnicity, nationality, disability, age, religion, gender, or sexual orientation. However, we have no means of enforcing these standards.

Job seekers should pay special attention to language in ads regarding employment requirements and are encouraged to consult our international employment page at http://linguistlist.org/jobs/jobnet.html. This page has been set up so that people can report on the employment standards of various countries.

To post to LINGUIST, use our convenient web form at http://linguistlist.org/posttolinguist.cfm
1. Brian Buchanan, Computational Linguistics: Programmer, Tioga Lake Consulting, LLC

Message 1: Computational Linguistics: Programmer, Tioga Lake Consulting, LLC
Date: 15-Feb-2011
From: Brian Buchanan <brian.buchanan@gmail.com>
Subject: Computational Linguistics: Programmer, Tioga Lake Consulting, LLC

University or Organization: Tioga Lake Consulting, LLC
Job Rank: Programmer

Specialty Areas: Computational Linguistics


Contract Developer - Web Content Crawler Project

Contract project -- 12 weeks minimum

Seeking a programmer or computational linguist to create heuristics for
extracting certain information from small/medium business websites, such as
business name, description, contact information, hours of operation,
restaurant menus, etc.

The web crawling / HTML parsing framework for this project is already in
place. The objective of this contract is the development and refinement of
the actual code for extracting the required data from each website and
converting it to a structured format.

To be considered for this contract, a candidate need not be an expert
programmer; however, basic programming ability and familiarity with
JavaScript or Ruby are required. The ideal candidate has worked on a
previous web crawling project involving data aggregation and already has an
intuitive sense about how to approach this problem.

Job description:
-Review websites and create training data set.
-Program heuristics to extract specified data items from web pages and
perform aggregate analysis of websites.
-Load heuristics into website analyzer and run analyzer on the training
data set.
-Compare output of website analyzer with expected results from the training
data set.
-Identify problems with the heuristics and examine the affected websites to
determine the cause.
-Develop new possible heuristics for improving the accuracy of the website
analyzer.
-Program new heuristics and repeat the review process.
-Create algorithms for estimating the accuracy of each heuristic for a
given webpage.
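
As a rough illustration of the review cycle described above, the following is a minimal sketch only. It assumes a hypothetical heuristic (extractPhone) that pulls a US-style phone number out of page text, and a hypothetical evaluate helper that compares the output with hand-labeled expected values. None of these names, nor the sample data, come from the project's actual framework.

```javascript
// Hypothetical heuristic: return the first US-style phone number found in
// the page text, normalized to NNN-NNN-NNNN, or null if none is found.
function extractPhone(text) {
  const match = text.match(/\(?\b(\d{3})\)?[\s.-]?(\d{3})[\s.-]?(\d{4})\b/);
  return match ? `${match[1]}-${match[2]}-${match[3]}` : null;
}

// Hypothetical evaluation helper: run a heuristic over a labeled training
// set and collect every example where the output differs from the label.
function evaluate(heuristic, trainingSet) {
  const failures = [];
  for (const example of trainingSet) {
    const actual = heuristic(example.text);
    if (actual !== example.expected) {
      failures.push({ text: example.text, expected: example.expected, actual });
    }
  }
  const accuracy = (trainingSet.length - failures.length) / trainingSet.length;
  return { accuracy, failures };
}

// Made-up training examples standing in for manually reviewed websites.
const trainingSet = [
  { text: "Call us at (555) 123-4567 for reservations.", expected: "555-123-4567" },
  { text: "Phone: 555.987.6543 | Open daily", expected: "555-987-6543" },
  { text: "Reach us at five five five, 2222.", expected: null },
  { text: "Tel +1 555 246 8101", expected: "555-246-8101" },
  { text: "Fax: 555-000-1111 Phone: 555-222-3333", expected: "555-222-3333" },
];

const report = evaluate(extractPhone, trainingSet);
console.log(`accuracy: ${report.accuracy}`);
for (const failure of report.failures) {
  console.log(`expected ${failure.expected}, got ${failure.actual}: ${failure.text}`);
}
```

In this sketch the last example fails because the heuristic always takes the first number on the page, which here is the fax line. That is the kind of problem the review step is meant to surface before a refined heuristic is programmed and the cycle is repeated.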

Required qualifications:
-Working competence with the JavaScript programming language
-Excellent working knowledge of web technologies (HTML, etc.)
-Experience working with structured data & aggregation

Desired qualifications:
-Background in statistics and/or machine learning (e.g. Bayesian filtering)
-Background in computational linguistics
-Working knowledge of Ruby, C++, or at least one other programming language
-Experience with UNIX command-line tools (e.g. using the command shell on
Linux or Mac OS X)

Application Deadline: 28-Feb-2011

Email Address for Applications: brian.buchanan@gmail.com
Contact Information:
Brian Buchanan
Email: brian.buchanan@gmail.com


Page Updated: 18-Feb-2011

Supported in part by the National Science Foundation
While the LINGUIST List makes every effort to ensure the linguistic relevance of sites listed on its pages, it cannot vouch for their contents.