LINGUIST List 22.812
Fri Feb 18 2011
Jobs: Computational Linguistics: Programmer, Tioga Lake Consulting
Editor for this issue: Erin Smith
The LINGUIST List strongly encourages employers to engage in non-discriminatory hiring practices. We urge employers not to discriminate on the grounds of race, ethnicity, nationality, disability, age, religion, gender, or sexual orientation. However, we have no means of enforcing these standards.
Job seekers should pay special attention to language in ads regarding employment requirements and are encouraged to consult our international employment page at http://linguistlist.org/jobs/jobnet.html. This page has been set up so that people can report on the employment standards of various countries.
To post to LINGUIST, use our convenient web form at http://linguistlist.org/posttolinguist.cfm
1. Brian Buchanan, Computational Linguistics: Programmer, Tioga Lake Consulting, LLC
Message 1: Computational Linguistics: Programmer, Tioga Lake Consulting, LLC
From: Brian Buchanan <brian.buchanan@gmail.com>
Subject: Computational Linguistics: Programmer, Tioga Lake Consulting, LLC
University or Organization: Tioga Lake Consulting, LLC
Job Rank: Programmer
Specialty Areas: Computational Linguistics
Contract Developer - Web Content Crawler Project
Contract project -- 12 weeks minimum
Seeking a programmer or computational linguist to create heuristics for
extracting specific information from small- and medium-sized business
websites, such as the business name, description, contact information,
hours of operation, and restaurant menus.
The web crawling / HTML parsing framework for this project is already in
place. The objective of this contract is the development and refinement of
the actual code for extracting the required data from each website and
converting it to a structured format.
To be considered for this contract, a candidate need not be an expert
programmer; however, basic programming ability and familiarity with [...]
previous web crawling project involving data aggregation and already has an
intuitive sense of how to approach this problem.
Responsibilities:
-Review websites and create a training data set.
-Program heuristics to extract specified data items from web pages and
perform aggregate analysis of websites.
-Load the heuristics into the website analyzer and run the analyzer on the
training data set.
-Compare the output of the website analyzer with the expected results from
the training data set.
-Identify problems with the heuristics and examine the affected websites to
determine the cause.
-Develop new candidate heuristics for improving the accuracy of the website
analyzer.
-Program the new heuristics and repeat the review process.
-Create algorithms for estimating the accuracy of each heuristic for a [...]
Qualifications:
-Excellent working knowledge of web technologies (HTML, etc.)
-Experience working with structured data and data aggregation
-Background in statistics and/or machine learning (e.g., Bayesian filtering)
-Background in computational linguistics
-Working knowledge of Ruby, C++, or at least one other programming language
-Experience with UNIX command-line tools (e.g., using the command shell on
Linux or Mac OS X)
Application Deadline: 28-Feb-2011
Email Address for Applications: brian.buchanan@gmail.com
Page Updated: 18-Feb-2011
While the LINGUIST List makes every effort to ensure the linguistic relevance of sites listed
on its pages, it cannot vouch for their contents.