Review of Visualizing Document Processing


Reviewer: Baden Hughes
Book Title: Visualizing Document Processing
Book Author: Graziella Tonfoni, Lakhmi Jain
Publisher: De Gruyter Mouton
Linguistic Field(s): Applied Linguistics
Text/Corpus Linguistics
Book Announcement: 16.1458

Review:


Date: Fri, 6 May 2005 09:15:56 +1000 (EST)
From: Baden Hughes <badenh@cs.mu.OZ.AU>
Subject: Visualizing Document Processing

AUTHOR: Tonfoni, Graziella; Jain, Lakhmi
TITLE: Visualizing Document Processing
SUBTITLE: Innovations in Communication Patterns and Textual Forms
SERIES: Text, Translation, Computational Processing 6
PUBLISHER: Mouton de Gruyter
YEAR: 2004

Baden Hughes, Department of Computer Science and Software Engineering,
University of Melbourne

SUMMARY DESCRIPTION

This book adopts as its core the idea that text processing, whether
cognitive or computational, is in fact the linguistic realisation of more
abstract information management and processing. By its own admission, this
volume is intended for a specific audience, namely information specialists
whose interests are in the area of document reception and production,
where accuracy and reliability are crucial factors in information
analysis. This book is likely to be of interest to researchers in the
areas of text linguistics, semiotics, document processing, dialogue
modelling, pragmatics, natural language generation and cognitive science.

In the first chapter, the theoretical background of a new approach towards
language and information is presented, with new terminology introduced and
concepts defined. A polysynthetic, interdisciplinary approach is used in
defining both the scientific and the linguistic paradigms through which
interpretation can be carried out. The main paradigms for incorporating
intelligence in machines (knowledge-based systems, artificial neural
networks, evolutionary computing, fuzzy logic and artificial agents) are
presented.

The second chapter takes as its main subject the implications of
rethinking language and text, illustrated through the theories of both
text comprehension and text compression.

The third chapter illustrates in greater detail this new perspective on
communication by familiarizing the reader with a complex visual system for
interpretation of qualitatively different components in natural language,
particularly textual documents. In the analysis of the physical
manifestation, a stratified observation framework is adopted, allowing
focus on different aspects of interpretation at both the macroscopic and
microscopic levels. New (cognitive) tools for observing, describing and
explaining qualitatively different phenomena in natural language are
discussed.

The fourth chapter illustrates the further evolution of the model, in
particular its final output: CTML, a systematic but informal markup
language used for strategic document annotation. This markup language and
its corresponding document model represent the climax of the research.

The fifth and concluding chapter contains theoretical reflections about the
requirements for metaphor creation in modern information science, together
with practical suggestions for verifying and augmenting the consistency
and relevance of analogical reasoning.

The main line of argument throughout the volume is as follows. The focus
is on the representation of text procedures in terms of definitions and
their visualizations (Chapter 3). The understanding of these constructions
is prepared by the introduction of the conceptual system and the
discussion of the various scientific paradigms which have an impact on
them (Chapter 1) and the development of a particular theory of language
and text (Chapter 2). The representation of the system itself is followed
by a discussion of the application of the visual system as a kind of
markup language (annotation language for documents) and its role in the
work of information analysts (Chapter 4). The theoretical framework which
starts with Chapters 1 and 2 is completed by a final chapter in which the
crucial role of metaphors and analogies for scientific exploration
receives additional emphasis (Chapter 5).

CRITICAL EVALUATION

Much of the motivation for this book appears to have been drawn from
previous research by Tonfoni, who developed the CPP-TRS theory of text
comprehension in the early 1990s.

For researchers who seek a computationally tractable representation with
formal grounding, this text will be found wanting, despite the apparently
short distance between the abstractions discussed and such a
representation. At a theoretical level, the "machines" which form the
framework for Chapter 3 could equally well be expressed in alternative
formalisms; graph theory and finite state machines are two which come to
mind but which are not even mentioned in passing in the text. At a
practical level, the obvious affinities with hypertext theory are never
explored, yet are immediately apparent when discussing internal linkage
and annotations within a document interpretation instance.

The components of the document annotation language (CTML) appear to be
disengaged from naturally aligned theories which will be familiar to
linguists (for example Rhetorical Structure Theory). While this in itself
is not a major shortcoming, it does contribute to the overall feeling that
this work is sufficiently removed from the core of the linguistics
discipline that its contribution may not be as great as its potential.

At its core, the book proposes CTML, a document markup language.
Disappointingly, CTML is presented as little more than a series of
character-based annotations and is not reduced to a computationally
tractable representation, despite this apparently being quite trivial.

At an editorial level, a number of distracting features appear. Aside from
the regular typographical errors, this volume features introverted
citation, referencing itself as a manuscript on a number of occasions
when cross-references to the relevant sections would have been more
appropriate. Much of the primary locus of the book, found in Chapter 3,
has a distinctly recycled feel with scarce concern for editorial
contributions. Certain points, such as "CPP-TRS is a methodology and a
language", are unnecessarily repeated throughout. Such oversights are
unfortunate since the overall contribution of the book is unique in its
field and will doubtless be of value to researchers in the areas of text
linguistics, semiotics, document processing, dialogue modelling,
pragmatics, natural language generation and cognitive science.



 
ABOUT THE REVIEWER


Baden Hughes is a Research Fellow in the Department of Computer Science
and Software Engineering at the University of Melbourne and a Research
Engineer in the Victoria Laboratory of NICTA, Australia's Centre of
Excellence in Information Technology. His research interests are in the
areas of formal and computational models of human language; statistical
natural language processing; digital libraries; web data mining;
documentary linguistics; computer-assisted language learning; and
information security.

