Date: Fri, 6 May 2005 09:15:56 +1000 (EST)
From: Baden Hughes <badenh@cs.mu.OZ.AU>
Subject: Visualizing Document Processing
AUTHOR: Tonfoni, Graziella; Jain, Lakhmi
TITLE: Visualizing Document Processing
SUBTITLE: Innovations in Communication Patterns and Textual Forms
SERIES: Text, Translation, Computational Processing 6
PUBLISHER: Mouton de Gruyter
YEAR: 2004
Baden Hughes, Department of Computer Science and Software Engineering, University of Melbourne
SUMMARY DESCRIPTION
This book adopts as its core the idea that text processing, whether cognitive or computational, is in fact the linguistic realisation of more abstract information management and processing. By its own admission, this volume is intended for a specific audience, namely information specialists whose interests are in the area of document reception and production, where accuracy and reliability are crucial factors in information analysis. This book is likely to be of interest to researchers in the areas of text linguistics, semiotics, document processing, dialogue modelling, pragmatics, natural language generation and cognitive science.
In the first chapter, the theoretical background of a new approach towards language and information is presented: new terminology is introduced and concepts are defined. A polysynthetic, interdisciplinary approach is used in defining both the scientific and the linguistic paradigms through which interpretation can be carried out. The main paradigms for incorporating intelligence in machines (knowledge-based systems, artificial neural networks, evolutionary computing, fuzzy logic and artificial agents) are presented.
In the second chapter, the implications of rethinking language and text, illustrated through the theories of both text comprehension and text compression, form the body of the work.
The third chapter illustrates in greater detail this new perspective on communication by familiarizing the reader with a complex visual system for interpretation of qualitatively different components in natural language, particularly textual documents. In the analysis of the physical manifestation, a stratified observation framework is adopted, allowing focus on different aspects of interpretation at both the macroscopic and microscopic levels. New (cognitive) tools for observing, describing and explaining qualitatively different phenomena in natural language are discussed.
The fourth chapter illustrates the further evolution of the model, in particular its final output: CTML, a systematic but informal markup language used for strategic document annotation. This markup language and its corresponding document model represent the climax of the research.
Concluding, the fifth chapter contains theoretical reflections about the requirements for metaphor creation in modern information science, together with practical suggestions for verifying and augmenting the consistency and relevance of analogical reasoning.
The main line of argument throughout the volume is as follows. The focus is on the representation of text procedures in terms of definitions and their visualizations (Chapter 3). The understanding of these constructions is prepared by the introduction of the conceptual system and the discussion of the various scientific paradigms which have an impact on them (Chapter 1), and by the development of a particular theory of language and text (Chapter 2). The representation of the system itself is followed by a discussion of the application of the visual system as a kind of markup language (an annotation language for documents) and its role in the work of information analysts (Chapter 4). The theoretical framework which starts with Chapters 1 and 2 is completed by a final chapter in which the crucial role of metaphors and analogies for scientific exploration receives additional emphasis (Chapter 5).
CRITICAL EVALUATION
Much of the motivation for this book appears to have been drawn from previous research by Tonfoni, who developed the CPP-TRS theory of text comprehension in the early 1990s.
For researchers who seek a computationally tractable representation with formal grounding, this text will be found wanting, despite the apparently short distance between the abstractions discussed and such a representation. At a theoretical level, the "machines" which form the framework for Chapter 3 could equally well be expressed in alternative formalisms; graph theory and finite state machines are two which come to mind, but neither is mentioned even in passing in the text. At a practical level, the obvious affinities with hypertext theory are never explored, yet are immediately apparent when discussing internal linkage and annotations within a document interpretation instance.
The components of the document annotation language (CTML) appear to be disengaged from naturally aligned theories which will be familiar to linguists (for example Rhetorical Structure Theory). While this in itself is not a major shortcoming, it does contribute to the overall feeling that this work is sufficiently removed from the core of the linguistics discipline that its contribution may not be as great as its potential.
At its core, the book proposes CTML, a document markup language. Disappointingly, CTML is presented as little more than a series of character-based annotations and is not reduced to a computationally tractable representation, even though doing so appears to be quite trivial.
At an editorial level, a number of distracting features appear. Aside from the regular typographical errors, this volume features introverted citation, referring to itself as a manuscript on a number of occasions when cross-references to the relevant sections would have been more appropriate. Much of the primary locus of the book, found in Chapter 3, has a distinctly recycled feel, with scarce concern for editorial contributions. Certain points, such as "CPP-TRS is a methodology and a language", are unnecessarily repeated throughout. Such oversights are unfortunate since the overall contribution of the book is unique in its field and will doubtless be of value to researchers in the areas of text linguistics, semiotics, document processing, dialogue modelling, pragmatics, natural language generation and cognitive science.