From: Marina Santini <MarinaSantini.MSgmail.com>
Subject: Multimodality and Genre
Announced at http://linguistlist.org/issues/20/20-31.html
AUTHOR: Bateman, John
TITLE: Multimodality and Genre
SUBTITLE: A Foundation for the Systematic Analysis of Multimodal Documents
PUBLISHER: Palgrave Macmillan
Marina Santini, Honorary Research Fellow in the Humanities Advanced Technology
and Information Institute (HATII), University of Glasgow, UK.
The volume ''Multimodality and Genre'' by John Bateman is a monograph that
presents a framework for the page-based analysis of multimodal documents, such
as magazines, books, web pages, newspapers and the like. Multimodal documents
combine text, graphics and pictures in more or less complex layouts. Readers are
shown an approach that breaks down multimodal documents into configurations of
basic elements to uncover meaning. This approach was originally developed within
the Genre and Multimodality (GeM) project. A major claim of the book is that
there is a need for detailed empirical analysis in order to advance our
understanding of the complex multimodal meaning processes involved in multimodal
documents. The concept of genre is proposed as a crucial theoretical construct
for exploring document meaning making. The book is suitable for genre analysts,
discourse analysts, document designers, computational linguists and semioticians.
The book contains seven chapters, a table of contents, lists of tables and
figures and indices. Number of pages: 312.
Chapter 1 (''Introduction: Four Whys and a How'') contains the purpose and the
motivation of the volume. The aim of the book is to present a framework for the
empirical and reproducible page-based analysis of multimodal documents.
Multimodal documents are made of a variety of visually-based modes that create a
net of communicative goals. The proposed framework can be used to explore how
multiple modes interact and are combined within individual artefacts.
In particular, the framework is designed to help analysts understand what is
gained in the combination of different modes and what kind of semantic
relationships they establish. The motivation of this framework is the assumption
that the combinations of elements signal meaningful relationships that would not
be revealed by those elements in isolation.
In Chapter 2 (''Multimodal Documents and their Components'') the author places
different approaches to document analysis into a common context. The author
argues throughout the chapter that providing sound means for determining the
''components'' of a page or a document is a crucial prerequisite for carrying out
empirical investigations of the processes of interpretation, for critique, and
for comparing different approaches. Document components must be identified and
described in a reproducible way. The main parts of Chapter 2 focus on 1) the
page as object of interpretation, 2) the page as object of perception, and 3)
the page as object of production. In the last summarizing section, the author
focuses on two complementary perspectives from which to view what happens within
a multimodal page. The first sees graphic design as 'macro-punctuation', similar
to text-based typography or formatting. The second perspective considers pages
as visual entities and document design as a process of visual decomposition. The
author concludes by saying that allowing both perspectives is the necessary
prerequisite for capturing a fuller range of possibilities.
Chapter 3 (''The GeM Model: Treating the Multimodal Page as Multilayered Semiotic
Artefact'') combines the perspectives described in the previous chapter in order
to articulate an account of document components that is sufficiently
well-defined to support reproducible analyses. The overall aim is ''to work
towards functionally supportable hypotheses by means of sufficiently
fine-grained formal details so as to allow empirical investigation, verification
and refutation'' (p. 107). For the analyses, the author suggests a number of
layers of description. Each layer of the description of a page artefact tells us
something different about how the page is constructed. The particular layers
that the author has isolated in his investigations as being crucial are five:
1) the GeM base, including base units such as sentences, headings, titles,
headlines, icons, table cells, list items, list labels, etc. (Chapter 3, p. 111);
2) the layout base, including layout segmentation, realisation information, and
layout structure (Chapter 3, pp. 116-129); a concrete example of layout analysis
is provided with a page from a Dorling-Kindersley guide to Paris that describes
the parts of the Louvre (Chapter 3, pp. 130-143);
3) the rhetorical base (Chapter 4);
4) the genre base (Chapter 5);
5) the navigation base (Chapter 6).
Each layer defines its own basic set of units as well as relations and
structures defined over these units. The relations between layers are left open
to empirical investigations. The author does not impose any particular
inter-layer relationship beyond the simplest assumption that some configurations
of units in one layer might be expressed in terms of some configurations of
units within other layers. Then the chapter narrows down and presents the layout
Chapter 4 (''The Rhetorical Organization of Multimodal Documents'') defines
methods for exploring how configurations of elements take on significance over
and above their spatial proximity, visual similarity or difference. The author
states that one common approach to describing the functions communicated by a
combination of distinct elements on the page is to employ notions of rhetorical
organization. He adopts one specific account of rhetorical organization called
Rhetorical Structure Theory (RST), which is frequently used in linguistic
approaches to explain textual coherence. The author argues that extending this
approach to encompass multimodal rhetorical organization provides the required
analytic hold on multimodal analysis. In the same way ''as segments of a text
contribute to that text's coherence in systematic and specifiable ways, so can
segments of a multimodal document, involving pictures, diagrams and texts, be
related in an analogous manner also'' (p. 144).
The chapter contains a brief introduction to RST (pp. 144-151) and how this
theory is used within the GeM rhetorical layer (pp. 151-163). It also includes
example analyses of rhetorical relations between layout units (pp. 163-174). The
author summarizes the chapter by saying that when a document starts to utilize
the full two-dimensional spatial extent of the page for expressing rhetorical
and other functional organizations, we move into a different semiotic mode: one
which he terms 'page-flow'. Page-flow can combine elements in any of the
semiotic modes appearing on a page, including 'text-flow' (i.e. running text),
diagrams, graphs, etc. It adds to the individual contributions of these elements
the possibility of a rhetorical unity supporting the communicative intentions of
the document. Without this level of description, he says, ''we are not in a
position to explicate many of the spatial distribution decisions taken in
page-based documents'' (p. 176). However, at the present time, he admits, it is
an open question as to how much of the detail of rhetorical organisation is
expressible visually. The resources that are actually employed in any document
and the ways in which they are distributed around the semiotic modes activated
also depend on the type of document - or genre - employed. For this reason, the
author considers the concept of genre important and deals with it in the next
chapter.
Chapter 5 addresses the issues of comparison and constraints. This means that in
the analysis of a multimodal document, one needs to consider the sets of
documents that it resembles and the sets of documents with which it stands in
contrast. The author argues that, in this respect, genre is a fundamental
concept for the analysis of meaning in multimodal documents. Usually readers
allocate documents to particular classes of documents, and those classes bring
with them certain interpretive frames and expectations. These frames guide
readers to make sense of what they read in the document. Moreover, the decisions
taken during document production rely on the conventions and practices
established for the class of documents to which the document is meant to belong.
Therefore, effective document use requires a process of negotiation between the
norms for the document type - i.e. the genre - and the functional requirements
of the specific document. Since the extension of traditional notions of genre to
multimodal documents is not straightforward, the author suggests that the
framework for multimodal genre should be drawn from linguistically-motivated
accounts of genre, because, from this perspective, ''genre offers a method for
relating any individual document encountered to its 'generic' context by means
of explicitly identifiable design decision'' (p. 182). The chapter continues by
presenting the state of the art of views on genre (pp. 183-217), then three
basic modes of genre representation are discussed: typological (genre typology,
pp. 219-223), topological (genre topology, pp. 223-225) and faceted (the facets,
pp. 218-219). The typological view of genre can be represented as classification
networks; topological accounts are characterized in terms of variation; the
faceted approach is midway between the typological and topological and builds on
facets, which are semi-independent classification systems. Then the author
explains his own representation of genre in Section 5.3. Here the author
stresses the importance of pursuing a notion of genre that admits of fluidity
and change while still imposing sufficient constraint to retain predictive value
when several semiotic modes are combined. His sources of inspiration are Waller
(1987), who stresses the importance of genre for typographical work expressed
through a set of choices, and Lemke (1999), who suggests building trajectories
of similarity across superficially different genres. The notion of genre
proposed by the author is shown at work in relation to two sets of loosely
related documents. First, it is shown how the documents can be assigned to
similar and contrasting genres, and second, it is described how tracking these
kinds of documents over time starts to reveal generic trajectories of changes.
The analyses presented draw on all aspects of the GeM model introduced in the
preceding chapters.
Chapter 6 (''Building Multimodal Document Corpora'') contains a characterization
of corpus-based linguistics, the state of the art in linguistic corpora, and the
suggestion of using the GeM model as a corpus annotation scheme. The author
emphasizes how the adoption of the methods of corpus-based linguistics is a
crucial step because documents must be seen against the background provided by
relevant co-generic documents. The author sees each layer of the GeM model as a
stand-off layer of annotation decomposing the documents analysed. The layers
themselves are all defined in terms of XML descriptions. This allows analysts
both to store the information following the GeM model and to use that
information for constructing complex corpus queries that freely combine
information from the layers of the GeM model. The author sees the ability to
locate patterns that hold across distinct layers of the model as an essential
precondition for locating genre characteristics.
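The stand-off idea can be made concrete with a small sketch. The following Python fragment is only an illustration under stated assumptions: the element names (`unit`, `layout-unit`, `relation`), the attributes (`id`, `xref`, `type`, `name`, `nucleus`, `satellite`), the sample content and the function `satellites_by_layout_type` are all hypothetical and are not taken from the GeM project's actual XML schema. It shows three separate layers that point at shared base-unit ids, and one query that combines information from the layout layer and the rhetorical layer.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-off layers. Element and attribute names are illustrative
# assumptions, not the GeM project's real annotation schema.
BASE_XML = """
<base>
  <unit id="u1">Heading text</unit>
  <unit id="u2">A paragraph of running text.</unit>
  <unit id="u3">[picture]</unit>
</base>
"""

# The layout layer refers to base units by id instead of embedding their content.
LAYOUT_XML = """
<layout>
  <layout-unit id="lay1" type="heading" xref="u1"/>
  <layout-unit id="lay2" type="text-block" xref="u2"/>
  <layout-unit id="lay3" type="picture" xref="u3"/>
</layout>
"""

# The rhetorical layer also refers to base units by id (RST-style relation).
RHETORICAL_XML = """
<rhetorical>
  <relation name="elaboration" nucleus="u2" satellite="u3"/>
</rhetorical>
"""


def satellites_by_layout_type(relation_name, layout_type):
    """Return ids of base units that are satellites of the given relation
    and are realised by a layout unit of the given type."""
    layout_root = ET.fromstring(LAYOUT_XML)
    # Map each base-unit id to the type of the layout unit that realises it.
    layout_type_of = {
        el.get("xref"): el.get("type") for el in layout_root.iter("layout-unit")
    }
    rhetorical_root = ET.fromstring(RHETORICAL_XML)
    return [
        rel.get("satellite")
        for rel in rhetorical_root.iter("relation")
        if rel.get("name") == relation_name
        and layout_type_of.get(rel.get("satellite")) == layout_type
    ]


if __name__ == "__main__":
    # Cross-layer query: which elaborating satellites are realised as pictures?
    print(satellites_by_layout_type("elaboration", "picture"))
```

Because each layer is stored separately and only cross-references the base units, further layers can be added without touching the existing ones, which is the property that makes such cross-layer queries possible.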
Chapter 7 (''Conclusions and Outlook: What Next?'') summarizes the proposed
framework for multimodal document analysis and puts forward three directions for
future work, namely 1) extension to dynamic documents; 2) a three-dimensional
layering of the layout; finally 3) temporal development in the image-flow mode.
The book offers a useful overview and summary of the achievements of the GeM
project (run between 1999 and 2002) and its further development. It poses many
challenging issues and can be considered required reading (together with Biber
et al. 2007; Bruce 2008; Heyd 2008; Martin and Rose 2008) for those who
currently try to pin down the concept of genre for empirical or computational
However, the approach proposed in this book raises a few questions:
1) Although the author points out several times that ''this work is still very
much in its infancy'' (p. 247) and that corpus-based investigation must be
carried out in the future, the qualitative analyses of only a handful of
documents cast some doubt on the practical feasibility of the proposed approach:
why does the GeM corpus annotated in so many years of research include only 10
corpus-anno.html)? Is this approach applicable at all on a large scale or is it
2) Although I agree on many points of the characterization of genre given by the
author (for instance, the linguistically motivated approach to genre, the
predictive power of genre, the presence of trajectories of similarity across
superficially different genres, etc.), it is not clear how to create
reproducible genre classes for empirical corpus-based studies and analyses. In
certain fields, like Automatic Genre Identification, researchers are engaged in
finding representative genre labels and struggling to create collections of
documents that instantiate these genre labels (e.g. see Sharoff forthcoming).
The author only says that: ''staying within intuitive genre labels, such as, for
example, 'newspaper' or 'guide book', is far from optimal precisely because it
creates artificial boundaries that the dimensions of variation manipulated
within genres do not necessarily respect'' (p. 229). So, one spontaneous question
is: what are the ideal genre labels one should work with?
3) Although the author says that we should aim ''not only to have an account of
genres that exist, or have existed, but also to suggest properties for genres
that do not (yet) exists'' (p. 225), it seems difficult to apply the diachronic
approach described in Section 5.5 to emerging genres. When do we decide that a
genre is coming into existence so that we can identify its representative features?
A difficulty with the book is that the notion of genre for multimodal documents
is somewhat hard to extract. It is placed in Section 5.3, but it is so
much interspersed with citations and digressions that it takes some time to be
detected and isolated. It would have helped to have a summarizing section or
subsection where all the characteristics of genre are listed and motivated
(similar to Swales 1990:45-58).
The book contains stimulating and provocative statements, for instance the
proposed detachment of genre from culture: ''we cannot simply compare texts on
the basis of the assumption that they constituted a single time-extended genre:
there are variables at work. Genres must be described independently of the
particular use that a culture makes of them. Genres do not merely 'reflect'
conventions: each instance of a particular genre helps create conventions and
hence generic expectations'' (p. 248).
In conclusion, the book is an important piece of the still unsolved genre riddle.
REFERENCES
Biber D., Connor U. and Upton T. (eds.) (2007). Discourse on the Move. John
Benjamins Publishing Company.
Bruce I. (2008). Academic writing and genre: a systematic analysis. Continuum.
Heyd T. (2008). Email Hoaxes. Form, function, genre ecology. John Benjamins.
Lemke J. (1999). Typology, topology, topography: genre semantics. MS University
Martin J. and Rose D. (2008). Genre relations: mapping culture. Equinox.
Sharoff S. (Forthcoming). In the garden and in the jungle: comparing genres in
the BNC and Internet. In Mehler A., Sharoff S. and Santini M. (Forthcoming).
Genres on the web: Computational Models and Empirical Studies. Springer.
Swales J. (1990). Genre Analysis: English in academic and research settings.
Waller R. (1987). The typographical contribution to language: towards a model of
typographic genres and their underlying structures. PhD thesis. University of
Reading, Reading, UK.
ABOUT THE AUTHOR
Marina Santini is a computational linguist interested in genre, sentiment, and other discourse categories (also known as non-topical descriptors). Her research interests span from web documents to web development and evolution, corpus design and construction, automatic feature extraction, and genre classification algorithms.