FYI: MARC Fields, Communications Research

Author: Gerry McKiernan

FYI Body: _Reconstitution of Meaning: MARC Fields as Morphemes_

While reviewing the potential application of Data Mining and
Knowledge Discovery in Databases (KDD) to make better use of the
intellectual content embedded in MARC fields, it has occurred to me
that such an investigation might prove useful if MARC fields were
viewed as _morphemes_ [no, not morphine [:->]]. The morpheme is
considered by (many) linguists as a basic unit of meaning within a
language.

In the cataloging process, meaning is embedded within a defined
structure using acceptable rules of grammar (e.g. AACR2) - syntax if
you will [:->]. In such a process, a message about an individual work
is conveyed, using this grammar and an associated lexicon. Here the
physical 'meaning' and intellectual 'meaning' of an item are
translated into a message that is intended to describe the item and
its content.

While this process of bibliographic control has enabled users to
identify 'meaningful' items relevant to an information need, most
existing and (even) New Age OPACs I've identified and compiled in my
Onion Patch (sm) clearinghouse at URL


do not, I believe, make full use of the meaning explicit or
implicit within these records.

To identify items that are most relevant to users [BTW: 'Relevance'
is a 'meaning-full' concept [:->]], we need to contemplate the
creation of OPACs that enable (or allow) users to
'reconstitute' the meaning within these records. We need to develop
systems that can present users with items (i.e., records of cataloged
items within the OPAC) that best meet their needs using an 'optimal
syntax' determined by the 'meaningful' associations uncovered by a
Data Mining or a KDD process, or provide users with the ability to
select a different syntax (e.g. subject and publisher associations),
to identify that 'good book' on the subject. Likewise, we need to
provide users with the ability to 'cross-tabulate' associations within
MARC fields so that they are provided with a ranked listing of items
by publisher-author-call number, or call number-publisher, or subject
heading-publisher, or other potentially meaningful association of
their choosing. [I have sketched out a mock-up interface for this
function and will certainly let the list(s) know when it's available.]

In addition to associations revealed in the application of Data
Mining and KDD to an appropriate catalog database (e.g., the OCLC
cataloging database) or selected local OPAC database of peer groups
(e.g. RLG), as well as the desired associations of users themselves,
comprehensive log data should also reveal useful associations that
might provide a new syntax, or enhance one already considered. [Here
circulation data would be very important, as would OPAC transaction
log data, as Larson has demonstrated in his study of subject access in

One could envision the methods of Computational
Linguistics applied to MARC records or even (perhaps)
Transformational/Generative Grammar [:->] [Long Live Noam Chomsky!]!

Once again, as always, any reactions to such musings would be most
welcome. [In particular, I am interested in any literature relating to
the application of linguistic theories/practices to bibliographic and
MARC record structure.]


Gerry McKiernan
Curator, CyberStacks(sm)
Iowa State University
Ames IA 50011


"Oh No!, Not Another Project"

P.S. One could certainly apply these envisioned methodologies to
any Metadata regime (e.g. the Dublin Core, TEI).

Many Humanists will likely be interested in the existence and work of
the Human Communication Research Centre, Edinburgh and Glasgow, which
"brings together theories and methods from several disciplines. Formal
linguistics and logic, computational modelling, and experimental
psychology are all recruited to the pursuit of a common goal. When
people communicate, they process vast quantities of information. To
understand better how this happens, we focus on spoken and written
language; we also study communication in other media - visual,
graphical and computer-based." See the URL

