  1. Joe Giampapa, fyi: Mead Data Central full-text databases

Message 1: fyi: Mead Data Central full-text databases

Date: Fri, 2 Aug 91 11:47:02 +0200 (MET)
From: Joe Giampapa <garof%sixcom.sixcom.it@RICEVM1.RICE.EDU>
Subject: fyi: Mead Data Central full-text databases
Taken from Usenet. The description of Mead Data Central seemed of
general interest to this group. My re-posting of this message is not
intended as an endorsement of any kind.
Mead Data Central (MDC) is currently looking for
Mead Corp is a $4.5 billion corporation, of which MDC is a subsidiary
(about $400 million). MDC has the largest collection of full-text
databases in the world. The databases contain legal material (from the
US Constitution, all 50 State Statutes, all courts, all of English and
French laws and cases), a large number of news reports and wires (for
example, the entire NY Times published since 1960 is online), financial
info, medical info, and a variety of other information. The databases
are full-text searchable, and the searches retrieve documents that the
user views.
To give you an idea of the volume of data involved: the amount of disk
space used is approx. 3 terabytes, comprising some 3500-4000
databases. In a week, an average of 1.3 GB of data pours in. Every
day, an average of 500 databases are updated and backed up. The average
size of a database is 200 MB.
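
As a rough cross-check of the figures above, here is a small Python
sketch. The constant and function names are made up for illustration,
and decimal units (1 GB = 1000 MB) are an assumption; the post does not
say which convention it uses. Note that the database count times the
average size does not multiply out to the 3 terabytes quoted, so the
numbers should be read as approximate.

```python
# Figures quoted in the post; decimal units are an assumption.
TOTAL_DISK_GB = 3000          # "approx. 3 terabytes"
AVG_DB_SIZE_MB = 200          # average size of one database
WEEKLY_INFLOW_GB = 1.3        # new data arriving per week
DBS_UPDATED_PER_DAY = 500     # databases updated and backed up daily


def implied_total_gb(db_count, avg_mb=AVG_DB_SIZE_MB):
    """Space implied by db_count databases of the average size, in GB."""
    return db_count * avg_mb / 1000


def daily_backup_gb(updated=DBS_UPDATED_PER_DAY, avg_mb=AVG_DB_SIZE_MB):
    """Data backed up per day if each updated database is average-sized."""
    return updated * avg_mb / 1000


def weekly_growth_percent(inflow_gb=WEEKLY_INFLOW_GB, total_gb=TOTAL_DISK_GB):
    """Weekly inflow as a percentage of the total disk space in use."""
    return inflow_gb / total_gb * 100


if __name__ == "__main__":
    for count in (3500, 4000):
        print(count, "databases imply about", implied_total_gb(count), "GB")
    print("daily backup volume: about", daily_backup_gb(), "GB")
    print("weekly growth: about", round(weekly_growth_percent(), 3), "percent")
```
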
There are about 14 giant mainframe computers connected over a very
high-speed LAN that perform the searching for the users. Front end
processors maintain sessions similar to "login"s to reduce the load on
the search engines. Just the logistics of connecting enough disks for
3000 gigabytes is a major achievement. MDC has its own wide-area
network to make its online services available to users all across
the United States. Users may use dumb terminals or PCs essentially
emulating dumb terminals to connect to MDC and use the online services
for a fee.
The MDC system is up 23 hrs and 55 mins a day. Reliability and
availability are close to 99.1 per cent, but the management would like to
see that figure closer to 99.7 per cent. Even when an external source
like AT&T drops the leased lines that make up MDC's WAN, it is counted as a
failure here. Fault tolerance and replication for availability are
bywords in anything that MDC does. They follow the usual motto that if
N of something are sufficient for the projected capacity, they run with
N+1 or N+2 to be able to continue at the same response level.
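
To put the availability figures in concrete terms, the following Python
sketch converts between uptime and percentages and shows the N+1 rule
as plain arithmetic. The function names are hypothetical, and the post
does not say how MDC actually measures its 99.1 per cent figure, so this
is only a back-of-the-envelope illustration.

```python
MINUTES_PER_DAY = 24 * 60


def availability_percent(up_minutes, total_minutes=MINUTES_PER_DAY):
    """Percentage of the period during which the system was up."""
    return up_minutes / total_minutes * 100


def downtime_minutes_per_day(availability_pct):
    """Minutes of downtime per day implied by an availability percentage."""
    return (100 - availability_pct) / 100 * MINUTES_PER_DAY


def units_to_run(n_required, spares=1):
    """N+1 (or N+2) rule: run the required units plus some spares."""
    return n_required + spares


if __name__ == "__main__":
    # 23 hrs 55 mins of uptime per day, as quoted in the post.
    print(round(availability_percent(23 * 60 + 55), 2), "per cent")
    # Daily downtime at the current and the desired availability.
    print(round(downtime_minutes_per_day(99.1), 2), "minutes at 99.1")
    print(round(downtime_minutes_per_day(99.7), 2), "minutes at 99.7")
    # If 14 machines were sufficient, N+2 would mean running 16.
    print(units_to_run(14, spares=2), "machines")
```
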
The system that exists today at MDC was built with a character-based
terminal in mind, and is optimized for such. The major change in
technology that MDC wants to take advantage of is the advent of cheap
workstations combined with very high-speed WANs, while still catering to
the low-end user who has a 2400 baud modem and a dumb terminal to dial in
with. Real-time updates from digital feeds that carry information like
news-wires, stock quotes, etc. are also being considered.
