Alt.wais has been established to discuss the WAIS software.
Here is the official overview.
*********************** CLIP CLIP *************************
"WAIS Corporate Paper version 3"
MS-Word version available for anonymous ftp from think.com in
/pub/wais/wais-overview-docs.sit.hqx. This file is
wais-corporate-paper.text
An Information System for Corporate Users: Wide Area Information Servers
Brewster Kahle, Thinking Machines Corporation
Art Medlar, Scolex Information Systems
8 April 1991
To explore text-based information systems for corporate executives, four
companies have jointly developed a prototype which gives flexible access to
full-text documents. The four participating companies are Dow Jones & Co.,
with its premier business information sources; Thinking Machines
Corporation, with its high-end information retrieval engines; Apple
Computer, with its user interface expertise; and KPMG Peat Marwick, with
its information-hungry user base.
One of the primary objectives of the project is to allow a user to retrieve
personal, corporate, and wide area information through one easy-to-use
interface. For example, instead of using Lotus Magellan(tm) for personal
information, Verity Topic(tm) for corporate data, and Mead Data Dialog(tm)
for published text, one application can access all three categories of
information. The user isn't required to become familiar with several
entirely different systems. In addition, since the interface consolidates
data from many different sources, they can be manipulated effortlessly,
virtually without regard to their origins.
The Wide Area Information Server (WAIS, pronounced "ways") project is an
experimental venture seeking to determine whether current technologies can
be used to make profitable end-user full-text information systems. Fifteen
users have been actively using the system for over three months. They have
integrated it into their workday routine in much the same way as they have
previously integrated spreadsheets and word processors. This preliminary
success has convinced us that a WAIS-like system can be a valuable tool for
corporate information retrieval. This paper discusses the design and
implementation of the prototype system.
Introduction
Electronic publishing is the distribution of textual
information over electronic networks. It has been emerging as a viable
alternative to traditional print publishing as the necessary underlying
technologies develop. Among the more essential of these are:
* High Resolution Display Screens
* Reliable, High-Speed Data Communications
* Desktop Publishing Systems
* Inexpensive Data Storage Media
While these technologies have been developed for uses other than electronic
publishing, they are the necessary precursors for full-text retrieval
systems.
From the user's point of view, there are several problems to be overcome.
First, there must be some way of finding and selecting databases from a
potentially unlimited pool. Second, although these databases may be
organized in different ways, the user should not need to become familiar
with the internal configuration of each one. Finally, there must be some
practical way of organizing responses on the user's machine in order to
maintain control over what may become a vast accumulation of data. In
addition, developers are faced with a number of architectural issues. The
system must be scalable; that is, it must allow for the future growth of
both the complexity and number of clients and servers. It must be secure;
each server's data must be protected from corruption, and the privacy of
the users must be ensured. Lastly, since an unreliable source is useless
in a corporate environment, access must be thoroughly robust.
System Overview
The prototype WAIS system takes advantage of current state-of-the-art
technology, and presents solutions to all of the above problems. The
system is composed of three separate parts: Clients, Servers, and the
Protocol which connects them.
The client is the user interface, the server does the indexing and
retrieval of documents, and the protocol is used to transmit the queries
and responses. The client and server are isolated from each other through
the protocol. Any client which is capable of translating a user's request
into the standard protocol can be used in the system. Likewise, any server
capable of answering a request encoded in the protocol can be used. In
order to promote the development of both clients and servers, the protocol
specification is public, as is its initial implementation.
On the client side, queries are formulated as English-language questions.
The client application then translates the query into the WAIS protocol,
and transmits it over a network to a server. The server receives the
transmission, translates the received packet into its own query language,
and searches for documents satisfying the query. The list of relevant
documents is then encoded in the protocol, and transmitted back to the
client. The client decodes the response, and displays the results. The
documents can then be retrieved from the server.
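The round trip described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: JSON stands in for the wire encoding (the actual WAIS protocol format is not reproduced here), and the names DOCUMENTS, client_encode_query, server_handle, client_decode_response, and round_trip are hypothetical, as is the toy word-matching search.

```python
import json

# Hypothetical in-memory store standing in for a server's database.
DOCUMENTS = {
    "doc-1": "Quarterly earnings rose at the regional bank",
    "doc-2": "New accounting rules for corporate mergers",
    "doc-3": "Bank mergers and earnings outlook",
}

def client_encode_query(question):
    """Client side: translate the user's question into a protocol message.
    JSON is an assumed stand-in for the real wire format."""
    return json.dumps({"type": "search", "question": question})

def server_handle(message):
    """Server side: decode the message, run a search in the server's own
    terms, and encode the list of matching documents."""
    request = json.loads(message)
    words = set(request["question"].lower().split())
    hits = []
    for doc_id, text in DOCUMENTS.items():
        overlap = words & set(text.lower().split())
        if overlap:
            hits.append({"id": doc_id, "headline": text, "score": len(overlap)})
    # Best score first; ties are broken by document id for a stable order.
    hits.sort(key=lambda hit: (-hit["score"], hit["id"]))
    return json.dumps({"type": "results", "documents": hits})

def client_decode_response(message):
    """Client side: decode the response so the results can be displayed."""
    return json.loads(message)["documents"]

def round_trip(question):
    """Run one query through encode, search, and decode."""
    return client_decode_response(server_handle(client_encode_query(question)))

if __name__ == "__main__":
    for hit in round_trip("bank earnings"):
        print(hit["score"], hit["id"], hit["headline"])
```

Because the client and server only exchange encoded messages, either side could be replaced by a different implementation without changing the other, which is the isolation the protocol is meant to provide.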
Digital Researcher
The traditional information research scenario is familiar to anyone who has
ever visited a reference desk at a public or corporate library. The client
approaches a librarian with a description of needed information. The
librarian might ask a few background questions, and then draws from
appropriate sources to provide an initial selection of articles, reports,
and references. The client then sorts through this selection to find the
most pertinent documents. With feedback from these trials, the researcher
can refine the materials and even continue to supply the user with a flow
of information as it becomes available. Monitoring which articles were
useful can help keep the researcher on-track.
The WAIS system is an attempt at automating this interaction: the user
states a question in English, and a set of document descriptions comes back
from selected sources. The user can examine any of the items, be they text,
picture, video, sound, or whatever. If the initial response is incomplete
or somehow insufficient, the user can refine the question by stating it
differently.
In addition, the user may also mark some of the retrieved documents as
being "relevant" to the question at hand, and then re-run the search. The
server recognizes the marked documents, and attempts to find others which
are similar to them. In the present WAIS system, "similar" documents are
simply ones which share a large number of common words; however, there is
potentially no upper limit on the intelligence of a server in determining
what similarity entails. This method of information retrieval is called
"relevance feedback." The idea has been around for many years (1) and the
first commercial system utilizing it, DowQuest (2), was voted Database of
the Year by Online Magazine in January 1989.
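A minimal sketch of relevance feedback in the sense described above, where "similar" simply means sharing many words with the documents the user has marked. The function names, the scoring rule (a raw count of shared distinct words), and the sample corpus are assumptions made for illustration; they are not taken from the WAIS servers themselves.

```python
def words_of(text):
    """Return the set of distinct lower-cased words in a text."""
    return set(text.lower().split())

def find_similar(marked_ids, documents, limit=10):
    """Rank unmarked documents by how many distinct words they share
    with the documents the user marked as relevant."""
    relevant_words = set()
    for doc_id in marked_ids:
        relevant_words |= words_of(documents[doc_id])

    scored = []
    for doc_id, text in documents.items():
        if doc_id in marked_ids:
            continue  # do not return the marked documents themselves
        shared = len(relevant_words & words_of(text))
        if shared:
            scored.append((shared, doc_id))

    # Most shared words first; ties are broken by document id.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:limit]]

# Hypothetical sample data.
corpus = {
    "a": "oil prices climb as supply tightens",
    "b": "crude oil supply falls and prices climb",
    "c": "new spreadsheet software released",
    "d": "oil company reports record profit",
}

if __name__ == "__main__":
    print(find_similar({"a"}, corpus))
```

A more capable server could substitute any other similarity measure for the shared-word count without changing how the user marks documents and re-runs the search.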
User Interfaces: Asking Questions
Users interact with the WAIS system through the Question interface. The
interface may appear different on various implementations: for example, a
character display terminal will have a different look than one which is
capable of displaying bit-mapped graphics. The key, however, is that the
user need only become familiar with one interface which provides access to
all available information sources.
The WAIS system, in this first incarnation, was designed to be used by
accountants and corporate executives who are relatively untrained in search
techniques. Consequently, to aid those users who have neither the time nor
desire to learn a special purpose query language, the system uses English
language queries augmented with relevance feedback. While the system's
servers currently do not extract semantic information from the English
queries, they do their best to find and rank articles containing the
requested words and phrases. Used in conjunction with relevance feedback,
this method of searching has proven to be more than adequate for the types
of searches and databases typically encountered.
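To make the idea of ranking by requested words and phrases concrete, here is a small sketch. It is not the servers' actual algorithm: the scoring rule (occurrences of each query word, plus a fixed bonus when the whole query appears as a phrase), the phrase_bonus parameter, and the function name rank_articles are all assumptions. No semantic analysis is attempted, and the phrase check is a plain substring match.

```python
from collections import Counter

def rank_articles(query, articles, phrase_bonus=5):
    """Score each article by occurrences of the query words, adding a
    bonus when the full query appears verbatim, and return the matching
    (score, title) pairs with the best score first."""
    query_words = query.lower().split()
    phrase = " ".join(query_words)
    ranked = []
    for title, body in articles.items():
        text = body.lower()
        counts = Counter(text.split())
        # Count every occurrence of each distinct query word.
        score = sum(counts[word] for word in set(query_words))
        # Reward articles containing the query as an exact phrase.
        if len(query_words) > 1 and phrase in text:
            score += phrase_bonus
        if score > 0:
            ranked.append((score, title))
    # Highest score first; ties are broken alphabetically by title.
    ranked.sort(key=lambda pair: (-pair[0], pair[1]))
    return ranked

# Hypothetical sample data.
sample_articles = {
    "Audit memo": "the tax audit found errors in the tax filing",
    "Tax note": "tax rates and tax credits changed",
    "Sports": "the team won",
}

if __name__ == "__main__":
    for score, title in rank_articles("tax audit", sample_articles):
        print(score, title)
```

Articles that contain none of the requested words are dropped, so a user who gets a weak result list can either rephrase the question or fall back on relevance feedback.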
The illustrations here are taken from the initial WAIStation program
produced at Thinking Machines for the Apple Macintosh. Several other
interfaces are under development at Apple Computer, Dow Jones, and
elsewhere. [omitted in text-only version]
* Step 1: Sources are dragged with the mouse into the Question Window. A
question can contain multiple sources. When the question is run, it asks
for information from each included source.
* Step 2: When a query is run, headlines of documents satisfying the query
are displayed.
* Step 3: With the mouse, the user clicks on any result document to
retrieve it.
* Step 4: To refine the search, any one or more of the result
documents can be moved to the "Which are similar to:" box. When the
search is run again, the results will be updated to include documents which
are "similar" to the ones selected.
Contacting Remote Sources of Information
[figure omitted] Figure 1: The Source description contains all
the necessary information for contacting an information server.
From the user's point of view, a server is a source of information. It
can be located anywhere that one's workstation has access to: on the
local machine, on a network, or on the other side of a modem. The
user's workstation keeps track of a variety of information about each
se