The following document is relevant to both program writers and
managers/user support staff of multi-software collections on nodes
which serve interactive users. Any comment is welcome.
The document discusses intended changes necessary to maximise the
efficiency of computer-aided molecular biology analysis performed on
the UK national SEQNET node at Daresbury but would apply equally well
at other sites. These changes include major alteration of both
user-program interfaces and user support documentation.
SEQNET at Daresbury UK runs a wide range of molecular biology software
on a VAX 3600 linked to a DECSERVER 5100. It supports over 1300
academic and industrial subscribers. SEQNET is funded by the Science
and Engineering Research Council (SERC), UK.
Duncan Rouch
Frank Wright
Alan Bleasby
<-----///////////////////// CLIP ///////////////////////////////----->
Computational molecular biology from the user's viewpoint: basis for a
better interface between the biologist and the computer.
Contents:
0 Background
1 Introduction
2 Aim, Approach and Strategy
2.1 Aim and Approach
2.2 Strategy
3 The Biologist's Problem
4 Proposed Solutions
4.1 Software Solutions
4.1.1 A Global Interface Style
4.1.2 Input/Output Standardisation
4.1.3 Streamlining Interfaces
4.1.4 Online Menu System
4.2 Educational Solutions
4.2.1 The Role of the User Guide
4.2.2 Hard Copy and Online Documentation
5 Acknowledgements
------------------------------------------------------------------------
0 Background
Computers have become indispensable tools in the analysis of
biological information. Sophisticated methods of analysis can now be
carried out by running the appropriate software. Sequence and
structure analysis methods are continually being developed. This has
led to a software explosion in the last few years that has caused a
concomitant degree of confusion among biologists who must,
necessarily, use computer systems. It was recognised that education
of biologists in the use of such systems was crucial; a SERC
collaborative computational project (CCP11) was therefore formed in 1990.
CCP11 is specifically for computational molecular biology. So far it
has organised colloquia on topics such as multiple sequence alignment.
The problems of more general education, provision of more intuitive
biologist/computer interfaces and the production of documentation from
a biologist's-eye view are now being addressed.
This document is a discussion of the problems as we see them and of
the possible solutions. Comment is invited.
Duncan Rouch
Frank Wright
Alan Bleasby
1 Introduction
This is a discussion of the problems a biologist may face when using a
computer system. The biologist is the `user' and has to deal with how
information is presented by the computer (the so-called `user
interface'). This discussion will be used as a basis for writing
the next SEQNET guide for molecular biologists. It addresses both the
user's problems and a strategy for their solution. This will involve
changes in the programs as well as in the user guide material (printed
and online).
In a second document, to follow, we discuss more specifically the
structure of the proposed user guide.
2 Aim, Approach and Strategy
2.1 Aim and Approach
Aim: to maximise the efficiency of sequence and structure
analysis by the biologist on a computer system, such
as SEQNET, that provides a wide selection of programs
and packages.
Approach: this can be achieved by providing assistance in the choice
of the appropriate method, the appropriate program (and its
associated parameters) and in the interpretation of output.
The program/parameter selection problem can be tackled
by appropriate software changes; the problems of choice
of an appropriate method and the interpretation of output
must be dealt with by education.
2.2 Strategy
Graphical displays customised for biologists present the most
attractive route for solving the selection problem. These, however,
may require X-Windows terminals which not all sites can afford; most
systems must therefore provide a text-only user interface either in
isolation or alongside a graphical interface.
The goal, for both software and education strategies, is to increase
efficiency by streamlining the user-interfaces to programs. This has
the advantage that the biologist need only remember a limited number
of hardware and software operations. Within a given system, changes
to the underlying operating system (e.g. a move from VMS to UNIX) or
hardware should ideally be hidden from the user, giving a so-called
seamless environment.
Unfortunately, owing to restrictions on the availability of program
source code and copyright, the rewriting of interfaces cannot quickly
be achieved. A more realistic secondary solution is to produce a
flexible on-line menu system to hide the polymorphic program
interfaces. This has to be combined with improved user support, which
will include the restructured user-documentation.
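
As a rough illustration of how such a menu layer could hide differing
program interfaces, the sketch below keeps one table of tasks and builds
the program-specific command line on the user's behalf. It is only a
sketch under stated assumptions: the task labels, the program names
(pairalign, dbscan, xlate) and their command syntax are all invented
here and do not describe any real SEQNET program.

```python
# Hypothetical menu table: every task label, program name and command
# template below is invented for illustration only.
MENU = [
    ("Compare two sequences", "pairalign", "pairalign {first} {second}"),
    ("Search a database with a sequence", "dbscan", "dbscan -q {first} -d {database}"),
    ("Translate DNA to protein", "xlate", "xlate /in={first}"),
]


def render_menu(menu):
    """Return the text-only menu shown to the biologist."""
    lines = ["Choose a task:"]
    for number, (task, _program, _template) in enumerate(menu, start=1):
        lines.append(f"  {number}. {task}")
    return "\n".join(lines)


def build_command(menu, choice, **answers):
    """Turn a menu choice plus the user's answers into a command line.

    The program-specific syntax lives only in the template, so the
    user never has to learn or remember it.
    """
    if not 1 <= choice <= len(menu):
        raise ValueError(f"no menu entry numbered {choice}")
    _task, _program, template = menu[choice - 1]
    return template.format(**answers)


if __name__ == "__main__":
    print(render_menu(MENU))
    print(build_command(MENU, 2, first="myseq.dat", database="proteins"))
```

Keeping the program-specific syntax in a single table means that a
replaced program, or a changed command syntax, needs only a new table
entry; the menu seen by the biologist stays the same.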
3 The Biologist's Problem
Biologists face the problem of how to analyse a sequence
with computational molecular biology facilities. They must
be aware of suitable methods of analysis, and must learn and
remember how to use the appropriate hardware and software. They must also
know how to interpret output generated by the software, which
includes awareness of the limitations of the method and
software used.
The wide range of available sequence and structure analysis software
presents molecular biologists with the opportunity to use many
different methods of analysis. Use of these methods currently demands
both hardware and software knowledge from the user. Acquisition of
necessary hardware knowledge, such as networking, is a relatively
trivial process. However, the biologist is confronted by an
ever-increasing array of new software, which more often than not has
idiosyncratic user-interfaces. The biologist is therefore required to
learn and remember a completely new set of software commands for each
unique interface style. This is unsatisfactory.
Furthermore, most people have a range of work priorities such that
computing can only take a small fraction of their time: how else will
they collect data to verify the computer-aided predictions? Even if
they can find the time to learn how to use each program, the lack of
constant reinforcement means they are likely to have to relearn the
command knowledge each time they log on. This lack of time can also
result in an unfamiliarity with new methods or even a misunderstanding
of existing ones. Inevitably this leads to inappropriate selection of
programs/parameters, incorrect assessment of output, an inertial
barrier against using new software and a tendency to blame software
for the deficiencies of a method.
The more software commands biologists have to know, the more likely
they are to make mistakes. This wastes time as well as potentially
causing errors in data or its interpretation. The enthusiasm
programmers have displayed in producing arbitrary user interfaces
could, in future, be better channelled into reducing this problem.
In contrast to the polymorphic array of single programs with different
user-interfaces, within a package such as the GCG system all programs
possess a common interface style and the same names are used for
commands that perform the same activity in different programs. So, to
use a new program within the package, the biologist need only learn
commands specific to that program.
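
The benefit of shared command names can be sketched in a few lines of
code. The example below is an assumption-laden illustration, not a
description of GCG or of any other package: the two programs and their
flags are invented. The user supplies the same option names every time,
and a small table translates them into each program's own spelling.

```python
# Hypothetical per-program spellings of three shared option names.
# Neither the programs nor their flags are real.
OPTION_SPELLINGS = {
    "pairalign": {"input": "-i", "output": "-o", "gap_penalty": "-g"},
    "dbscan": {"input": "--query", "output": "--report", "gap_penalty": "--gap"},
}


def translate_options(program, options):
    """Map shared option names onto one program's own flags."""
    try:
        spellings = OPTION_SPELLINGS[program]
    except KeyError:
        raise ValueError(f"unknown program: {program}") from None
    arguments = [program]
    for name, value in options.items():
        if name not in spellings:
            raise ValueError(f"{program} has no option called {name!r}")
        arguments.extend([spellings[name], str(value)])
    return arguments


if __name__ == "__main__":
    shared = {"input": "a.seq", "gap_penalty": 3}
    # The same shared names drive both programs.
    print(translate_options("pairalign", shared))
    print(translate_options("dbscan", shared))
```

With such a table in place, the only thing that differs between
programs from the user's point of view is the set of options that
exist, not what they are called.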
4 Proposed Solutions
4.1 Software Solutions
4.1.1 A Global Interface Style
Barring a rewrite of most programs to give a common
interface style, a more practical short-term solution
is to assign priority for improvement to a subset of
programs that allow all 'basic' computer-aided molecular
biology methods to be carried out.
The initial set of programs for treatment will include all those
necessary to allow fundam