
Connectionist Learning - Some New Ideas

Daniel Crespin (UCV) dcrespin at gauss.ciens.ucv.ve
Thu May 23 13:05:35 EST 1996


Dear Dr. Asim Roy:

Your message on the subject "Connectionist Learning - Some New
Ideas" is indeed interesting. The papers abstracted below seem
relevant to the Ideas/Questions you state. In fact, the problems
of A) Network design, B) Robustness in learning, C) Quickness in
learning, D) Efficiency in learning, and E) Generalization in
learning, are solved with the algorithms explained in [1] and
[2]. Detailed comments, divided into six parts, can be found
below, after the abstracts.

To obtain the preprints use a Web browser and the following URL:

http://euler.ciens.ucv.ve/Professors/dcrespin/Pub/

                             **ABSTRACTS**

[1]. Neural polyhedra: Explicit formulas to realize any
polyhedron as a three layer perceptron neural network. Useful to
calculate directly and without training the architecture and
weights of a network that executes a given pattern recognition
task. See preprint below. 8 pages.

[2]. Pattern recognition with untrained perceptrons: Shows how to
construct polyhedra directly from given pattern recognition
data. The perceptron network associated to these polyhedra (see
preprint above) solves the recognition problem. 10 pages.

[3]. Neural network formalism: Neural networks are defined using
only elementary concepts from set theory, without the usual
connectionistic graphs. The typical neural diagrams are derived
from these definitions. This approach provides mathematical
techniques and insight to develop theory and applications of
neural networks. 8 pages.

[4]. Geometry of perceptrons: It is proved that perceptron
networks are products of characteristic maps of polyhedra. This
gives insight into the geometric structure of these networks.
The result also holds for more general (algebraic, etc.)
perceptron networks, and suggests a new technique to solve
pattern recognition problems. See other preprints in this
location. 3 pages. 

[5]. Generalized Backpropagation: Global backpropagation formulas
for differentiable neural networks are considered from the
viewpoint of minimization of the quadratic error using the
gradient method. The gradient of (the quadratic error function
of) a processing unit is expressed in terms of the output error
and the transposed derivative of the unit with respect to the
weight. The gradient of the layer is the product of the
gradients of the processing units. The gradient of the network
equals the product of the gradients of the layers.
Backpropagation provides the desired outputs or targets for the
layers. Standard formulas for semilinear networks are deduced as
a special case.
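
As a point of reference, the standard semilinear special case
mentioned at the end of [5] can be written out explicitly. The
following is the textbook formulation, reconstructed here for
convenience rather than quoted from the preprint:

    For a unit x |--> f(w, x) with target t and quadratic error
    E(w) = (1/2) ||f(w, x) - t||^2, the chain rule gives

        \nabla_w E = (D_w f(w, x))^T (f(w, x) - t),

    the transposed derivative of the unit applied to the output
    error. For a semilinear unit f(w, x) = \sigma(w . x) this
    reduces to

        \nabla_w E = (\sigma(w . x) - t) \sigma'(w . x) x,

    and composing layers multiplies the corresponding
    derivatives, so backpropagating the error through the
    transposed derivatives supplies each layer with its target.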

**COMMENTS TO THE MESSAGE OF DR. ASIM ROY**

Dr. Roy says:

We have recently published a set of principles for learning in
neural networks/connectionist models that is different from
classical connectionist learning (Neural Networks, Vol. 8, No.
2; IEEE Transactions on Neural Networks, to appear; see
references below). Below is a brief summary of the new learning
theory and why we think classical connectionist learning, which
is characterized by pre-defined nets, local learning laws and
memoryless learning (no storing of training examples for
learning), is not brain-like at all. Since vigorous and open
debate is very healthy for a scientific field, we invite
comments for and against our ideas from all sides.

"A New Theory for Learning in Connectionist Models"

We believe that a good rigorous theory for artificial neural
networks/connectionist models should include learning methods
that perform the following tasks or adhere to the following
criteria:

A. Perform Network Design Task: A neural
network/connectionist learning method must be able to design an
appropriate network for a given problem, since, in general, it
is a task performed by the brain. A pre-designed net should not
be provided to the method as part of its external input, since
it never is an external input to the brain. From a
neuroengineering and neuroscience point of view, this is an
essential property for any "stand-alone" learning system - a
system that is expected to learn "on its own" without any
external design assistance.

### Comment (1 of 6) to A: One of the basic purposes of neural
networks is to perform pattern recognition, and this can be
reduced in most (perhaps all) cases to the following: Given a
finite set A of examples and a finite set B of counterexamples,
with A and B non-empty disjoint subsets of n-dimensional space
R^n, define in an explicit way the characteristic function f of
a region R of R^n such that A is contained in the interior of R
and B is disjoint from the closure of R. One says that f
discerns or recognizes A and B. The differentiation of A and B
is accomplished by f because f(a)=1 for all a in A and
f(b)=0 for all b in B. A special case occurs when all elements
of A and B are binary vectors. Recall from [2] that if new
examples are added, say points in a set A', they are expected to
lie in R. Similarly, additional counterexamples B' are expected
to lie in the exterior of R. If this is the case the region R is
'good'. Otherwise one has 'overfitting', 'underfitting' or
both, that is, 'unfitting'.
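
The recognition problem just stated is easy to make concrete.
The following minimal Python sketch (with invented toy data, not
taken from the preprints) takes R to be an axis-aligned box, a
particularly simple polyhedron, and f its characteristic
function:

    import numpy as np

    # Examples A and counterexamples B: disjoint finite subsets
    # of R^2 (toy data, chosen only for illustration).
    A = np.array([[0.2, 0.3], [0.4, 0.1], [0.3, 0.5]])
    B = np.array([[1.5, 1.2], [2.0, 0.8]])

    # A simple region R: an axis-aligned box enclosing A with a
    # margin, so that A lies in the interior of R (room is left
    # for possible future points A').
    margin = 0.1
    lo, hi = A.min(axis=0) - margin, A.max(axis=0) + margin

    def f(x):
        """Characteristic function of R: 1 inside, 0 outside."""
        return int(np.all(x >= lo) and np.all(x <= hi))

    # f discerns A and B: f(a) = 1 for all a in A and f(b) = 0
    # for all b in B.
    assert all(f(a) == 1 for a in A)
    assert all(f(b) == 0 for b in B)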

The standpoint taken here is that pattern recognition consists in
differentiating or discerning two classes of input objects, the
elements of A and B. More complex tasks can in many cases be
reduced to this.

Note that the function f has the following reductionistic
effect: The output is 1 for all points of A and 0 for all points
of B. Therefore from the viewpoint of the output all the
elements of A are considered identical to each other and
similarly for elements of B.

On the other hand, A contained in the interior of R means that
some additional room is left in R, hopefully for the extra points of A'.
And since B is in the exterior of R, room is in principle
available, hopefully for points of B'.

It should be clear that, both in theory and in practical
situations, under- and over-fitting are extremely difficult to
rule out in advance. If a neural network is interpreted as some
form of 'knowledge' then this means that NN knowledge is
imperfect, being always subject to refinement and/or to
refutation. Metaphorical continuation of this leads to: perfect
knowledge is transcendental. It is beyond neural networks, and
probably beyond human brains (minds?), to function in infallible
ways, even on single issues.

The usual paradoxes then appear. The statement "Knowledge can
always be refuted" is knowledge in a human brain. If brains are
considered as some sort of perceptron neural network then the
knowledge they carry is refutable. But in the present case the
refutation implies that there exists knowledge that cannot be
refuted. Thus, the classical epistemological problems reappear
in the context of neural networks.

It is proved in preprint [4] above that if the threshold
functions are discontinuous (Heaviside functions) then
perceptron neural networks with n real valued inputs
(equivalently, with a single input equal to a vector in R^n)
are products of characteristic maps of polyhedra contained in
R^n. Therefore, recognition of patterns with the aid of
perceptron networks is included, at least in principle, in the
general case of specifying a region R, which for perceptrons is
a polyhedron.
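
To illustrate the statement from [4]: a single Heaviside unit is
precisely the characteristic function of a (non-homogeneous)
linear half-space. A minimal sketch, with made-up weights:

    import numpy as np

    def heaviside(z):
        # Discontinuous threshold: 1 if z >= 0, else 0.
        return float(z >= 0)

    def perceptron_unit(w, b):
        """Characteristic map of the half-space {x : w.x + b >= 0}."""
        return lambda x: heaviside(np.dot(w, x) + b)

    # Example: the half-space x1 + x2 >= 1 in R^2.
    u = perceptron_unit(np.array([1.0, 1.0]), -1.0)
    print(u(np.array([1.0, 1.0])))  # 1.0: inside the half-space
    print(u(np.array([0.0, 0.0])))  # 0.0: outside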

The region R that discerns A and B has two important properties:
1) it depends on A and B; 2) it is non-unique. In particular,
for perceptrons the polyhedron R depends on the data. It is
shown in [2] how to construct, given A and B, a suitable
polyhedron R. This polyhedron can then be realized as a
perceptron neural network with at most three layers, as shown in
[1]. Actual algorithms to construct the polyhedron and the
network are given in [1] and [2], and these algorithms in fact
carry out the 'design' of a classical perceptron neural network
specifically suited to the data of the recognition problem.
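
The explicit formulas of [1] are not reproduced here, but the
well-known construction they refine can be sketched: first-layer
Heaviside units test half-spaces, a second-layer unit ANDs a
group of them into a convex cell, and a third-layer unit ORs the
cells into the polyhedron. The AND/OR threshold weights below
are the standard textbook choices, not the preprint's formulas:

    import numpy as np

    def step(z):
        return float(z >= 0)

    def three_layer_net(halfspaces, cells):
        """halfspaces: list of (w, b) pairs; cells: list of index
        tuples, each naming the half-spaces whose intersection is
        one convex cell. Returns the characteristic function of
        the union of the cells."""
        def f(x):
            h = [step(np.dot(w, x) + b) for (w, b) in halfspaces]
            # AND of k indicators: fires iff all k equal 1.
            c = [step(sum(h[i] for i in idx) - len(idx))
                 for idx in cells]
            # OR of the cells: fires iff at least one cell holds x.
            return step(sum(c) - 1)
        return f

    # The unit square as intersection of four half-spaces.
    H = [(np.array([1.0, 0.0]), 0.0), (np.array([-1.0, 0.0]), 1.0),
         (np.array([0.0, 1.0]), 0.0), (np.array([0.0, -1.0]), 1.0)]
    f = three_layer_net(H, cells=[(0, 1, 2, 3)])
    print(f(np.array([0.5, 0.5])), f(np.array([2.0, 0.5])))  # 1.0 0.0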

The non-uniqueness of the polyhedron means that there are in
principle many different perceptron architectures and choices of
weights that recognize A and B. One consequence is that several
networks that recognize A and B can perform in disparate ways on
the additional data A' and B'. This can be called "plurality of
agreement": the networks agree on the 'basic issue' of
recognizing A and B but differ on the more difficult extension
of the original problem, namely, discerning the larger data sets
obtained when A' is added to A and B' is added to B. If
modelling brains with NNs is a valid procedure then plurality of
agreement could perhaps be related to the variable results of
education and to the diversity of behaviour within an otherwise
uniform group of individuals.
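
This 'plurality of agreement' is easy to exhibit with toy
ingredients (all data invented): two regions that both recognize
A and B, yet disagree on a fresh point.

    import numpy as np

    A = np.array([[0.2, 0.3], [0.4, 0.1]])   # examples
    B = np.array([[2.0, 2.0]])                # counterexamples
    x_new = np.array([1.0, 0.2])              # a later arrival

    # Two different regions, both discerning A and B:
    in_small_box = lambda x: float(np.all(np.abs(x - 0.3) <= 0.3))
    in_half_plane = lambda x: float(x[0] + x[1] <= 1.5)

    for f in (in_small_box, in_half_plane):
        assert all(f(a) == 1.0 for a in A)
        assert all(f(b) == 0.0 for b in B)

    # Agreement on the 'basic issue', disagreement on x_new:
    print(in_small_box(x_new), in_half_plane(x_new))  # 0.0 1.0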

*** END OF COMMENT 1 OF 6 ***

Dr. Roy continues:

B. Robustness in Learning: The method must be robust so as not
to have the local minima problem, the problems of oscillation
and catastrophic forgetting, the problem of recall or lost
memories and similar learning difficulties. Some people might
argue that ordinary brains, and particularly those with learning
disabilities, do exhibit such problems and that these learning
requirements are the attributes only of a "super" brain. The
goal of neuroengineers and neuroscientists is to design and
build learning systems that are robust, reliable and powerful.
They have no interest in creating weak and problematic learning
devices that need constant attention and intervention.

### Comment (2 of 6) to B: The methods of [1] and [2] are robust
in the sense indicated: they have nothing to do with minimizing
error functions, backpropagation, local minima, flat regions,
etc. If the data sets A and B are given then the algorithms
produce the architecture and weights of a neural network that
defines a polyhedron R with the required properties. No
'learning difficulties' appear. However, 'performance
difficulties' due to unpredictable fitting can always show up.

Regarding the performance expected from artificial neural
networks, statement B represents a reasonable goal - unless, of
course, a science fiction scenario appears, with computers
developing their own interests and dominating or wiping out
humans, for the moment an unlikely situation. Neural networks do
not currently even seem powerful enough to merit special
attention within the general political issues surrounding
Computocracy.

The observation and study of bird wings and bird flight was an
important historical step in the development of flying machines.
Natural bird wings have feathers; airplane wings do not.
Artificial flight outperforms flying animals at least in some
respects, such as speed and carrying capacity. Feathers on
airplane wings would probably be a nuisance. Similar comments
can be made about land transportation with regard to the legs of
horses and the wheels of cars. The point is that imitating
Mother Nature is no doubt useful but does not have to be carried
to extremes. Note that once airplanes became available, the
purpose arose of using them to drop explosives, chemicals and
biological warfare agents. The political issues are hard to
avoid.

**********

Because of possible misinterpretations and ethical concerns, the
following passage of Dr. Roy's requires a separate comment.

"Some people might argue that ordinary brains, and particularly 
those with learning disabilities, do exhibit such problems and
that these learning requirements are the attributes only of a
"super" brain. The goal of neuroengineers and neuroscientists is
to design and build learning systems that are robust, reliable
and powerful. They have no interest in creating weak and
problematic learning devices that need constant attention and
intervention"

After a reference to 'ordinary brains' and to 'those with
learning disabilities', the term 'super brain' appears, and the
goal is set for neuroengineers and neuroscientists to design
things 'robust, reliable and powerful' and not 'weak and
problematic'. This awakens too many reminiscences of issues
like eugenics, superior races, Nazism, ethnic cleansing and
the like, not to mention possible offence to persons concerned
about people with disabilities. I assume Dr. Roy was unaware of
these implications, but I nevertheless request from him a
clarification of this particular point.

Let me add that much has been learned about language, vision and
brain mechanisms in general, by studying disabled or handicapped
persons. Neuroscientists have to be grateful to this group of
fellow human beings and should reciprocate.

Dinosaurs were in many ways much more powerful than their weak
contemporaries, the problematic and unreliable mammals. However,
evolution has not been kind to dinosaurs. Nobody knows about
forthcoming surprises.

Engineering is often defined as the use of scientific knowledge
to satisfy social and individual human needs. A remarkably
successful evolutionary strategy adopted by many species is
cooperative behaviour. For us human beings, this implies not
only concern about the well-being of other humans and of society
in general but also solidarity with the less gifted and the
weak. Instead of just abandoning or destroying persons with
neurological damage (or with any illness), considerable medical
effort is spent on them. Engineers are involved in the creation
of the most diverse prostheses. The powerful brains of
neuroscientists and neuroengineers are not alien to the needs of
society. They should and will, with extreme dedication and in a
most constructive way, concern themselves with brains considered
less powerful.

For all these reasons I think that care is needed when referring
to less fortunate persons, particularly in a context that, even
if not intended, could be perceived as disrespect or disregard
for them, or as eulogistic to ideologies that have already
produced considerable human suffering.

These critical remarks and comments on a non-technical question
of wording are made respectfully, with a sense of duty, and in
the expectation of settling the matter without raising major
issues.

********************

*** END OF COMMENT 2 OF 6 ***

Dr. Roy continues:

C. 	Quickness in Learning: The method must be quick in its
learning and learn rapidly from only a few examples, much as
humans do. For example, one which learns from only 10 examples
learns faster than one which requires 100 or 1,000 examples.
We have shown that on-line learning (see references below), 
when not allowed to store training examples in memory, can be
extremely slow in learning - that is, would require many more
examples to learn a given task compared to methods that use
memory to remember training examples. It is not desirable that
a neural network/connectionist learning system be similar in
characteristics to learners characterized by such sayings as
"Told him a million times and he still doesn't understand."
On-line learning systems must learn rapidly from only a few
examples.

### Comment (3 of 6) to C: The methods of [1] and [2] are
certainly very fast. And they can in principle learn from a pair
of sets with single elements: A={a} and B={b}. Also, since in
general the polyhedron R depends on A and B, the network retains
a certain 'memory' of both A and B. Furthermore, the network
built around a set of 10 examples and 10 counterexamples (a
10-10 NN) will be, generally speaking, different from a
1000-1000 NN. What is learned (i.e. the resulting NN) depends on
the data set, and because of under- and over-fitting the
comparison of the resulting networks is not obvious. There does
not seem to be an obvious criterion to tell when the extra
examples are wasted.
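
In the extreme case A={a} and B={b} a direct construction is
immediate: the perpendicular bisector of the segment from b to a
yields a one-unit perceptron separating the two points. This is
a generic sketch, not the specific algorithm of [1] and [2]:

    import numpy as np

    def bisector_perceptron(a, b):
        """One Heaviside unit whose half-space contains a and
        excludes b: the perpendicular bisector of segment ab."""
        w = a - b                      # normal pointing toward a
        c = -np.dot(w, (a + b) / 2.0)  # plane through the midpoint
        return lambda x: float(np.dot(w, x) + c >= 0)

    a, b = np.array([0.0, 1.0]), np.array([2.0, 0.0])  # invented pair
    f = bisector_perceptron(a, b)
    print(f(a), f(b))  # 1.0 0.0 -- learned from a single pair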

*** END OF COMMENT 3 OF 6 ***

Dr. Roy continues:

D. 	Efficiency in Learning: The method must be computationally
efficient in its learning when provided with a finite number of
training examples (Minsky and Papert [1988]). It must be able to
both design and train an appropriate net in polynomial time.
That is, given P examples, the learning time (i.e. both design
and training time) should be a polynomial function of P. This,
again, is a critical computational property from a
neuroengineering and neuroscience point of view. This property
has its origins in the belief that biological systems (insects,
birds for example) could not be solving NP-hard problems,
especially when efficient, polynomial time learning methods can
conceivably be designed and developed.

### Comment (4 of 6) to D: The algorithms of [1] and [2] are
extremely fast. If |A|=r (A has r elements) and |B|=s (B has s
elements) then the number of operations is polynomial in r and
s. From the viewpoint of the papers abstracted above, the whole
subject of the complexity of the NNs and the algorithms is
rather extensive and requires a separate paper.

*** END OF COMMENT 4 OF 6 ***

Dr. Roy continues:

E. 	Generalization in Learning: The method must be able to
generalize reasonably well so that only a small amount of
network resources is used. That is, it must try to design the
smallest possible net, although it might not be able to do so
every time. This must be an explicit part of the algorithm. This
property is based on the notion that the brain could not be
wasteful of its limited resources, so it must be trying to
design the smallest possible net for every task.   

### Comment (5 of 6) to E: The algorithms can design a rather
small network. However, to find, for given data sets A and B,
the absolutely smallest possible net within the class of
networks the algorithm designs looks like a hard problem. An
exhaustive search seems necessary, and this involves permuting
the order in which the rs linear forms are processed, requiring
time exponential in rs. But for a particular given order the
algorithm efficiently gives the best solution. Other, not yet
explored, approaches to the best network can be attempted, but
this is the subject of further substantial research.

*** END OF COMMENT 5 OF 6 ***

Dr. Roy continues:

General Comments

This theory defines algorithmic characteristics that are
obviously much more brain-like than those of classical
connectionist theory, which is characterized by pre-defined
nets, local learning laws and memoryless learning (no storing of
actual training examples for learning). Judging by the above
characteristics, classical connectionist learning is not very
powerful or robust. First of all, it does not even address the
issue of network design, a task that should be central to any
neural network/connectionist learning theory. It is also plagued
by efficiency problems (lack of polynomial time complexity, need
for an excessive number of teaching examples) and robustness
problems (local minima, oscillation, catastrophic forgetting,
lost memories), problems that are partly acquired from its
attempt to learn without using memory. Classical connectionist
learning, therefore, is not very brain-like at all.

As far as I know, there is no biological evidence for any of the
premises of classical connectionist learning. Without having to
reach into biology, simple common sense arguments can show that
the ideas of local learning, memoryless learning and predefined
nets are impractical even for the brain! For example, the idea
of local learning requires a predefined network. Classical
connectionist learning forgot to ask a very fundamental
question: who designs the net for the brain? The answer is very
simple: who else but the brain itself! So, who should construct
the net for a neural net algorithm? The answer again is very
simple: who else but the algorithm itself! (By the way, this is
not a criticism of constructive algorithms that do design nets.)
Under classical connectionist learning, a net has to be
constructed (by someone, somehow - but not by the algorithm!)
prior to having seen a single training example! I cannot imagine
any system, biological or otherwise, being able to construct a
net with zero information about the problem to be solved and
with no knowledge of the complexity of the problem. (Again, this
is not a criticism of constructive algorithms.)

A good test for a so-called "brain-like" algorithm is to imagine
it actually being part of a human brain. Then examine the
learning phenomenon of the algorithm and compare it with that of
the human's. For example, pose the following question: if an
algorithm like back propagation is "planted" in the brain, how
will it behave? Will it be similar to human behavior in every
way? Look at the following simple "model/algorithm" phenomenon
when the backpropagation algorithm is "fitted" to a human brain.
You give it a few learning examples for a simple problem and
after a while this "back prop fitted" brain says: "I am stuck in
a local minimum. I need to relearn this problem. Start over
again." And you ask: "Which examples should I go over again?"
And this "back prop fitted" brain replies: "You need to go over
all of them. I don't remember anything you told me." So you go
over the teaching examples again. And let's say it gets stuck in
a local minimum again and, as usual, does not remember any of
the past examples. So you provide the teaching examples again
and this process is repeated a few times until it learns
properly. The obvious questions are as follows: Is "not
remembering" any of the learning examples a brain-like
phenomenon? Are the interactions with this so-called
"brain-like" algorithm similar to what one would actually
encounter with a human in a similar situation? If the
interactions are not similar, then the algorithm is not
brain-like. A so-called brain-like algorithm's interactions with
the external world/teacher cannot be different from those of the
human.

In the context of this example, it should be noted that
storing/remembering relevant facts and examples is very much a
natural part of the human learning process. Without the ability
to store and recall facts/information and discuss, compare and
argue about them, our ability to learn would be in serious
jeopardy. Information storage facilitates mental comparison of
facts and information and is an integral part of rapid and
efficient learning. It is not biologically justified when
"brain-like" algorithms disallow usage of memory to store
relevant information.

Another typical phenomenon of classical connectionist learning
is the "external tweaking" of algorithms. How many times do we
"externally tweak" the brain (e.g. adjust the net, try a
different parameter setting) for it to learn? Interactions with
a brain-like algorithm have to be brain-like indeed in all
respects.

The learning scheme postulated above does not specify how
learning is to take place - that is, whether memory is to be
used or not to store training examples for learning, or whether
learning is to be through local learning at each node in the net
or through some global mechanism. It merely defines broad
computational characteristics and tasks (i.e. fundamental
learning principles) that are brain-like and that all neural
network/connectionist algorithms should follow. But there is
complete freedom otherwise in designing the algorithms
themselves. We have shown that robust, reliable learning
algorithms can indeed be developed that satisfy these learning
principles (see references below). Many constructive algorithms
satisfy many of the learning principles defined above. They can,
perhaps, be modified to satisfy all of the learning principles.

The learning theory above defines computational and learning
characteristics that have always been desired by the neural
network/connectionist field. It is difficult to argue that these
characteristics are not "desirable," especially for
self-learning, self-contained systems. For neuroscientists and
neuroengineers, it should open the door to development of the
brain-like systems they have always wanted - those that can
learn on their own without any external intervention or
assistance, much like the brain. It essentially tries to
redefine the nature of algorithms considered to be brain-like.
And it defines the foundations for developing truly
self-learning systems - ones that would not require constant
intervention and tweaking by external agents (human experts) in
order to learn.

It is perhaps time to reexamine the foundations of the neural
network/connectionist field. This mailing list/newsletter
provides an excellent opportunity for participation by all
concerned throughout the world. I am looking forward to a lively
debate on these matters. That is how a scientific field makes
real progress.

(References removed)

### Comment (6 of 6) to General Comments: It is not clear that
"the brain itself designs the network for the brain". Learning
behaviour results from the complex interaction of evolutionary,
nutritional, genetic, familial, social, economic and other
factors, to say the least.

An additional desirable property for any NN program, already
implied by the comments but not explicitly mentioned by Dr. Roy,
is the question of proper fitting. This seems insoluble or not
well posed. See [2] above. But with regard to this point it is
possible to establish some comparisons between classical
perceptrons and the NNs built with radial basis functions.
Perceptron processing units are characteristic functions of
(non-homogeneous) linear half-spaces. They split R^n into two
symmetric halves, each with infinite volume; the complement of a
half-space is another half-space. Since a given half-space is
isometric to its complement, as much is put inside as is left
outside. On the other hand, radial basis functions are
characteristic maps of hyperballs. The inside of a hyperball has
finite volume while the outside has infinite volume (in R^n).
These arguments are rather heuristic, but they seem to indicate
that underfitting problems, overfitting problems, or both, will
appear more often in radial basis NNs than in comparable
classical perceptron NNs. Given the nature of the issue,
performance tests are necessary to decide the relative merits of
classical perceptrons vs. radial basis functions.
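
The contrast between the two kinds of processing unit can be put
side by side in code (a schematic comparison; the weights,
center and radius are invented):

    import numpy as np

    # Perceptron unit: characteristic function of a half-space.
    # Both sides have infinite volume; the complement is again a
    # half-space.
    def halfspace(x, w=np.array([1.0, 0.0]), b=0.0):
        return float(np.dot(w, x) + b >= 0)

    # Thresholded radial basis unit: characteristic function of
    # a hyperball. Finite volume inside, infinite outside.
    def ball(x, c=np.zeros(2), r=1.0):
        return float(np.linalg.norm(x - c) <= r)

    for x in (np.array([0.5, 0.0]), np.array([10.0, 0.0])):
        print(x, halfspace(x), ball(x))
    # [0.5 0. ]  1.0 1.0   both regions contain the nearby point
    # [10. 0. ]  1.0 0.0   the half-space keeps room far away;
    #                      the ball does not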

Motivation for the new NN program proposed by Dr. Roy seems to
be, at least partially, the biology of the brain. But opposite
arguments can also be based on biology. Just to show a few more
facets of the problem, the following can be said. According to
current scientific belief, biological neural systems have been
subject to hundreds of millions of years (this could be
polynomial time, but not in a convincing way) of natural
selection, a most basic evolutionary mechanism. But if current
evolutionary models are correct, this mechanism is some sort of
local optimization rule. The assumption that Mother Nature has
some kind of global mechanism or knowledge to foresee the
long-term result of its selective activity is not considered
orthodox, and could even be heretical, for certain purists at
least. If it is true that the brain uses a global strategy then
a paradox appears: the locally acting mechanism of evolution has
produced a globally optimizing brain.

The general techniques to address issues A), B), C), D) and E)
are already available in [1]-[4]. They are a natural consequence
of classical connectionism and neural network theory. What is
perhaps necessary is to translate the mathematical formulas in
which the techniques are expressed into a language more familiar
to cognitive scientists and computer scientists.

Finally, and as a certain deep thinker said long ago, we are
just children playing with a few pebbles on the seashore. Having
such limited knowledge, how can we be sure our accomplishments
or goals are global optima? In more technical parlance, knowing
only a finite and bounded part of the infinite domain, and only
finitely many values of the function, we cannot say that what we
look at is a global optimum. With regard to geological time, to
nature, to evolution or to the cosmos, our human optima are
always local and relative.

I hope these remarks will be of interest to all concerned
persons, and particularly to the originator of this discussion,
Dr. Asim Roy.

Sincerely,

Daniel Crespin

PS: Will be away. Back beginning June.


