COMPUTATIONALLY EFFICIENT INVARIANT
COMPUTATIONALLY EFFICIENT INVARIANT PATTERN
RECOGNITION WITH HIGHER ORDER PI-SIGMA NETWORKS1
Yoan Shin and Joydeep Ghosh
Department of Electrical and Computer Engineering
The University of Texas at Austin
Austin, TX 78712
ABSTRACT
A class of higher-order networks called Pi-Sigma networks has recently been introduced
for function approximation and classification [4]. These networks combine the fast training
abilities of single-layered feedforward networks with the nonlinear mapping power of higher-order
networks, while using far fewer units. In this paper, we investigate the applicability of these
networks to shift, scale, and rotation invariant pattern recognition. Results obtained using a
database of English and Persian characters compare favorably with other neural network based
approaches [2, 3].
1. Introduction
Feedforward networks based on a single layer of linear threshold logic units (TLUs) can
exhibit fast learning, but have limited capabilities. For instance, the ADALINE and the
simple perceptron can only realize linearly separable dichotomies [1]. The addition of a layer
of hidden units dramatically increases the power of layered feedforward networks. Indeed,
networks with a single hidden layer, such as the multilayer perceptron (MLP), using
arbitrary squashing functions, are capable of approximating any Borel measurable function
from one finite dimensional space to another to any desired degree of accuracy, provided
sufficiently many hidden units are available. However, training speeds for MLPs are
typically much slower than those for feedforward networks comprising a single layer of
TLUs, since multilayering necessitates backpropagation of errors.
In an orthogonal direction, higher-order correlations among input components can be
used to construct a higher-order network to perform nonlinear mappings using only a single
layer of units [2]. The basic building block of such networks is the higher-order processing
unit (HPU), a neural-like element whose output y is given by:
y = \sigma\Big( w_0 + \sum_j w_j x_j + \sum_{j,k} w_{jk} x_j x_k + \sum_{j,k,l} w_{jkl} x_j x_k x_l + \cdots \Big), \qquad (1)
where \sigma(\cdot) is a suitable nonlinear activation function such as the hyperbolic tangent, x_j is the
j-th component of the input vector x, w_{jkl} is an adjustable weight from the product of input
components x_j, x_k, x_l to the output unit, and w_0 is the threshold. Higher-order correlations
enable HPUs to learn geometrically invariant properties more easily [2].
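To make eq. (1) concrete, the following is a minimal sketch (not from the original paper) of a third-order HPU output in Python/NumPy; the weight arrays w1, w2, w3 and the choice of numpy.tanh as \sigma are illustrative assumptions.

```python
import numpy as np

def hpu_output(x, w0, w1, w2, w3):
    """Third-order HPU output per eq. (1): sigma(w0 + sum_j w1_j x_j
    + sum_{j,k} w2_{jk} x_j x_k + sum_{j,k,l} w3_{jkl} x_j x_k x_l)."""
    net = (w0
           + np.dot(w1, x)                             # first-order terms
           + np.einsum('jk,j,k->', w2, x, x)           # second-order terms
           + np.einsum('jkl,j,k,l->', w3, x, x, x))    # third-order terms
    return np.tanh(net)                                # sigma = hyperbolic tangent

# Illustrative sizes: with N inputs, a full third-order HPU already needs
# on the order of N + N^2 + N^3 weights, which motivates the discussion below.
N = 8
x = np.random.randn(N)
y = hpu_output(x, 0.0, np.random.randn(N), np.random.randn(N, N),
               np.random.randn(N, N, N))
```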
This research was supported by DARPA/ONR contract N00014-89-C-0298, with Dr. Barbara Yoon
(DARPA) and Dr. Thomas McKenna (ONR) as government cognizants.
Unfortunately, the number of weights required to accommodate all higher-order correlations
increases exponentially with the input dimension, N. Consequently, typically only second-order
networks are considered in practice. A notable exception is when some a priori information is
available about the function to be realized. Such information has also been used with some
success to remove "irrelevant" terms [2]. Such a restriction on the order of the network leads
to a reduction in the mapping capability, thereby limiting the use of this kind of higher-order
network. The Pi-sigma network introduced in the next section attempts to combine the best of
single-layered networks (quick learning) and multi-layered networks (greater mapping capability
with a small weight set).
2. Pi-Sigma Networks
Figure 1 shows a Pi-sigma network (PSN) with a single output. This network is a fully
connected two-layered feedforward network. However, the summing layer is not "hidden"
as in the case of the multilayered perceptron (MLP), since the weights from this layer to the
output are fixed at 1. This property drastically reduces training time.
Let x = (1, x_1, \ldots, x_N)^T be an (N+1)-dimensional augmented input column vector, where
x_k denotes the k-th component of x. The inputs are weighted by K weight vectors w_j =
(w_{0j}, w_{1j}, \ldots, w_{Nj})^T, j = 1, 2, \ldots, K, and summed by a layer of K linear "summing"
units, where K is the desired order of the network. The output of the j-th summing unit, h_j, is
given by:

h_j = w_j^T x = \sum_{k=1}^{N} w_{kj} x_k + w_{0j}, \qquad j = 1, 2, \ldots, K. \qquad (2)

The output y is given by:

y = \sigma\Big( \prod_{j=1}^{K} h_j \Big), \qquad (3)
where \sigma(\cdot) is a suitable nonlinear activation function. In the above, w_{kj} is an adjustable
weight from input x_k to the j-th summing unit, and w_{0j} is the threshold of the j-th summing
unit. The weights can take arbitrary real values. If a specific input, say x^p, is considered, then
the h_j's, y, and net are also superscripted by p. In this paper, we consider \sigma(x) = 1/(1 + e^{-x}),
which corresponds to the Analog Pi-Sigma model [4].
The network shown in Figure 1 is called a K-th order PSN since K summing units are
incorporated. The total number of adjustable weight connections for a K-th order PSN with
N-dimensional inputs is (N+1)K. If multiple outputs are required, an independent summing
layer is needed for each output. Thus, for an M-dimensional output vector y, a total of
\sum_{i=1}^{M} (N+1) K_i adjustable weight connections are needed, where K_i is the number of
summing units for the i-th output. This allows great flexibility, since all outputs do not have
to retain the same complexity. Note that using product units in the output layer indirectly
incorporates the capabilities of higher-order networks with a smaller number of weights and
processing units. This also enables the network to be regular and incrementally
expandable, since the order can be increased by one by adding another summing unit and
associated weights, but without disturbing any connection established previously.
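As an illustration, here is a minimal NumPy sketch (not from the original paper) of a single-output K-th order PSN forward pass implementing eqs. (2)-(3); the array shapes and the logistic activation of the Analog Pi-Sigma model are the only assumptions.

```python
import numpy as np

def psn_forward(x, W):
    """Forward pass of a single-output K-th order PSN (eqs. (2)-(3)).
    x : input vector of length N
    W : weight matrix of shape (K, N+1); W[j, 0] is the threshold w_0j.
    Returns (y, h) where h holds the K summing-unit outputs."""
    x_aug = np.concatenate(([1.0], x))       # augmented input (1, x_1, ..., x_N)
    h = W @ x_aug                            # eq. (2): h_j = w_j^T x
    y = 1.0 / (1.0 + np.exp(-np.prod(h)))    # eq. (3): y = sigma(prod_j h_j)
    return y, h

# A 3rd-order PSN on N = 16 inputs uses only (N + 1) * K = 51 adjustable weights.
N, K = 16, 3
W = 0.1 * np.random.randn(K, N + 1)
y, h = psn_forward(np.random.randn(N), W)
```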
The learning rule is based on gradient descent on the estimated mean squared error
surface in weight space, yielding:
\Delta w_l = \eta \, (t^p - y^p) \, (y^p)' \Big( \prod_{j \neq l} h_j^p \Big) x^p, \qquad (4)

where (y^p)' is the first derivative of the sigmoidal function \sigma(\cdot), that is, (y^p)' =
\sigma'(\cdot) = (1 - \sigma(\cdot))\sigma(\cdot), x^p is the (augmented) p-th input pattern, and \eta is the
learning rate. At each update step, all K sets of weights are updated, but in an asynchronous
manner. That is, one set of weights w_j = (w_{0j}, w_{1j}, \ldots, w_{Nj})^T (corresponding to the
j-th summing unit) is chosen at a time and modified according to the weight update rule. Then,
for the same input pattern, the output is recomputed for the modified network, and the error is
used to update a different set of weights. For every input, this procedure is performed K times
so that all K sets of weights are updated once. It can be shown that this procedure is more
stable than the usual scheme in which all weights are updated simultaneously. A detailed
convergence analysis of the Pi-sigma learning rule is given in [4].
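The following NumPy sketch (again an illustration, not the authors' code) implements the asynchronous update of eq. (4) for one training pattern; the learning rate value and the logistic activation are assumptions consistent with the Analog Pi-Sigma model above.

```python
import numpy as np

def psn_update(x, t, W, eta=0.1):
    """One asynchronous training step of eq. (4) for the pattern (x, t):
    each of the K weight sets is updated in turn, with the output recomputed
    after every partial update."""
    x_aug = np.concatenate(([1.0], x))                   # augmented input
    K = W.shape[0]
    for l in range(K):                                   # pick one summing unit at a time
        h = W @ x_aug                                    # eq. (2), with current weights
        y = 1.0 / (1.0 + np.exp(-np.prod(h)))            # eq. (3)
        dy = (1.0 - y) * y                               # (y^p)' for the logistic sigma
        prod_except_l = np.prod(np.delete(h, l))         # prod_{j != l} h_j^p
        W[l] += eta * (t - y) * dy * prod_except_l * x_aug  # eq. (4)
    return W
```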
3. Invariant Pattern Recognition
Practical techniques for recognition of geometric patterns must incorporate some degree of tolerance to noise in the input, and to variations brought about by (small) translation/rotation/scaling of the patterns with respect to the prototypes. One common approach
is to preprocess the input to convert it into another format that is more robust to these
changes. This includes extraction of rotation invariant features derived from complex and
orthogonal Zernike moments of the image [3].
An alternative in the context of neural networks is to handcraft the weights of units such
that their response shows little sensitivity to the class of transforms for which invariance is
desired. The latter approach has been taken in [2], where a priori information is used to reduce
the complexity of HPU networks. Often, such a priori knowledge is not available, or the
preprocessing is too computationally expensive. The Pi-sigma network can thus be brought
to bear fruitfully, since it incorporates higher-order correlations and is yet computationally
efficient. To test this hypothesis, we have constructed a database of English and Persian
characters. For each character, there are binary templates for a "standard" exemplar, noisy
versions in which a fraction 0 ≤ x ≤ 0.4 of the bits are corrupted, and scaled/rotated
variants of these versions. Sample templates are depicted in Fig. 2(a), which shows noisy
versions of 'C', and Fig. 2(b), which shows a Persian character and some of its noisy versions.
Half of the templates are chosen as the training set, and the rest are used for testing the
classification and generalization properties of the network. A parallel series of experiments
uses extensive cross-validation (jack-knife resampling) to study the effect of training set size
on the quality of results.
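As a purely hypothetical illustration (the paper does not specify how its templates were generated), the sketch below corrupts a fraction of the bits of a binary template and applies a rotation and scaling using scipy.ndimage; all parameter values are illustrative.

```python
import numpy as np
from scipy import ndimage

def distort_template(template, flip_frac=0.2, angle_deg=15.0, scale=1.2, rng=None):
    """Return a noisy, rotated, and scaled variant of a binary character template.
    flip_frac, angle_deg, and scale are illustrative parameters."""
    rng = np.random.default_rng() if rng is None else rng
    noisy = template.astype(int)
    flip = rng.random(noisy.shape) < flip_frac             # corrupt a fraction of the bits
    noisy[flip] = 1 - noisy[flip]
    rotated = ndimage.rotate(noisy.astype(float), angle_deg, reshape=False, order=0)
    scaled = ndimage.zoom(rotated, scale, order=0)
    return (scaled > 0.5).astype(int)                       # re-binarize
```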
Each series consists of two sets of experiments. In the first, each (first-order) feature
vector is augmented by Zernike moments of up to order 5 (14 moments) and fed into 2nd-
and 3rd-order PSNs. In the second set, only the feature vectors are used as inputs, and the
order of the PSN is progressively increased by adding extra summing units. This also serves
to test for scaling and generalization.
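A minimal sketch of the first experimental configuration, assuming a helper zernike_moments(image, max_order) that returns the 14 moment magnitudes (the helper is hypothetical here; libraries such as mahotas provide similar functionality):

```python
import numpy as np

def augmented_feature_vector(template, zernike_moments):
    """Concatenate the raw (first-order) pixel features with Zernike moments
    of up to order 5, as in the first set of experiments."""
    pixels = template.astype(float).ravel()           # first-order feature vector
    moments = zernike_moments(template, max_order=5)  # 14 invariant moments (assumed helper)
    return np.concatenate([pixels, moments])

# The augmented vector is then fed into a 2nd- or 3rd-order PSN
# (cf. the forward-pass sketch in Section 2).
```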
Preliminary results for both function approximation and classification are extremely
encouraging, and show a speedup of about two orders of magnitude over backpropagation in
achieving a similar quality of solution. We are currently completing the experiments outlined
above, and are also making comparisons with HPU networks.
Concluding Remarks: In this paper, we investigate the nonlinear mapping capabilities
of PSNs, with emphasis on shift- and rotation-invariant pattern recognition. Due to their
ability to form higher-order correlations, we do not need to pre-compute all higher-order
moments and then feed them into the network, as was done in [3]. Rather, the network
provides a range of configurations with a trade-off between pre-computation and the order of
the network.
The structure of PSNs is highly regular in the sense that summing units can be added
incrementally until an appropriate order of the network is attained, without overfitting the
function. This is useful for the invariant pattern recognition problem, since the order can
be gradually increased until the desired level of noise tolerance and invariance capability is
reached. Our preliminary results on English and Persian alphabets support these observations.
References
[1] B. Widrow and M. Lehr, "30 Years of Adaptive Neural Networks: Perceptron, Madaline,
and Backpropagation," Proc. IEEE, Vol. 78, No. 9, pp. 1415-1442, Sep. 1990.
[2] C. L. Giles and T. Maxwell, "Learning, Invariance, and Generalization in a High-Order
Neural Network," Applied Optics, Vol. 26, No. 23, pp. 4972-4978, 1987.
[3] A. Khotanzad and J. H. Lu, "Classification of Invariant Image Representations Using a
Neural Network," IEEE Trans. on ASSP, Vol. 38, No. 6, pp. 1028-1039, June 1990.
[4] Y. Shin and J. Ghosh, "Efficient Higher-order Neural Networks for Function Approximation
and Classification," IEEE Trans. Neural Networks, in review.
[5] Y. Shin and J. Ghosh, "The Pi-sigma Network: An Efficient Higher-order Neural Network
for Pattern Classification and Function Approximation," Proceedings of the International
Joint Conference on Neural Networks, Vol. I, pp. 13-18, Seattle, July 1991.
[6] R. O. Duda and P. E. Hart, Pattern Classification and Scene Analysis, John Wiley &
Sons, 1973.
Figure 2: Examples of character templates and OCR environment.
(a) Noisy versions of "C".
(b) A Persian character and its noisy variants.