ICNN'97 FINAL ABSTRACTS
SU: SUPERVISED / UNSUPERVISED LEARNING
ICNN97 Supervised / Unsupervised Learning Session: SU1A Paper Number: 477 Oral
Global stability analysis of a nonlinear principal component analysis neural network
Anke Meyer-Base
Keywords: Global stability nonlinear principal component analysis neural Hebbian rule
Abstract:
The self-organization of a nonlinear, single-layer neural network is mathematically analyzed, in which a regular Hebbian rule and an anti-Hebbian rule are used for the adaptation of the connection weights between the constituent units. It is shown that the equilibrium points of this system are globally asymptotically stable. Under some restrictive assumptions, a nonlinear principal component analyzer can be constructed.
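The paper itself is analytical; as a rough sketch of the type of network being analyzed (an assumption based on standard Hebbian/anti-Hebbian PCA networks, not the authors' exact system), the following combines a nonlinear Hebbian feedforward update with an anti-Hebbian update of lateral weights that decorrelates the unit outputs.

    import numpy as np

    rng = np.random.default_rng(0)
    # Toy zero-mean data whose principal directions the network should find.
    X = rng.standard_normal((5000, 4)) * np.array([3.0, 2.0, 1.0, 0.5])

    eta = 1e-3
    W = 0.1 * rng.standard_normal((2, 4))   # feedforward weights, adapted by a Hebbian rule
    L = np.zeros((2, 2))                    # lateral weights, adapted by an anti-Hebbian rule

    for x in X:
        y = np.tanh(W @ x)                  # nonlinear unit outputs
        z = y + L @ y                       # lateral (inhibitory) interaction between units
        W += eta * (np.outer(z, x) - (z ** 2)[:, None] * W)   # Oja-style nonlinear Hebbian step
        L -= eta * np.outer(z, z)                             # anti-Hebbian decorrelation
        np.fill_diagonal(L, 0.0)                              # no self-inhibition

    print(np.round(W, 2))                   # learned feedforward directions (illustrative only)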
_____
ICNN97 Supervised / Unsupervised Learning Session: SU1B Paper Number: 491 Oral
Generalized independent component analysis through unsupervised learning with emergent Bussgang properties
Mark Girolami and Colin Fyfe
Keywords: unsupervised learning exploratory projection pursuit generalized independent component analysis latent probability density functions
Abstract:
We utilise an information-theoretic criterion for exploratory projection pursuit (EPP) and show that maximisation by natural gradient ascent of the divergence of a multivariate distribution from normality, using negentropy as a distance measure, yields a generalised independent component analysis (ICA). By considering a Gram-Charlier approximation of the latent probability density functions (PDF), we develop a generalised neuron nonlinearity which can be considered a conditional mean estimator of the underlying independent components. The unsupervised learning rule developed is shown to asymptotically exhibit the Bussgang property and as such produces output data with independent components, irrespective of whether the independent latent variables are sub-Gaussian or super-Gaussian. Improved convergence speeds are reported when momentum terms are introduced into the learning rule.
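A minimal sketch of a natural-gradient ICA step of the general kind described, assuming a fixed tanh score function in place of the paper's Gram-Charlier-derived adaptive nonlinearity (data, constants, and the score function are illustrative assumptions):

    import numpy as np

    rng = np.random.default_rng(1)
    # Two super-Gaussian (Laplacian) sources, linearly mixed.
    S = rng.laplace(size=(2, 10000))
    A = np.array([[1.0, 0.6], [0.4, 1.0]])
    X = A @ S

    W = np.eye(2)
    eta = 0.01
    for t in range(X.shape[1]):
        y = W @ X[:, t]
        g = np.tanh(y)                      # fixed score function; the paper instead derives
                                            # an adaptive one from a Gram-Charlier expansion
        # Natural-gradient ascent of the negentropy-based EPP/ICA criterion.
        W += eta * (np.eye(2) - np.outer(g, y)) @ W

    print(np.round(W @ A, 2))               # ideally close to a scaled permutation matrix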
_____
ICNN97 Supervised / Unsupervised Learning Session: SU1C Paper Number: 602 Oral
An incremental unsupervised learning scheme for function approximation
Christian-A. Bohn
Keywords: unsupervised learning function approximation supervised growing cell structures
Abstract:
A new algorithm for general robust function approximation by an artificial neural network is presented. The basis for this work is Fritzke's supervised growing cell structures approach which combines supervised and unsupervised learning. It is extended by the capability of resampling the function under examination automatically, and by the definition of a new error measure which enables an accurate approximation of arbitrary goal functions.
_____
ICNN97 Supervised / Unsupervised Learning Session: SU1D Paper Number: 191 Oral
Convergence study of principal component analysis algorithms
Chanchal Chatterjee, Vwani P. Roychowdhury and Edwin K. P. Chong
Keywords: Principal component analysis convergence stochastic approximation
Abstract:
We investigate the convergence properties of two different principal component analysis algorithms, and analytically explain some commonly observed experimental results. We use two different methodologies to analyze the two algorithms. The first methodology uses the fact that both algorithms are stochastic approximation procedures. We use the theory of stochastic approximation, in particular the results of Fabian, to analyze the asymptotic mean square errors (AMSEs) of the algorithms. This analysis reveals the conditions under which the algorithms produce smaller AMSEs, and also the conditions under which one algorithm has a smaller AMSE than the other. We next analyze the asymptotic mean errors (AMEs) of the two algorithms in the neighborhood of the solution. This analysis establishes the conditions under which the AMEs of the minor eigenvectors go to zero faster. Furthermore, the analysis makes explicit that increasing the gain parameter up to an upper bound improves the convergence of all eigenvectors. We also show that the AME of one algorithm goes to zero faster than the other. Experiments with multi-dimensional Gaussian data corroborate the analytical findings presented here.
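The abstract does not restate the two algorithms here; as a reminder of what a stochastic-approximation PCA update with a gain parameter looks like, the sketch below uses Oja's single-unit rule, which is only a representative of the class analyzed and not necessarily either of the paper's algorithms.

    import numpy as np

    rng = np.random.default_rng(2)
    C = np.array([[3.0, 1.0], [1.0, 2.0]])            # true covariance
    X = rng.multivariate_normal([0, 0], C, size=20000)

    w = rng.standard_normal(2)
    w /= np.linalg.norm(w)

    for n, x in enumerate(X, start=1):
        eta = 1.0 / (n + 50)                           # decreasing gain of a stochastic approximation
        y = w @ x
        w += eta * y * (x - y * w)                     # Oja's single-unit PCA rule

    print(np.round(w, 3))                              # approximately the leading eigenvector of C (up to sign)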
_____
ICNN97 Supervised / Unsupervised Learning Session: SU1E Paper Number: 448 Oral
On the importance of sorting in 'neural gas' training of vector quantizers
Fabio Ancona, Sandro Ridella, Stefano Rovetta and Rodolfo Zunino
Keywords: neural gas model vector quantization sorting
Abstract:
The paper considers the role of the sorting process in the well-known "Neural Gas" model for Vector Quantization. Theoretical derivations and experimental evidence show that complete sorting is not required for effective training, since limiting the sorted list to even a few top units performs effectively. This property has a significant impact on the implementation of the overall neural model at the local level.
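A minimal sketch of the partial-sorting idea, assuming a standard neural-gas update in which only the k nearest units are ranked and adapted (constants and data are illustrative, not taken from the paper):

    import numpy as np

    def neural_gas_step(codebook, x, eta=0.05, lam=2.0, k=3):
        """One neural-gas update in which only the k closest units are ranked.

        Ranking (and adapting) only the top k units is the limited-sorting idea the
        paper argues is sufficient in practice; everything else is standard neural gas.
        """
        d = np.linalg.norm(codebook - x, axis=1)
        nearest = np.argpartition(d, k)[:k]          # k closest units, without a full sort
        order = nearest[np.argsort(d[nearest])]      # sort only those k units
        for rank, j in enumerate(order):
            codebook[j] += eta * np.exp(-rank / lam) * (x - codebook[j])
        return codebook

    rng = np.random.default_rng(3)
    data = rng.standard_normal((5000, 2))
    codebook = rng.standard_normal((20, 2))
    for x in data:
        neural_gas_step(codebook, x)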
_____
ICNN97 Supervised / Unsupervised Learning Session: SU2A Paper Number: 34 Oral
A Novel Algorithm to Configure RBF Networks
Insoo Sohn and Nirwan Ansari
Keywords: RBF networks clustering scatter matrices
Abstract:
The most important factor in configuring an optimum radial basis function (RBF) network is the appropriate selection of the number of neural units in the hidden layer. Competitive learning (CL), frequency sensitive competitive learning (FSCL), and rival penalized competitive learning (RPCL) algorithms have been proposed to train the hidden units of the RBF network, but they suffer from "dead units" and from the problem of not knowing the number of clusters a priori. This paper proposes a novel algorithm called the scattering-based clustering (SBC) algorithm, in which the FSCL algorithm is first applied to let the neural units converge. Scatter matrices of the clustered data are then used to compute the sphericity for each k, where k is the number of clusters. The optimum number of neural units to be used in the hidden layer is then obtained. A comparative study is done between the SBC algorithm and RPCL algorithm, and the result shows that the SBC algorithm outperforms other algorithms such as CL, FSCL, and RPCL.
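A hedged illustration of the scatter-matrix step, with k-means standing in for the FSCL-trained units and a generic trace(Sw^-1 Sb) separability measure standing in for the paper's sphericity criterion (which is computed from the same matrices but is not reproduced here):

    import numpy as np
    from scipy.cluster.vq import kmeans2

    def scatter_criterion(X, labels, centers):
        """trace(Sw^-1 Sb): larger values indicate tighter, better-separated clusters."""
        mean = X.mean(axis=0)
        Sw = np.zeros((X.shape[1], X.shape[1]))   # within-cluster scatter
        Sb = np.zeros_like(Sw)                    # between-cluster scatter
        for j, c in enumerate(centers):
            Xj = X[labels == j]
            if len(Xj) == 0:
                continue
            Sw += (Xj - c).T @ (Xj - c)
            Sb += len(Xj) * np.outer(c - mean, c - mean)
        return np.trace(np.linalg.pinv(Sw) @ Sb)

    rng = np.random.default_rng(4)
    X = np.vstack([rng.normal(m, 0.3, size=(100, 2)) for m in ([0, 0], [3, 0], [0, 3])])
    for k in range(2, 7):                         # evaluate the criterion over candidate k
        centers, labels = kmeans2(X, k, seed=0, minit='++')
        print(k, round(scatter_criterion(X, labels, centers), 2))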
_____
ICNN97 Supervised / Unsupervised Learning Session: SU2B Paper Number: 456 Oral
Gradient descent learning of radial basis neural networks
Nicolaos B. Karayiannis
Keywords: gradient descent Radial Basis Function neural network supervised learning
Abstract:
_____
ICNN97 Supervised / Unsupervised Learning Session: SU2C Paper Number: 460 Oral
Maximum equalization by entropy maximization and mixture of cumulative distribution functions
Lei Xu, Chi Chiu Cheung, Howard Hua Yang and Shun-ichi Amari
Keywords: Maximum equalization entropy maximization mixture of cumulative distribution functions INFORMAX
Abstract:
New Title: Independent Component Analysis by the Information-theoretic Approach with Mixture of Densities
A novel implementation technique of the information-theoretic approach to the independent component analysis problem is devised. The new algorithm uses mixtures of densities as flexible models for the density functions of the source signals, which are tuned adaptively to approximate the marginal densities of the recovered signals. We suggest that adaptive, flexible models for the density functions have the advantage that they can adapt to source signals with any distribution, whereas a pre-selected, fixed model, which appears as a fixed nonlinearity in the algorithm, may only work on source signals from a particular class of distributions. Experiments have demonstrated the above assertions.
_____
ICNN97 Supervised / Unsupervised Learning Session: SU2D Paper Number: 468 Oral
Dynamics of structural learning with an adaptive forgetting rate
Damon A. Miller and Jacek M. Zurada
Keywords: structural learning FEEDFORWARD NEURAL NETWORK REGULARIZATION forgetting rate
Abstract:
Keywords: neural networks, structural learning, forgetting, complexity reduction, pruning, dynamical systems
Structural learning with forgetting is a prominent method of multilayer feedforward neural network complexity regularization. The level of regularization is controlled by a parameter known as the forgetting rate. The goal of this paper is to establish a dynamical system framework for the study of structural learning both to offer new insights into this methodology and to potentially provide a means of either developing new or analytically justifying existing forgetting rate adaptation strategies. The resulting nonlinear model of structural learning is analyzed by developing a general linearized equation for the case of a quadratic error function. This analysis demonstrates the effectiveness of an adaptive forgetting rate. A simple example is provided to illustrate our approach.
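For reference, the underlying update of structural learning with forgetting adds a constant decay ("forgetting") term to the gradient step. The sketch below shows that baseline rule on a toy quadratic error of the kind the paper linearizes; the adaptive forgetting-rate scheme developed in the paper is not reproduced, and all constants are assumptions.

    import numpy as np

    def structural_learning_step(w, grad, eta=0.05, forgetting_rate=0.03):
        """Gradient step plus a constant forgetting (L1-like decay) term.

        The term forgetting_rate * sign(w) pulls every weight toward zero; the paper
        studies how to adapt forgetting_rate during training.
        """
        return w - eta * (grad + forgetting_rate * np.sign(w))

    # Toy quadratic error E(w) = 0.5 * ||w - w_target||^2.
    w_target = np.array([1.0, 0.0, -0.5, 0.02])
    w = np.zeros(4)
    for _ in range(2000):
        grad = w - w_target
        w = structural_learning_step(w, grad)

    # Large weights shrink by roughly the forgetting rate; the 0.02 weight is driven close to zero.
    print(np.round(w, 3))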
_____
ICNN97 Supervised / Unsupervised Learning Session: SU2E Paper Number: 286 Oral
MDL Regularizer: A new regularizer based on the MDL principle
Kazumi Saito and Ryohei Nakano
Keywords: regularization Minimum Description Length maximum likelihood weight vector regression
Abstract:
This paper proposes a new regularization method based on the MDL (Minimum Description Length) principle. An adequate precision weight vector is trained by approximately truncating the maximum likelihood weight vector. The main advantage of the proposed regularizer over existing ones is that it automatically determines a regularization factor without assuming any specific prior distribution with respect to the weight values. Our experiments using a regression problem showed that the MDL regularizer significantly improves the generalization error of a second-order learning algorithm and shows a comparable generalization performance to the best tuned weight-decay regularizer.
_____
ICNN97 Supervised / Unsupervised Learning Session: SU2F Paper Number: 607 Oral
Additional learning and forgetting for pattern classification
Hirotaka Nakayama and Masahiko Yoshida
Keywords: pattern classification RBF back propagation
Abstract:
In analogy to the growth of human beings, machine learning should perform additional learning whenever new knowledge is obtained. This situation occurs very often in practical problems in which the environment changes over time, e.g., in financial investment problems. The well-known back propagation method in artificial neural networks is not effective for such additional learning. On the other hand, the potential method, which was suggested by the authors recently, can perform additional learning very easily.
In this paper, the effectiveness of the potential method is shown from the viewpoint of additional learning.
Furthermore, since the rule for classification becomes more and more complex with additional learning alone, some appropriate forgetting is also necessary. Examples in stock portfolio problems show that the performance of the potential method improves further with additional learning and forgetting.
_____
ICNN97 Supervised / Unsupervised Learning Session: SU3A Paper Number: 612 Oral
Learning performance measures for MLP networks
Peter Geczy and Shiro Usui
Keywords: performance measures MLP networks training
Abstract:
Training of MLP networks is mainly based on implementations of first-order line search optimization techniques. Determination of the search direction is given by an error matrix for a neural network. The error matrix contains essential information not only about the search direction, but also about specific features of the error landscape. Analysis of the error matrix based on an estimate of its spectral radius provides a relative measure of the algorithm's movement in the multidimensional weight/error space. Furthermore, the estimate of the spectral radius forms a suitable reference ground for the derivation of performance measures. The article presents effective and computationally inexpensive performance measures for MLP networks. Such measures not only allow monitoring of the network's performance; on their basis an individual performance measure for each structural element can also be derived. This has direct applicability to pruning strategies for MLP networks. In addition, the proposed performance measures permit detection of specific shapes of error surfaces such as flat regions and sharp slopes. This feature is of essential importance for algorithms implementing dynamic modification of the learning rate.
_____
ICNN97 Supervised / Unsupervised Learning Session: SU3B Paper Number: 168 Oral
On the structure of the Hessian Matrix in Feedforward networks and Second Derivative Methods
Jorg Wille
Keywords: backpropagation learning feedforward networks adaptive first derivative algorithm
Abstract:
_____
ICNN97 Supervised / Unsupervised Learning Session: SU3C Paper Number: 196 Oral
Generalization of the Cross-Entropy Error Function to Improve the Error Backpropagation Algorithm
Sang-Hoon Oh
Keywords: cross-entropy error function error backpropagation algorithm handwritten digit recognition
Abstract:
_____
ICNN97 Supervised / Unsupervised Learning Session: SU3D Paper Number: 613 Oral
Effects of structural adjustments on the estimate of spectral radius of error matrices
Peter Geczy and Shiro Usui
Keywords: spectral radius of error matrices MLP networks structural adjustments
Abstract:
Successful practical applications of MLP networks trained by first-order line search optimization approaches initiated a wave of interest in improving the speed of convergence and finding the optimum structure. Recently, training procedures have been implemented that incorporate dynamic structural changes of a network during the learning phase. The objective is to optimize the size of a network and yet maintain good performance. An essential part of structure-modifying algorithms is the formulation of criteria for detecting irrelevant structural elements. Performance measures for MLP networks designed by the authors in (Geczy (1996)) allow the derivation of the individual performance measure I. The underlying reference ground for the individual performance measure I is the estimate of the spectral radius of an error matrix for a neural network. The estimate of the spectral radius can be affected by imposed modifications of the structure.
This paper presents the first deterministic analytical apparatus for observing the effects of structural modifications on the estimate of the spectral radius. The theoretical material has wide applicability; it is especially useful for developing training procedures that incorporate dynamic structural changes and for monitoring the performance of MLP networks.
_____
ICNN97 Supervised / Unsupervised Learning Session: SU3E Paper Number: 109 Oral
Radial Basis Function networks for autonomous agent control
Ralf Salomon
Keywords: radial basis function network autonomous agent control incremental learning
Abstract:
While many learning algorithms as well as dynamic growing and pruning techniques are appropriate for most technical applications, they do not work appropriately in the context of autonomous agents. Autonomous agents potentially operate in dynamically changing environments. They receive an endless data stream, which makes it impossible to store a fixed set of training patterns. Therefore, autonomous agents require network models that, among other properties, feature incremental learning.
This paper shows how radial basis function networks can be modified to meet these requirements. Since we are currently developing an appropriate value system for autonomous agents, this paper illustrates the network's properties on several regression tasks and the well-known double-spiral problem. It is shown that (1) the network yields fast convergence, (2) the presentation of patterns from one subspace does not affect the mapping of other patterns, and (3) the model yields very fast classification; the network learns the double-spiral task within only one epoch.
_____
ICNN97 Supervised / Unsupervised Learning Session: SU4A Paper Number: 479 Oral
A new local linearized least squares algorithm for training feedforward neural networks
Octavian Stan and Edward W. Kamen
Keywords: feedforward neural networks local linearized least squares global extended Kalman filter
Abstract:
An algorithm used to train the weights of a feedforward neural network is the Global Extended Kalman Filter (GEKF) algorithm, which has much better performance than the popular gradient descent with error backpropagation in terms of convergence and quality of solution. However, the GEKF is very computationally intensive, and this has led to the development of simplified algorithms based on the partitioning of the global nonlinear optimization problem into a set of local nonlinear problems at the neuron level. In this paper a new training algorithm is developed by viewing the local subproblems as recursive linearized least squares problems. The objective function of the least squares problems for each neuron is the sum of the squares of the linearized back propagated error signals. The new algorithm is shown to give better convergence results for two benchmark problems in comparison to existing local algorithms.
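The paper's exact local formulation is not reproduced here; purely as an illustration of the recursive least squares machinery that such a local algorithm applies per neuron, the sketch below shows a generic RLS update. Treating `a` as a neuron's input activations and `e` as its linearized, back-propagated error is an assumption for the sketch.

    import numpy as np

    class NeuronRLS:
        """Generic recursive least-squares update for one neuron's weight vector."""
        def __init__(self, n_weights, p0=100.0, lam=0.99):
            self.w = np.zeros(n_weights)
            self.P = p0 * np.eye(n_weights)   # estimate of the inverse input correlation
            self.lam = lam                    # forgetting factor

        def update(self, a, e):
            Pa = self.P @ a
            k = Pa / (self.lam + a @ Pa)      # gain vector
            self.w += k * e                   # correct the weights with the (linearized) error
            self.P = (self.P - np.outer(k, Pa)) / self.lam

    # Toy check: fit a linear neuron to noisy data with RLS.
    rng = np.random.default_rng(5)
    true_w = np.array([0.5, -1.0, 2.0])
    neuron = NeuronRLS(3)
    for _ in range(2000):
        a = rng.standard_normal(3)
        e = (true_w @ a + 0.01 * rng.standard_normal()) - neuron.w @ a
        neuron.update(a, e)
    print(np.round(neuron.w, 3))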
_____
ICNN97 Supervised / Unsupervised Learning Session: SU4B Paper Number: 583 Oral
Extensions and enhancements of decoupled extended Kalman filter training
G. V. Puskorius and L. A. Feldkamp
Keywords: Decoupled extended Kalman filter cost functions pattern classification relative entropy
Abstract:
We describe here three useful and practical extensions and enhancements of the decoupled extended Kalman filter (DEKF) neural network weight update procedure, which has served as the backbone for much of our applications-oriented research for the last six years.
First, we provide a mechanism that constrains weight values to a pre-specified range during training to allow for fixed-point deployment of trained networks. Second, we examine modifications of DEKF training for alternative cost functions; as an example, we show how to use DEKF training to minimize a measure of relative entropy, rather than mean squared error, for pattern classification problems.
Third, we describe an approximation of DEKF training that allows a multiple-output training problem to be treated with single-output training complexity.
_____
ICNN97 Supervised / Unsupervised Learning Session: SU4C Paper Number: 600 Oral
Signal-flow-graph derivation of on-line gradient learning algorithms
Paolo Campolucci, Andrea Marchegiani, Aurelio Uncini, Francesco Piazza
Keywords: signal-flow-graph on line gradient learning cost function feedforward network
Abstract:
In this paper, making use of the Signal-Flow-Graph (SFG) representation and its known properties, we derive a new general method for backward gradient computation of a system output or cost function with respect to past (or present) system parameters. The system can be any causal, in general non-linear and time-variant, dynamic system represented by a SFG, in particular any feedforward or recurrent neural network. In this work we use discrete-time notation, but the same theory holds for the continuous-time case. The gradient is obtained by the analysis of two SFGs, the original one and its adjoint. This method can be used both for on-line and off-line learning. In the latter case, using the Mean Square Error cost function, our approach particularises to E. Wan's method, which is not suited for on-line training of recurrent networks. Computer simulations of non-linear dynamic system identification are also presented to assess the performance of the algorithm resulting from the application of the proposed method to locally recurrent neural networks.
_____
ICNN97 Supervised / Unsupervised Learning Session: SU4D Paper Number: 425 Oral
A hybrid global learning algorithm based on global search and least squares techniques for backpropagation networks
Chi-Tat Leung and Tommy W.S. Chow
Keywords: backpropagation global search least squares nonlinear function approximation
Abstract:
A hybrid learning algorithm for backpropagation networks, based on global search and least squares methods, is presented to speed up convergence. The proposed algorithm comprises a global search part and a least squares part. The global search part trains the backpropagation network over a reduced weight space. The remaining weights are calculated with the linear least squares method. Two problems, nonlinear function approximation and modified XOR, are used to demonstrate the fast global search performance of the proposed algorithm. The results indicate that the proposed algorithm speeds up the learning process by as much as 4670% in terms of iterations and does not become trapped in local minima.
_____
ICNN97 Supervised / Unsupervised Learning Session: SU4E Paper Number: 187 Oral
The Effects of Quantization on the Backpropagation Learning
Kazushi Ikeda, Akihiro Suzuki and Kenji Nakayama
Keywords: parameter quantization backpropagation learning learning coefficient
Abstract:
The effects of quantization of the parameters of a learning machine are discussed. The learning coefficient should be as small as possible for a better estimate of the parameters. On the other hand, when the parameters are quantized, it should be relatively larger in order to avoid the paralysis of learning originating from the quantization. How to choose the learning coefficient is derived in this paper from a statistical point of view.
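A tiny numerical illustration (an assumed fixed-point setup, not the paper's analysis) of the paralysis effect: if the learning coefficient makes the update smaller than half a quantization step, the rounded weight never moves.

    import numpy as np

    STEP = 1.0 / 256                        # fixed-point quantization step (assumed)

    def quantize(w):
        """Round a weight to the nearest point of the fixed-point grid."""
        return np.round(w / STEP) * STEP

    grad = 0.1                              # a constant toy gradient
    for eta in (0.01, 0.05):
        w = quantize(0.5)
        for _ in range(100):
            w = quantize(w - eta * grad)
        # eta=0.01: the update (0.001) is below half a step (~0.002), so w never moves.
        # eta=0.05: the update (0.005) exceeds the step, so learning proceeds.
        print(eta, w)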
_____
ICNN97 Supervised / Unsupervised Learning Session: SU4F Paper Number: 226 Oral
A construction method of feedforward neural network for selecting effective hidden nodes
Hui Su, Bin Zhao and Shaowei Xia
Keywords: Construction method for selecting effective hidden nodes feedforward neural network Generalization ability
Abstract:
_____
ICNN97 Supervised / Unsupervised Learning Session: SU5A Paper Number: 115 Oral
Robust Training Algorithm for a Perceptron Neuron
Q. Song
Keywords: nonlinear perceptron bounded disturbance robust classifier dead zone projection algorithm
Abstract:
Our interest in this paper is to study the behavior of the perceptron neuron in the presence of disturbance, which is always important for practical applications. A robust classifier is required to be insensitive to the disturbance and to classify noisy input patterns into the correct class, to which the respective desired input pattern belongs. The projection algorithm with a dead zone is well known in system identification and adaptive control systems to guarantee convergence. In this paper, the dead zone scheme is used to train the nonlinear perceptron neuron. The trained perceptron neuron is capable of classifying a noisy input pattern sequence into the correct class in the presence of disturbance.
_____
ICNN97 Supervised / Unsupervised Learning Session: SU5B Paper Number: 148 Oral
Synaptic Delay Based Artificial Neural Networks and Discrete Time Backpropagation Applied to QRS Complex Detection
R.J. Duro and J. Santos
Keywords: back propagation algorithm discrete time feedforward networks internal time delays
Abstract:
In this paper we make use of an extension of the backpropagation algorithm to discrete-time feedforward networks that include internal time delays in the synapses. The structure of the network is similar to the one presented in [1]; that is, in addition to the weights of the synaptic connections, we model their length through a parameter that indicates the delay a discrete event suffers when going from the origin neuron to the target neuron through a synaptic connection. Like the weights, these delays are also trainable, and a training algorithm can be obtained that is almost as simple as the backpropagation algorithm [2], and which is really an extension of it. We present an application of these networks to the task of identifying normal QRS and ventricular QRS complexes in an ECG signal, with the network receiving the signal sequentially, that is, with no windowing or segmentation applied.
_____
ICNN97 Supervised / Unsupervised Learning Session: SU5C Paper Number: 651 Oral
A geometric learning algorithm for elementary perceptron and its convergence analysis
Seiji Miyoshi and Kenji Nakayama
Keywords: geometric learning Affine projection algorithm convergence adaptive filters
Abstract:
In this paper, the geometric learning algorithm (GLA) is proposed for an elementary perceptron which includes a single output neuron. The GLA is a modified version of the affine projection algorithm (APA) for adaptive filters.
The weight update vector is determined geometrically towards the intersection of the k hyperplanes which are perpendicular to the patterns to be classified; k is the order of the GLA. In the case of the APA, the target of the coefficient update is a single point, which corresponds to the best identification of the unknown system. In the case of the GLA, on the other hand, the target of the weight update is an area in which all the given patterns are classified correctly. Thus, their convergence conditions are different. In this paper, the convergence condition of the 1st-order GLA for 2 patterns is theoretically derived. The new concept of "the angle of the solution area" is introduced. Computer simulation results support that this new concept gives a good estimate of the convergence properties.
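A hedged sketch of the geometric idea for the first-order case: a pattern's constraint defines a hyperplane in weight space whose normal is the pattern itself, and the weight vector is projected onto that hyperplane when the pattern is not yet correctly classified. The margin value and the toy data are assumptions; the paper's k-th order GLA projects toward the intersection of k such hyperplanes.

    import numpy as np

    def gla_first_order_step(w, x, d, margin=1.0):
        """Project w onto the hyperplane {w : w.x = d*margin} if x is not safely classified.

        The hyperplane has normal x, i.e., it is perpendicular to the pattern, which is
        the geometric picture used above; margin=1.0 is an illustrative choice.
        """
        if d * (w @ x) < margin:
            w = w + ((d * margin - w @ x) / (x @ x)) * x
        return w

    rng = np.random.default_rng(6)
    X = rng.standard_normal((200, 2)) + np.array([2.0, 2.0]) * rng.choice([-1, 1], size=(200, 1))
    d = np.sign(X @ np.array([1.0, 1.0]))        # a linearly separable toy labeling
    w = np.zeros(2)
    for _ in range(20):
        for x, t in zip(X, d):
            w = gla_first_order_step(w, x, t)
    print(np.round(w, 2))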
_____
ICNN97 Supervised / Unsupervised Learning Session: SU5D Paper Number: 320 Oral
Superior training of artificial neural networks using weight-space partitioning
Hoshin V. Gupta, Kuo-Lin Hsu and Soroosh Sorooshian
Keywords: batch training feedforward neural network multi-start downhill simplex conditional least squares
Abstract:
LLSSIM (Linear Least Squares SIMplex) is a new algorithm for batch training of three-layer feedforward Artificial Neural Networks (ANN), based on a partitioning of the weight space. The input-hidden weights are trained using a "Multi-Start Downhill Simplex" global search algorithm, and the hidden-output weights are estimated using "conditional linear least squares". Monte Carlo testing shows that LLSSIM provides globally superior weight estimates with significantly fewer function evaluations than the conventional back propagation, adaptive back propagation, and conjugate gradient strategies.
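A minimal sketch of the weight-space partitioning idea, with scipy's single-start Nelder-Mead standing in for the Multi-Start Downhill Simplex and a numpy least-squares solve playing the role of the conditional linear least squares; network size, data, and parameterization are illustrative assumptions, not LLSSIM itself.

    import numpy as np
    from scipy.optimize import minimize

    rng = np.random.default_rng(7)
    X = rng.uniform(-1, 1, size=(200, 1))
    y = np.sin(3 * X[:, 0])                        # toy regression target
    n_hidden = 4

    def loss(theta):
        """MSE after solving the hidden-output weights by linear least squares."""
        W = theta.reshape(n_hidden, 2)             # one input weight and one bias per hidden unit
        H = np.tanh(X @ W[:, :1].T + W[:, 1])      # hidden activations, shape (200, n_hidden)
        Hb = np.hstack([H, np.ones((len(X), 1))])  # add an output bias column
        v, *_ = np.linalg.lstsq(Hb, y, rcond=None) # conditional linear least squares
        return np.mean((Hb @ v - y) ** 2)

    theta0 = rng.standard_normal(n_hidden * 2)
    res = minimize(loss, theta0, method='Nelder-Mead')   # downhill simplex over input-hidden weights
    print(round(res.fun, 5))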
_____
ICNN97 Supervised / Unsupervised Learning Session: SU5E Paper Number: 628 Oral
MUpstart - A constructive neural network learning algorithm for multi-category pattern classification
Rajesh Parekh, Jihoon Yang, and Vasant Honavar
Keywords: constructive learning multi-category pattern classification upstart algorithm
Abstract:
Constructive learning algorithms offer an approach for dynamically constructing near-minimal neural network architectures for pattern classification tasks. Several such algorithms proposed in the literature are shown to converge to zero classification errors on finite non-contradictory datasets. However, these algorithms are restricted to two-category pattern classification and (in most cases) they require the input patterns to have binary (or bipolar) valued attributes only. We present a provably correct extension of the Upstart algorithm to handle multiple output classes and real-valued pattern attributes. Results of experiments with several artificial and real-world datasets demonstrate the feasibility of this approach in practical pattern classification tasks and also suggest several interesting directions for future research.
_____
ICNN97 Supervised / Unsupervised Learning Session: SU5F Paper Number: 574 Oral
Gauss-Newton approximation to Bayesian Learning
F. Dan Foresee and Martin T. Hagan
Keywords: Gauss-Newton approximation Bayesian regularization feedforward neural network
Abstract:
This paper describes the application of Bayesian regularization to the training of feedforward neural networks. A Gauss-Newton approximation to the Hessian matrix, which can be conveniently implemented within the framework of the Levenberg-Marquardt algorithm, is used to reduce the computational overhead. The resulting algorithm is demonstrated on a simple test problem and is then applied to three practical problems.
The results demonstrate that the algorithm produces networks which have excellent generalization capabilities.
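As a hedged sketch of the kind of hyperparameter re-estimation involved (the standard Gauss-Newton form of Bayesian regularization, built from the same matrix Levenberg-Marquardt already forms, but not necessarily the authors' exact implementation):

    import numpy as np

    def update_hyperparameters(J, errors, w, alpha, beta):
        """One Bayesian-regularization re-estimation step using a Gauss-Newton Hessian.

        J is the Jacobian of the network errors with respect to the weights; the
        Gauss-Newton approximation H ~ 2*beta*J^T J + 2*alpha*I avoids computing the
        exact Hessian of the regularized objective beta*E_d + alpha*E_w.
        """
        N = len(w)                                # number of weights
        n = len(errors)                           # number of training targets
        E_d = float(errors @ errors)              # sum of squared errors
        E_w = float(w @ w)                        # sum of squared weights
        H = 2.0 * beta * (J.T @ J) + 2.0 * alpha * np.eye(N)
        gamma = N - 2.0 * alpha * np.trace(np.linalg.inv(H))   # effective number of parameters
        return gamma / (2.0 * E_w), (n - gamma) / (2.0 * E_d), gamma

    # Toy call with random quantities, just to show the interface.
    rng = np.random.default_rng(8)
    J, e, w = rng.standard_normal((50, 10)), rng.standard_normal(50), rng.standard_normal(10)
    print(update_hyperparameters(J, e, w, alpha=0.01, beta=1.0))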
_____
ICNN97 Supervised / Unsupervised Learning Session: SU6A Paper Number: 145 Oral
The Weighted EM Algorithm and Block Monitoring
Yasuo Matsuyama
Keywords: EM algorithm weighted EM algorithm generalized divergence
Abstract:
The expectation-maximization (EM) algorithm is generalized so that learning proceeds according to adjustable weights in terms of probability measures. The presented method, the weighted EM algorithm, or the α-EM algorithm, includes the existing EM algorithm as a special case. It is further found that this learning structure can work systolically. It is also possible to add monitors that interact with lower systolic subsystems. This is made possible by attaching building blocks of the weighted (or plain) EM learning. Derivation of the whole algorithm is based on generalized divergences. In addition to the discussion of learning, extensions of basic statistical properties such as Fisher's efficient score, the Fisher information measure, and the Cramér-Rao inequality are given. These appear in the update equations of the generalized expectation learning.
Experiments show that the presented generalized version contains cases that outperform traditional learning methods.
_____
ICNN97 Supervised / Unsupervised Learning Session: SU6B Paper Number: 462 Oral
From Bayesian-Kullback Ying-Yang Learning to Bayesian Ying-Yang system: New advances
Lei Xu
Keywords: Bayesian-Kullback Ying-Yang Learning Bayesian Ying-Yang system principal component analysis
Abstract:
New Title: New Advances on Bayesian Ying-Yang Learning System With Kullback and Non-Kullback Separation Functionals
In this paper, we extend Bayesian-Kullback YING-YANG (BKYY) learning into a much broader Bayesian Ying-Yang (BYY) learning system by using different separation functionals instead of only the Kullback divergence, and elaborate the power of BYY learning as a general learning theory for parameter learning, scale selection, structure evaluation, regularization and sampling design, with its relations to several existing learning methods and its developments in the past years briefly summarized. Then, we present several new results on BYY learning. First, improved criteria are proposed for selecting the number of densities in finite mixtures and Gaussian mixtures, for selecting the number of clusters in MSE clustering, and for selecting the subspace dimension in PCA-related methods. Second, improved criteria are proposed for selecting the number of expert nets in mixtures of experts and its alternative model, and for selecting the number of basis functions in RBF nets. Third, three ca ... (incomplete abstract received)
_____
ICNN97 Supervised / Unsupervised Learning Session: SU6C Paper Number: 129 Oral
D-Entropy Controller for Interpretation and Generalization
Ryotaro Kamimura
Keywords: D-Entropy generalization internal representation
Abstract:
In this paper, we propose a method to control D-entropy for better generalization and explicit interpretation of internal representations. By controlling D-entropy, a few hidden units are detected as important units without saturation. In addition, a small number of important input-hidden connections are detected and the majority of the connections are eliminated. Thus, we can obtain much simplified internal representations with better interpretation and generalization. The D-entropy control method was applied to the inference of the well-formedness of an artificial language.
Experimental results confirmed that by maximizing and minimizing D-entropy, generalization performance can be significantly improved.
_____
ICNN97 Supervised / Unsupervised Learning Session: SU6D Paper Number: 652 Oral
Avoiding weight-illgrowth: Cascade correlation algorithm with local regularization
Qun Xu and Kenji Nakayama
Keywords: cascade correlation zigzag output mapping weight-illgrowth regression
Abstract:
Keywords: neural network, dynamic learning algorithm, cascade correlation, generalization, regularization
This paper investigates some possible problems of the Cascade Correlation algorithm, one of which is the zigzag output mapping caused by weight-illgrowth of the added hidden unit. Without doubt, this can deteriorate generalization, especially for regression problems. To solve this problem, we combine the Cascade Correlation algorithm with regularization theory. In addition, some new regularization terms are proposed in light of the special cascade structure. Simulations have shown that regularization indeed smooths the zigzag output, so that generalization is improved, especially for function approximation.
_____
ICNN97 Supervised / Unsupervised Learning Session: SU6E Paper Number: 629 Oral
Pruning strategies for constructive neural network learning algorithms
Rajesh Parekh, Jihoon Yang, and Vasant Honavar
Keywords: constructive learning pruning strategies generalization
Abstract:
We present a framework for incorporating pruning strategies in the MTiling constructive neural network learning algorithm.
Pruning involves elimination of redundant elements (connection weights or neurons) from a network and is of considerable practical interest. We describe three elementary sensitivity based strategies for pruning neurons. Experimental results demonstrate a moderate to significant reduction in the network size without compromising the network's generalization performance.
_____
ICNN97 Supervised / Unsupervised Learning Session: SU6F Paper Number: 92 Oral
Self-Regulation of Model Order in Feedforward Neural Networks
Ravi Kothari and Kwabena Agyepong
Keywords: feedforward neural networks number of hidden layers model order
Abstract:
Despite the presence of theoretical results, the application of feedforward neural networks is hampered by the lack of systematic procedural methods for determining the number of hidden neurons to use. The number of hidden layer neurons determines the order of the neural network model and consequently the generalization performance of the network. This paper puts into perspective the approaches used to address this problem and presents a new paradigm which uses dependent evolution of hidden layer neurons to self-regulate the model order. We show through simulations that despite an abundance of free parameters (i.e., starting with a larger-than-necessary network), the proposed paradigm allows for localization of specializing hidden layer neurons, with the unspecialized hidden layer neurons behaving similarly. These similarly behaving neurons reduce the model order and allow for the benefits of a smaller-sized network. Hints on analytically understanding this behavior are also noted.
_____
ICNN97 Supervised / Unsupervised Learning Session: SUP2 Paper Number: 156 Poster
Properties of learning of a fuzzy ART Variant
Michael Georgiopoulos, Issam Dagher, Gregory L. Heileman, and George Bebis
Keywords: fuzzy ART algorithm choice parameter clustering
Abstract:
This paper discusses one variation of the Fuzzy ART architecture, referred to as the Fuzzy ART Variant. The Fuzzy ART Variant is a Fuzzy ART algorithm with a very large value for the choice parameter. Based on the geometrical interpretation of templates in Fuzzy ART, we present and prove useful properties of learning pertaining to the Fuzzy ART Variant. One of these properties of learning establishes an upper bound on the number of list presentations required by the Fuzzy ART Variant to learn an arbitrary list of input patterns presented to it. In previously published work, it was shown that the Fuzzy ART Variant performs as well as a Fuzzy ART algorithm with more typical values for the choice parameter. Hence, the Fuzzy ART Variant is as good a clustering machine as the Fuzzy ART algorithm using more typical values of the choice parameter.
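For reference, the sketch below shows the standard Fuzzy ART choice function and how a very large choice parameter reduces category choice to ranking by the match |I ^ w_j|, which is the regime studied as the Fuzzy ART Variant. The toy templates and input are assumptions; complement coding, vigilance, and the learning step are omitted.

    import numpy as np

    def fuzzy_art_choice(I, W, alpha):
        """Fuzzy ART choice function T_j = |I ^ w_j| / (alpha + |w_j|).

        '^' is the component-wise minimum and |.| the L1 norm.  When alpha is very
        large the denominator is dominated by alpha, so categories are effectively
        ranked by |I ^ w_j| alone.
        """
        match = np.minimum(I, W).sum(axis=1)        # |I ^ w_j| for every category j
        return match / (alpha + W.sum(axis=1))

    rng = np.random.default_rng(9)
    W = rng.uniform(size=(5, 4))                    # toy category templates
    I = rng.uniform(size=4)                         # toy input pattern
    for alpha in (0.01, 1e6):
        print(alpha, np.argmax(fuzzy_art_choice(I, W, alpha)))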
_____
ICNN97 Supervised / Unsupervised Learning Session: SUP2 Paper Number: 157 Poster
A novel Fuzzy Based Algorithm for Radial Basis Function Neural Network
M.V.C. Rao and A.V. Kishore
Keywords: LMS algorithm RBF neural networks nonlinear system identification
Abstract:
An endeavour is made to propose a novel fuzzy variation of the learning parameter in order to improve the performance of the Least Mean Square (LMS) algorithm in connection with Radial Basis Function Neural Networks (RBFNN). The error and the change in error are the parameters based on which the learning parameter is modified using fuzzy principles. The learning parameter is allowed to change within a fixed range only. This modified algorithm is used for a nonlinear system identification problem. It is clearly shown by test examples that the performance of this new method is better not only than that of the LMS algorithm but also than that of the Normalised Least Mean Square (NLMS) algorithm, which is certainly superior to LMS. Further, it does not require the extra computation which is an unavoidable feature of NLMS. The convergence during learning is almost similar to that of NLMS. Thus it has been clearly established that this algorithm is superior to the existing ones in terms of the most desirable characteristics: capturing the plant characteristics and lower computational requirements.
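The abstract does not give the fuzzy rule base, so the sketch below replaces it with a crisp heuristic purely to illustrate the interface: an LMS update of RBF output weights whose learning parameter is adjusted from the error and the change in error and confined to a fixed range. All rules, names, and constants are assumptions, not the authors' fuzzy scheme.

    import numpy as np

    def adjust_learning_rate(mu, error, delta_error, mu_min=0.01, mu_max=0.5):
        """Adjust the LMS learning parameter from the error and its change.

        A simple crisp stand-in for the paper's fuzzy rules: grow mu while the error
        is large and still shrinking, shrink mu otherwise, and clip to a fixed range.
        """
        mu = mu * 1.05 if (abs(error) > 0.1 and delta_error < 0) else mu * 0.95
        return float(np.clip(mu, mu_min, mu_max))

    # LMS update of RBF output weights with the adaptive learning parameter.
    rng = np.random.default_rng(10)
    centers = np.linspace(-1, 1, 5)
    w = np.zeros(5)
    mu, prev_err = 0.1, 0.0
    for _ in range(2000):
        x = rng.uniform(-1, 1)
        phi = np.exp(-((x - centers) ** 2) / 0.1)      # RBF hidden activations
        err = np.sin(np.pi * x) - w @ phi
        mu = adjust_learning_rate(mu, err, abs(err) - abs(prev_err))
        w += mu * err * phi                            # LMS weight update
        prev_err = err
    print(np.round(w, 2))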
_____
ICNN97 Supervised / Unsupervised Learning Session: SUP2 Paper Number: 160 Poster
Experiments with Learning Rules for a Single Neuron
Mitra Basu
Keywords: perceptron convergence non-linear methods linearly non-separable data
Abstract:
In this paper we investigate a variation of Perceptron-like learning rules for a single neural unit. The existing learning rules lack one important element: if the patterns are not linearly separable, the rule either does not converge or converges to an approximate solution.
The deficiency is that one cannot draw any conclusion as to the nature of the problem (linear or non-linear). We propose to design a class of dual-purpose rules such that (1) if the patterns are linearly separable, the performance of the rule is equivalent to that of the Perceptron rule, and (2) if the patterns are not linearly separable, the rule indicates this, so that appropriate non-linear methods (e.g., a multi-layer neural network, a nonlinear transformation of the input space, etc.) can be used to address the problem.
We present experimental results with linearly separable as well as linearly non-separable data using the proposed rule and compare its performance with that of the Perceptron rule.
_____
ICNN97 Supervised / Unsupervised Learning Session: SUP2 Paper Number: 55 Poster
Training of supervised neural networks via a nonlinear primal-dual interior-point method
Theodore B. Trafalis, Nicolas P. Couellan and Sebastien C. Bertrand
Keywords: feedforward supervised neural networks primal-dual interior-point method nonlinear programming approximation
Abstract:
We propose a new training algorithm for feedforward supervised neural networks based on a primal-dual interior-point method for nonlinear programming. Specifically, we consider a one-hidden layer network architecture where the error function is defined by the L2 norm and the activation function of the hidden and output neurons is nonlinear. Computational results are given for odd parity problems with 2, 3, and 5 inputs respectively. Approximation of a nonlinear dynamical system is also discussed.
_____
ICNN97 Supervised / Unsupervised Learning Session: SUP2 Paper Number: 199 Poster
Blind separation of binary sources with less sensors than sources
Petteri Pajunen
Keywords: statistical signal processing unsupervised neural learning source separation constrained competitive learning
Abstract:
Blind separation of unknown sources from their mixtures is currently a timely research topic in statistical signal processing and unsupervised neural learning. Several source separation algorithms have been presented in which it is assumed that there are at least as many sensors as sources. In this paper, a practical algorithm is proposed for separating binary sources from fewer sensors than sources. The algorithm uses constrained competitive learning in the adaptation phase, and the actual separation is achieved by simply selecting the best matching unit. The algorithm appears to be reasonably robust against small additive noise.
_____
ICNN97 Supervised / Unsupervised Learning Session: SUP2 Paper Number: 504 Poster
Back-propagation of accuracy
Masha Yu. Senashova, Alexander N. Gorban and Donald C. Wunsch II
Keywords: back propagation accuracy
Abstract:
In this paper we solve the following problem: how can one determine the maximal allowable errors for the signals and parameters of each element of a network, proceeding from the condition that the vector of output signals of the network must be computed with a given accuracy?
"Back-propagation of accuracy" is developed to solve this problem.
_____
ICNN97 Supervised / Unsupervised Learning Session: SUP2 Paper Number: 247 Poster
Selection of the convergence coefficient using automata learning rule
N.O. Ezzati and K. Faez
Keywords: convergence coefficient stochastic automata learning back propagation learning nonlinear function approximation
Abstract:
In this paper a novel approach to the selection of the convergence coefficient in the backpropagation learning rule is presented. This approach uses a stochastic automata learning rule to select the best coefficient in each step of the learning phase. The approach is applied to a nonlinear function approximation problem, and the simulation results show faster convergence than the conventional and adaptive-learning-rate backpropagation rules.
_____
ICNN97 Supervised / Unsupervised Learning Session: SUP2 Paper Number: 265 Poster
A position paper on statistical inference techniques which integrate neural network and Bayesian network models
William H. Hsu
Keywords: Statistical inference bayesian neural network Gibbs sampler Markov chain Monte Carlo methods
Abstract:
Some statistical methods which have been shown to have direct neural network analogs are surveyed here; we discuss sampling, optimization, and representation methods which make them feasible when applied in conjunction with, or in place of, neural networks. We present the foremost of these, the Gibbs sampler, both in its successful role as a convergence heuristic derived from statistical physics and under its probabilistic learning interpretation. We then review various manifestations of Gibbs sampling in Bayesian learning; its relation to "traditional" simulated annealing; specializations and instances such as EM; and its application as a model construction technique for the Bayesian network formalism. Next, we examine the ramifications of recent advances in Markov chain Monte Carlo methods for learning by backpropagation. Finally, we consider how the Bayesian network formalism informs the causal reasoning interpretation of some neural networks, and how it prescribes optimizations for efficient random sampling in Bayesian learning applications.
Keywords: Bayesian networks, supervised and unsupervised learning, simulated annealing
_____
ICNN97 Supervised / Unsupervised Learning Session: SUP2 Paper Number: 393 Poster
Development of a neural network algorithm for unsupervised competitive learning
Dong C. Park
Keywords: unsupervised competitive learning clustering K-Means image compression
Abstract:
_____
ICNN97 Supervised / Unsupervised Learning Session: SUP2 Paper Number: 404 Poster
Difficulty in learning vs. network size
Goutam Chakraborty and Shoichi Noguchi
Keywords: Representation capability generalization difficulty in learning
Abstract:
_____
ICNN97 Supervised / Unsupervised Learning Session: SUP2 Paper Number: 208 Poster
Multiple categorization using fuzzy ART
Pierre Lavoie, Jean-Francois Crespo and Yvon Savaria
Keywords: Fuzzy ART internal competition tuning parameter
Abstract:
The internal competition between categories in the Fuzzy Adaptive Resonance Theory (ART) neural model can be biased by replacing the original choice function with one that contains a tuning parameter under external control. The competition can be biased so that, for example, categories of a desired size are favored. This attentional tuning mechanism allows recalling different categories for the same input under different circumstances, even when no additional learning takes place. A new tuning parameter is unnecessary, since the readily available vigilance parameter can control both attentional tuning and vigilance. The modified Fuzzy ART has the self-stabilization property for analog inputs, whether vigilance is fixed or variable.