1990 IJCNN International Joint Conference on Neural Networks, San Diego, CA, USA (June 17-21, 1990): Pulse-Stream Neural Networks and Reinforcement Learning

Pulse-Stream Neural Networks and Reinforcement Learning

Krister Valtonen, Torbjorn Kronander, and Ingemar Ingemarsson
Department of Electrical Engineering, Linkoping University, S-58183 Linkoping, Sweden
e-mail: [email protected], [email protected]

Abstract

In this paper we present a neural network model based on pulsed signals. Pulsed signals are shown to give the model many pleasant properties, such as a simplified hardware implementation, the possibility to use stochastic search techniques, and biological plausibility. Our model also makes it possible to build modular neural systems, which may turn out to be a cure for the curse of dimensionality. Since information representation in pulsed signals has to be studied carefully, we outline the stochastic representation and show how a signal is generated and its information estimated. Several design issues have to be solved before a model of this kind can be used, and we propose solutions for synaptic multiplication, summation, and nonlinearity. It is also shown that reinforcement learning is the type of learning best suited to this model and that it does not interfere with the properties above. Initial simulations of our model show promising results.

1 Introduction

Most work on artificial neural networks (ANNs) has to date dealt with neurons using graded signals (scalar values). Graded signals are used to represent the average spike frequency in biological spike trains. Instead of making this transformation from pulsed to graded signals, we work directly with pulsed signals and claim that there is much to be gained by doing so.

First, and most important, the use of pulsed signals can simplify hardware implementation by stochastic processing [Rib67]. This is a technique which represents values by probabilities; the probabilities are in turn represented by pulse trains. Stochastic processing has been shown to give simple components for many important operations. The use of stochastic processing for ANNs has been proposed before [NH87] [Gai87] [RGK87] [LM89]. The design of ANNs with pulsed signals but without stochastic processing is also under current work by Murray and colleagues [MHRT89] [Mur89]. As pointed out by Leaver and Mars [LM89], powerful implementations can be achieved in mixed electro-optical designs based on photon interaction. It should be stressed that hardware issues are very important. Lee and Lippmann [LL90] show, for example, that neither ANNs nor conventional pattern classifiers are superior to the other when simulated on a serial computer. The real potential of ANNs lies in their massive parallelism, and this is not exploited by simulating ANNs on serial computers.

ANNs using pulsed signals also have a strong relationship to stochastic search techniques such as Boltzmann machines and simulated annealing. Stochastic search has properties which make it superior in avoiding local minima, and several ANN models have been designed using stochastic search [AAHS87] [AHS85] [LA87] [VdBM88].

ANNs that use pulsed signals are also very interesting from a biological point of view, since pulses are what biology itself uses. Issues from neurobiology are much more easily studied if the ANN model uses a pulsed signal representation. One may of course ask whether pulsed signals are essential for ANN design; the answer to this question is, to our present knowledge, not known.
Sejnowski [Sej86] provides considerable discussion on the matter.

As far as learning is concerned, we adopt the reinforcement type of learning [Sut84] [Wil87]. In reinforcement learning each artificial neuron receives a "critic": a "good" or "bad" value. This is an unsupervised type of learning, in contrast to, e.g., back propagation, which is supervised (in the sense that the system is told what the desired output is). We implement the reinforcement signal in the same way as the signal used to transmit information between artificial neurons. This makes it possible to use the output from one neuron, or one module, as reinforcement to another neuron, or module. Potentially, our model could thus be used to build modular or hierarchical artificial neural systems. This may be a cure for the curse of dimensionality, since modularity is the standard way to handle the complexity of large problems, i.e., problems of high dimensionality.

Below we first discuss the implementation of our signal. We then outline the construction of our artificial neuron and the use of reinforcement learning in this framework. We conclude with a discussion of hardware implementation, biological issues, and future work.

2 Pulsed Signals

Intuition says that a pulsed signal represents some kind of frequency (the more pulses per time unit, the higher the represented value). When the signal has some maximum frequency (for example because the pulses are not instantaneous), it makes sense to study the relative frequency of pulses, and from relative frequencies the step to probabilities is short. In stochastic processing a value p (0 <= p <= 1) is usually represented by a synchronous pulse train, where the pulse probability at each discrete time unit is p. Instead of synchronous pulse trains we use asynchronous pulse trains with constant pulse width b. The time intervals from the beginning of one pulse to the beginning of the next (the ISI, interspike interval [Tuc89]) are random, exponentially distributed with refractory period b:

    f(\tau) = \lambda e^{-\lambda(\tau - b)}, \quad \tau \ge b.    (1)

The type of pulse may be chosen later (as long as it has constant width), and this choice does not affect the amount of information carried. It could for example be constant or some model of the nerve impulse. In figure 1 we show an example of a signal with constant random height, uniformly distributed between h_min and h_max.

Since we are dealing with asynchronous pulsed signals, we have to be more careful with the representation than in the synchronous case. We say that a pulse train represents the probability of a pulse at any time point. Given a pulse train X(t) of time length T, the maximum likelihood estimate of this probability p is

    \hat{p} = \frac{\int_{t_0}^{t_0+T} I(X(t)) \, dt}{\int_{t_0}^{t_0+T} dt}
            = \frac{\text{total time of pulses}}{\text{total time length}}
            = \frac{N b}{T},    (2)

where N is the actual number of pulses which occurred in the pulse train and I is the pulse-indicator function,

    I(x) = \begin{cases} 1 & \text{if } x \text{ is a pulse} \\ 0 & \text{otherwise.} \end{cases}

It is interesting to note that the same estimate is obtained if we sample the pulse train with sampling time b. This sampling is a perfect trap for pulses, since each pulse is observed at one and only one sampling time (a pulse can neither escape nor be observed in two samples). The sampling therefore amounts to pulse counting, and we have

    \hat{p} = \frac{\text{number of samples with pulse}}{\text{total number of samples}} = \frac{N}{T/b} = \frac{N b}{T}.
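As a sanity check on this representation, the following small simulation sketch (Python with NumPy; the function names and parameter values are ours, not the paper's) generates a train of width-b pulses with exponential ISIs as in equation (1) and recovers p with the sampling trap of equation (2):

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_pulse_train(lam, b, T, rng):
    """Pulse onset times on [0, T): exponential waits with rate lam,
    plus the refractory period b between consecutive onsets (equation 1)."""
    onsets = []
    t = rng.exponential(1.0 / lam)           # waiting time to the first pulse
    while t < T:
        onsets.append(t)
        t += b + rng.exponential(1.0 / lam)  # refractory period, then exponential wait
    return np.array(onsets)

def estimate_p(onsets, b, T):
    """Sampling with period b is a 'perfect trap': because consecutive onsets
    are more than b apart, each pulse occupies exactly one width-b sampling
    bin, so counting occupied bins counts the pulses (N)."""
    caught = np.unique(np.floor(onsets / b).astype(int)).size  # = N
    return caught / round(T / b)                               # = N * b / T

lam, b, T = 50.0, 0.005, 100.0
onsets = generate_pulse_train(lam, b, T, rng)
print(estimate_p(onsets, b, T))
```

With lam = 50 and b = 0.005, the train spends a fraction lam*b / (1 + lam*b) = 0.2 of its time inside pulses, and the printed estimate lands close to that value.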
The generation of a pulse train representing a given p depends only on \lambda in equation (1) and on b. The relations between p and \lambda are

    p = \frac{\lambda b}{1 + \lambda b}, \qquad \lambda = \frac{p}{b(1 - p)}.    (3)

Up to this point we have assumed that a constant value is represented. This is most often not the case, and if we want a dynamic system the investigation above is not enough. Relations (3) do not hold exactly if we replace p with p(t) and \lambda with \lambda(t). (Consider for example \lambda(t) = 1 for t < 0 and \lambda(t) = 0 for t >= 0; the probability of a pulse at t = b/2 is then in fact nonzero, whereas \lambda(b/2) b / (1 + \lambda(b/2) b) = 0.) However, we have found by careful simulations that

    p(t) \approx \frac{\lambda(t) b}{1 + \lambda(t) b}, \qquad \lambda(t) \approx \frac{p(t)}{b(1 - p(t))}    (4)

are good approximations; the less the functions vary, the better the approximations hold. Given p(t), the pulse train is generated by first calculating \lambda(t) according to approximation (4) and then generating the pulses in the same way that events in an inhomogeneous Poisson process can be generated, by time scaling of a homogeneous Poisson process [Sny75].

Estimation is not trivial when the pulse probability varies in time. It could be achieved by designing optimal filters, as done by Bialek and colleagues [BRWS90], or by approximating the pulse train with a doubly stochastic Poisson process and using results from the statistics literature [Sny75]. We have used estimation as in equation (2), but giving each sample exponentially less importance the earlier it was observed. With appropriate parameters (chosen to give the right trade-off between speed and accuracy) this gives good estimation. We must remember that our research aims at easy hardware implementation, and complex (but good) estimation easily becomes unimplementable. Our estimator is easy to implement with non-ideal capacitors. We will show in section 3 that we need one estimator per neuron.

3 Construction of the Artificial Neuron

The construction of our neuron bears much resemblance to traditional artificial neurons and is described in figure 2a. At the synapses we implement a multiplication by letting a pulse pass with a certain probability (the weight). This is implemented by giving the pulses random height and doing a simple thresholding (all pulses lower than the magnitude of the weight are passed through) at T in figure 2a. Let the pulses have constant random height h, uniformly distributed with 0 < h <= 1, and let the synaptic weight w satisfy 0 <= w <= 1. The probability of a pulse after the threshold, p_after, is then

    p_{\text{after}} = P[\text{pulse before and pulse passed}] = p_{\text{before}} \cdot P[\text{pulse lower than } w] = p_{\text{before}} \cdot w.

Figure 2: a) The proposed neuron. b) Why reconstruction is needed.

The unit f in figure 2a does what is accomplished by summation and nonlinearity in a traditional neuron. It works on a simple first-come, first-served basis, meaning that a pulse is passed if no other pulse is already passing. Alternative functions of the f-unit can be discussed, but this one gives an output signal of the same type as the input signals. Inhibitory synapses do not let their pulses through the f-unit, but still prohibit pulses from other synapses from passing it. A synapse is inhibitory if its weight is negative. The mathematical function accomplished by the f-unit is complex and is not a function of a sum, but it can be proved that increasing any excitatory input probability increases the output probability (and increasing an inhibitory input probability decreases it). This is a property that is, for the time being, satisfying. In order for the f-unit to work properly the inputs have to be independent.
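To make the synaptic thresholding and the f-unit concrete, here is a crude slotted-time sketch (Python with NumPy; the fixed time slots, the random tie-breaking among simultaneous arrivals, and all names are our simplifications, not the paper's asynchronous circuit):

```python
import numpy as np

rng = np.random.default_rng(1)

def synapse(pulses, w, rng):
    """Stochastic multiplication: a pulse of random height h, uniform on
    [0, 1), passes the threshold iff h < |w|, so p_after = p_before * |w|."""
    heights = rng.uniform(0.0, 1.0, size=pulses.shape)
    return pulses & (heights < abs(w))

def f_unit(arrivals, signs, rng):
    """Slotted stand-in for first-come-first-served arbitration.
    arrivals: bool array (n_synapses, n_slots); signs: +1 excitatory, -1 inhibitory.
    Per slot, one arriving pulse 'wins' (chosen at random here); an inhibitory
    winner occupies the unit but produces no output pulse."""
    n_syn, n_slots = arrivals.shape
    out = np.zeros(n_slots, dtype=bool)
    for t in range(n_slots):
        idx = np.flatnonzero(arrivals[:, t])
        if idx.size > 0:
            out[t] = signs[rng.choice(idx)] > 0
    return out

n_slots = 50_000
p_in = np.array([0.3, 0.2, 0.4])              # per-slot input pulse probabilities
w = np.array([0.8, 0.5, -0.6])                # negative weight = inhibitory synapse
x = rng.random((3, n_slots)) < p_in[:, None]  # input pulse trains
after = np.vstack([synapse(x[i], w[i], rng) for i in range(3)])
y = f_unit(after, np.sign(w), rng)
print(after.mean(axis=1))   # close to p_in * |w|
print(y.mean())             # output pulse probability
```

In this toy version, raising an excitatory input probability raises the output rate and raising the inhibitory one lowers it, in line with the monotonicity property noted above.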
If we do not reconstruct the output, a pulse arriving at a neuron passes through it if it survives both the threshold and the f-unit. One pulse from one neuron may therefore turn up on two different inputs of another neuron if there is an intermediate layer, as shown in figure 2b. The inputs are then dependent. By reconstruction of the pulse train we obtain the desired independence. Reconstruction can be done with estimation followed by generation, as in figure 2a. This causes time delays, and our networks are thus truly dynamic.

4 Reinforcement Learning

In reinforcement learning each neuron receives a reinforcement signal r(t) telling it how well it is doing. Usually the signal is binary, having a "good" and a "bad" value. Ideally the neuron should respond to an input with the action that has the greatest probability of giving a "good" reinforcement value. Reinforcement learning also relates to stochastic learning automata [BA85] and stochastic dynamic programming [BSW90].

Reinforcement learning involves the problem of credit assignment. Here we can speak of both structural and temporal credit assignment, meaning that a neuron has to decide which weight it should assign credit or blame to, and when. The temporal credit assignment problem has been carefully studied by Sutton [Sut84].

We implement the reinforcement signal identically to the signal described in section 2. This implementation has great potential, as described in the introduction. The associative reward-inaction algorithm of Barto and Anandan [BA85] has the weight update rule

    \Delta w_i = \alpha (y - p) x_i \quad \text{if } r = 1,

where y is the actual output, p is the probability of a 1 as output, and \alpha is a positive learning constant. This rule is very easily implemented in our model, since y and p are already available: we update w_i with \alpha(y - p) when there is a pulse on both x_i and r (see figure 2a).
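Read in slotted time, the update amounts to the following sketch (Python with NumPy; the per-slot framing and the clipping of weights to [-1, 1] are our assumptions; in the model, p comes from the neuron's own estimator and y from the f-unit output):

```python
import numpy as np

def reward_inaction_step(w, x, y, p, r, alpha=0.05):
    """One time slot of the pulsed update rule: when the reinforcement signal
    r pulses, every weight whose input x_i also pulses moves by alpha*(y - p);
    with no reinforcement pulse, nothing happens (inaction).
    x: boolean input-pulse indicators, one per synapse, for this slot
    y: 1 if the neuron emitted a pulse in this slot, else 0
    p: the neuron's current estimate of its own output pulse probability
    r: 1 if the reinforcement train pulses in this slot, else 0"""
    if r:
        w = w + alpha * (y - p) * x
        w = np.clip(w, -1.0, 1.0)   # keep |w| a valid pass probability
    return w

# One illustrative step: the output pulsed (y=1) while its estimated rate
# was 0.3, under "good" reinforcement, so the pulsing inputs are strengthened.
w = np.array([0.4, -0.2, 0.7])
x = np.array([1, 0, 1])
w = reward_inaction_step(w, x, y=1, p=0.3, r=1)
print(w)   # [0.435, -0.2, 0.735]
```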
Unfortunately, initial simulations of this update rule give negative results, probably because the dynamics have not been taken into account. Our work continues along this line.

5 Discussion

5.1 Hardware Implementation

Our choice of asynchronous pulse trains arose from the need for simple components for synaptic multiplication, summation, and nonlinearity. An asynchronous implementation is clearly a great help in the design of large systems, since keeping such systems clocked is problematic. The synaptic multiplication and the f-unit are easy to implement. In some sense we have moved the problem from multiplication, summation, and nonlinearity to the generation of signals and the estimation of probabilities. This is justified by the fact that multiplication is the most common operation in a neural network and therefore needs to be simple.

5.2 Biological Issues

The pulsed signal we use seems to be very close to biological spike trains and has been used as a statistical model of neural activity [Tuc89]. We must, though, bear in mind that there are many types of pulsed signals, with different statistics, in different animals [Tuc89]. The need for reconstruction described in section 3 is interesting, since reconstruction also occurs in the biological neuron, in the integrate-and-fire function of the soma. The reconstruction also makes our model dynamic, due to the time delay in estimation. The lack of dynamics is a major drawback for the biological plausibility of, e.g., "vanilla" back propagation. The reinforcement type of learning is clearly more biologically plausible than back propagation: one can say that the feedback is evaluative in reinforcement learning and instructive in back propagation. A human, for example, does not really get instructions; he or she interprets sensory information, which in turn gives evaluative feedback.

5.3 Future Work

Our work will continue primarily with the learning aspect. The design of modular neural networks is very interesting, since it can give a solution to one of the most important problems of ANN design today: that of dimensionality. When networks are small, learning is easy, but when we increase the size, things break down. We will work more with issues of dimensionality, since the proposed model seems to offer a good path in this respect.

References

[AAHS87] Joshua Alspector, Robert B. Allen, Victor Hu, and Srinagesh Satyanarayana. Stochastic learning networks and their electronic implementation. In Dana Z. Anderson, editor, IEEE Neural Information Processing Systems Conference, pages 9-21. American Institute of Physics, November 1987.

[AHS85] David H. Ackley, Geoffrey E. Hinton, and Terrence J. Sejnowski. A learning algorithm for Boltzmann machines. Cognitive Science, 9:147-169, 1985.

[BA85] Andrew G. Barto and P. Anandan. Pattern-recognizing stochastic learning automata. IEEE Transactions on Systems, Man, and Cybernetics, SMC-15(3):360-375, May/June 1985.

[BRWS90] William Bialek, Fred Rieke, David Warland, and R. R. de Ruyter van Steveninck. Reading a neural code. In David S. Touretzky, editor, Advances in Neural Information Processing Systems 2, 1990.

[BSW90] Andrew G. Barto, Richard S. Sutton, and Chris Watkins. Sequential decision problems and neural networks. In David S. Touretzky, editor, Advances in Neural Information Processing Systems 2, 1990.

[Gai87] Brian R. Gaines. Uncertainty as a foundation of computational power in neural networks. In IEEE International Conference on Neural Networks, pages III-51-57, 1987.

[LA87] Bernard C. Levy and Milton B. Adams. Global optimization with stochastic neural networks. In IEEE International Conference on Neural Networks, pages III-681-689, 1987.

[LL90] Yuchun Lee and Richard P. Lippmann. Practical characteristics of neural networks and conventional pattern classifiers on artificial and speech problems. In David S. Touretzky, editor, Advances in Neural Information Processing Systems 2, 1990.

[LM89] R. A. Leaver and P. Mars. Stochastic computing and reinforcement neural networks. In First IEE International Conference on Artificial Neural Networks, pages 163-169, 1989.

[MHRT89] A. F. Murray, A. Hamilton, H. M. Reekie, and L. Tarassenko. Pulse-stream arithmetic in programmable neural networks. In IEEE International Symposium on Circuits and Systems, pages 1210-1212, May 1989.

[Mur89] Alan F. Murray. Silicon implementation of neural networks. In First IEE International Conference on Artificial Neural Networks, pages 27-32, 1989.

[NH87] Dziem Nguyen and Fred Holt. Stochastic processing in a neural network application. In IEEE International Conference on Neural Networks, pages III-281-291, June 1987.

[RGK87] R. Rastogi, P. K. Gupta, and R. Kumaresan. Array signal processing with interconnected neuron-like elements. In IEEE International Conference on Acoustics, Speech, and Signal Processing, pages 2328-2331, 1987.

[Rib67] Sergio T. Ribeiro. Random-pulse machines. IEEE Transactions on Electronic Computers, EC-16(3):261-276, June 1967.

[Sej86] Terrence J. Sejnowski. Open questions about computation in cerebral cortex. In James L. McClelland, David E. Rumelhart, and The PDP Research Group, editors, Parallel Distributed Processing, chapter 21, pages 372-389. The MIT Press, 1986.

[Sny75] Donald L. Snyder. Random Point Processes. John Wiley & Sons, 1975.

[Sut84] Richard S. Sutton. Temporal Credit Assignment in Reinforcement Learning. PhD thesis, University of Massachusetts, 1984.

[Tuc89] Henry C. Tuckwell. Stochastic Processes in the Neurosciences. Society for Industrial and Applied Mathematics, 1989.

[VdBM88] David E. Van den Bout and T. K. Miller. A stochastic architecture for neural nets. In IEEE International Conference on Neural Networks, pages I-481-488, June 1988.

[Wil87] Ronald J. Williams. A class of gradient-estimating algorithms for reinforcement learning in neural networks. In IEEE International Conference on Neural Networks, pages II-601-608, 1987.

