
Fig. 2.2. Structure of an artificial neuron

A neuron consists of three types of elements: multipliers (synapses), an adder, and a nonlinear converter. Synapses provide the connections between neurons and multiply the input signal by a number characterizing the strength of the connection (the weight of the synapse). The adder sums the signals arriving via synaptic connections from other neurons with the external input signals. The nonlinear converter implements a nonlinear function of one argument, the output of the adder. This function is called the activation function or transfer function of the neuron. The neuron as a whole implements a scalar function of a vector argument.

Mathematical model of a neuron:

s = Σ (i = 1…n) wi·xi + b,   y = f(s),   (2.1)

where s is the summation result (sum); wi is the weight of the synapse, i = 1…n; xi is a component of the input vector (the input signal), i = 1…n; b is the bias value; n is the number of neuron inputs; y is the neuron output signal; f is the nonlinear transformation (activation function).

In general, the input signal, the weighting coefficients and the bias can take real values, but in many practical problems they take only certain fixed values. The output y is determined by the type of activation function and can be either real or integer.
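As a concrete illustration of model (2.1), here is a minimal Python sketch of a single formal neuron; the function and variable names are illustrative, and the logistic activation used here is only one of many possible choices.

```python
import math

def neuron_output(x, w, b, a=1.0):
    """Formal neuron: weighted sum of inputs plus bias,
    passed through a logistic activation f(s) = 1 / (1 + exp(-a*s))."""
    s = sum(wi * xi for wi, xi in zip(w, x)) + b   # adder
    return 1.0 / (1.0 + math.exp(-a * s))          # nonlinear converter

# Example: three inputs, arbitrary weights and bias
print(neuron_output(x=[0.5, -1.0, 2.0], w=[0.3, 0.8, -0.1], b=0.2))
```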

Synaptic connections with positive weights are called excitatory, and those with negative weights are called inhibitory. The described computational element can be considered a simplified mathematical model of biological neurons. To emphasize the difference between biological and artificial neurons, the latter are sometimes called neuron-like elements or formal neurons.

The nonlinear converter responds to the input signal s with an output signal f(s), which is the output y of the neuron. Examples of activation functions are presented in Table 2.1, and graphs of the most common activation functions are shown in Fig. 2.3.

Table 2.1

Neuron activation functions

Name: Range of values
Linear: (−∞, ∞)
Semilinear: (0, ∞)
Logistic (sigmoidal): (0, 1)
Hyperbolic tangent (sigmoidal): (−1, 1)
Exponential: (0, ∞)
Sinusoidal: (−1, 1)
Sigmoidal (rational): (−1, 1)
Step (linear with saturation): (−1, 1)
Threshold: (0, 1)
Modular: (0, ∞)

One of the most common activation functions is the nonlinear activation function with saturation, the so-called logistic function or sigmoid (an S-shaped function) (Fig. 2.3):

f(s) = 1 / (1 + e^(−a·s)).   (2.3)

As the coefficient a decreases, the sigmoid becomes flatter, degenerating in the limit a = 0 into a horizontal line at the level 0.5; as a increases, the sigmoid approaches the form of a unit step function with threshold T. It is evident from the expression for the sigmoid that the output value of the neuron lies in the range (0, 1). One of the valuable properties of the sigmoid function is the simple expression for its derivative, the use of which will be discussed later:

f′(s) = a·f(s)·(1 − f(s)).   (2.4)

Fig. 2.3. Graphs of activation functions: a – unit step function; b – linear threshold (hysteresis); c – sigmoid (logistic function), formula (2.3); d – sigmoid (hyperbolic tangent)

It should be noted that the sigmoid function is differentiable along the entire x-axis, which is used in some learning algorithms. In addition, it has the property of amplifying weak signals better than large ones and preventing saturation from large signals, since they correspond to regions of the arguments where the sigmoid has a gentle slope.
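The convenience of expression (2.4) is easy to verify numerically; the sketch below (Python, illustrative names and values) compares a·f(s)·(1 − f(s)) with a finite-difference estimate of the derivative.

```python
import math

def sigmoid(s, a=1.0):
    return 1.0 / (1.0 + math.exp(-a * s))

def sigmoid_prime(s, a=1.0):
    f = sigmoid(s, a)
    return a * f * (1.0 - f)          # formula (2.4)

s, a, eps = 0.7, 2.0, 1e-6
numeric = (sigmoid(s + eps, a) - sigmoid(s - eps, a)) / (2 * eps)
print(sigmoid_prime(s, a), numeric)   # the two values agree closely
```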

An artificial neuron imitates, to a first approximation, the properties of a biological neuron. The input of an artificial neuron receives a number of signals, each of which is the output of another neuron. Each input is multiplied by a corresponding weight, analogous to synaptic strength, and all products are summed to determine the neuron's activation level. Fig. 1.2 presents a model implementing this idea. Although network paradigms are very diverse, almost all of them are based on this configuration. A set of input signals, denoted x1, x2, …, xn, arrives at the artificial neuron. These input signals, collectively denoted by the vector X, correspond to signals arriving at the synapses of a biological neuron. Each signal is multiplied by the corresponding weight w1, w2, …, wn and goes to the summing block, designated Σ. Each weight corresponds to the "strength" of one biological synaptic connection. (The set of weights is collectively denoted by the vector W.) The summing block, which corresponds to the body of the biological element, adds the weighted inputs algebraically, creating an output that we will call NET. In vector notation this can be written compactly as follows:

NET = XW.

Fig. 1.2. Artificial neuron

        1. Activation functions

The NET signal is then transformed by an activation function. This may be a simple linear function

OUT = K(NET),

where K is a constant; a threshold function

OUT = 1 if NET > T, OUT = 0 otherwise,

where T is some constant threshold value; or a function that more accurately models the nonlinear transfer characteristic of a biological neuron and gives the neural network greater capabilities.

Fig. 1.3. Artificial neuron with activation function

In Fig. 1.3 the block designated F receives the NET signal and outputs the OUT signal. If the block F narrows the range of variation of NET so that for any value of NET the value of OUT belongs to a certain finite interval, then F is called a "squashing" function. The logistic or "sigmoid" (S-shaped) function shown in Fig. 1.4a is often used as such a squashing function. This function is expressed mathematically as F(x) = 1/(1 + e^(−x)). Thus,

OUT = 1 / (1 + e^(−NET)).

By analogy with electronic systems, the activation function can be considered a nonlinear amplification characteristic of the artificial neuron. The gain is calculated as the ratio of the increment in OUT to the small increment in NET that caused it. It is expressed by the slope of the curve at a given excitation level and varies from small values at large negative excitations (where the curve is almost horizontal) to a maximum value at zero excitation, decreasing again as the excitation becomes large and positive. Grossberg (1973) found that such a nonlinear response solves his noise-saturation dilemma. How can the same network handle both weak and strong signals? Weak signals need high gain to produce a usable output signal. However, high-gain amplifier stages can saturate the output with amplifier noise (random fluctuations), which is present in any physically implemented network. Strong input signals will in turn also saturate the amplifier stages, making the output useless. The central region of the logistic function, which has high gain, solves the problem of handling weak signals, while the regions of decreasing gain at the positive and negative extremes are suitable for large excitations. Thus, the neuron operates with high gain over a wide range of input signal levels.


Fig. 1.4a. Sigmoid logistic function

Another widely used activation function is the hyperbolic tangent. It is similar in shape to the logistic function and is often used by biologists as a mathematical model of nerve cell activation. As an activation function of an artificial neural network it is written as follows:

OUT = th(NET) = (e^NET − e^(−NET)) / (e^NET + e^(−NET)).

Fig. 1.4b. Hyperbolic tangent function

Like the logistic function, the hyperbolic tangent is an S-shaped function, but it is symmetrical about the origin, and at the point NET = 0 the value of the output signal OUT is zero (see Fig. 1.4b). Unlike the logistic function, the hyperbolic tangent takes values ​​of different signs, which turns out to be beneficial for a number of networks (see Chapter 3).
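A short sketch (Python, illustrative values) showing the difference in output ranges of the two squashing functions: the logistic maps NET into (0, 1), while the hyperbolic tangent maps it into (−1, 1) and is antisymmetric about the origin.

```python
import math

def logistic(net):
    return 1.0 / (1.0 + math.exp(-net))

for net in (-5.0, -1.0, 0.0, 1.0, 5.0):
    print(f"NET={net:+5.1f}  logistic={logistic(net):.4f}  tanh={math.tanh(net):+.4f}")
# logistic(0) = 0.5, tanh(0) = 0; tanh changes sign with NET, the logistic does not
```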

The simple artificial neuron model considered here ignores many properties of its biological counterpart. For example, it does not take into account the time delays that affect the dynamics of the system: input signals generate an output signal immediately. More importantly, it does not take into account the effects of frequency modulation or the synchronizing function of the biological neuron, which a number of researchers consider crucial.

Despite these limitations, the networks built from these neurons exhibit properties that closely resemble a biological system. Only time and research will be able to answer the question of whether such coincidences are accidental or a consequence of the fact that the model correctly captures the most important features of a biological neuron.

(The bionic neural network model and its applications
Preprint, Inst. Appl. Math., the Russian Academy of Science)

Yolkin S.S., Yolkin S.V., Klyshinsky E.S., Maksimov V.Yu., Musaeva T.N.
(S.S.Yolkin, S.V.Yolkin, E.S.Klyshinsky, V.Yu.Maximov, T.N.Musaeva)

Keldysh Institute of Applied Mathematics, Russian Academy of Sciences

Moscow, 2008
The work was carried out with financial support from the Russian Foundation for Basic Research (project No. 08-01-00626)

Annotation

The work presents a neural network model based on an analysis of the behavior of pyramidal neurons in the brain. The model differs from the classical approach to constructing neural networks in the nonlinear nature of its behavior, the introduction of several types of neuron inputs, and the heterogeneity of its construction. The paper provides some examples of constructing bionic networks to solve various problems.

Abstract

The paper presents a model of a neural network based on an analysis of the behavior of the brain's pyramidal neurons. The model differs from classical approaches to neural network representation by its nonlinearity, several types of neuron inputs, and heterogeneity. The paper contains some examples of bionic networks for solving different tasks.



1. INTRODUCTION

It is now becoming obvious that further development of various aspects of the problem of automatic control of complex systems is impossible without combining the efforts of the technical and biological sciences. Tasks common to these sciences include the control of dynamic objects and databases, as well as the optimization of control. Neural networks are used to control complex systems and objects.

The modern approach to the creation of neural systems has evolved in the direction of moving their functioning away from real biological analogues. To model modern neural systems, a highly simplified model of the basic element (the neuron) is used. To move to a model closer to its biological analogue, it is necessary to change the approach and turn to studies of the basic elements (neurons) of the human cerebral cortex, and this area is at the moment not fully researched. Any research here is very important for use in modern technologies.

The review article by A.V. Savelyev, "On the way to a general theory of neural networks. On the issue of complexity" (RFBR grant 04-06-80460; Neurocomputers: Development, Application, No. 4-5, 2006, p. 4), presents the results of a study of neural networks that are fundamentally inadequate to their biological analogues. Based on the discovered hypercomplexity of biological neurons, which exceeds the complexity of networks and is not reducible to it, the author shows the possibility of generalizing the theory of neural networks, which may result in a completely different principle of organizing the architecture of neurocomputers. However, we believe that when constructing practical models there is no need to take into account the entire systemic hypercomplexity of the neuron. The genetics, nutrition, respiration and biochemistry of the neuron are of secondary importance for neuroinformatics. It is necessary to obtain fundamentally new models of biological neurons, and of neural networks based on them, by using the latest achievements of neurobiology and neurophysiology.

Traditional neuroinformatics approaches the creation of neural networks on the principle that the more complex the task, the larger the neural networks or their conglomerates that should be used. However, this is not always true. The use of the formal neuron as the basic element of modern artificial neural networks means that, because of the complete abstraction from the complex biological neuron, technical limitations arise on the capabilities of such neural networks. But the use of a bionic neuron made as close as possible to a biological one is also impossible because of its multi-structural complexity. Thus, there is a need to search for a universal neurobionic paradigm aimed at analyzing the mechanisms, patterns and principles intended for the creation and operation of a new basic element of neurocomputers, one combining the simplicity of designing classical neural networks with the basic complexity and versatility of bionic neurons. We believe that to solve this problem it is necessary to develop a set of mathematical models of bionic neurons that simultaneously meet the needs of neurophysiology and neuroinformatics. Our efforts will be concentrated in the following areas:

· analysis of the dynamics of spatio-temporal distribution and interaction of working and non-working functional blocks;

· the influence of the ratio of the power and time parameters of the information and control signals arriving at the neuron;

· the influence of a neuron on the implementation of different functions and the interaction of functionally different neurons due to the differentiated distribution of information and control signals arriving along different branches of the output axon of one neuron;

· the specificity of the main types of neurons in the associative cortex of the cerebral hemispheres of the brain.

To develop such a model, it is assumed that methods of system analysis will be used when summarizing experimental data and classifying biological neurons. Since the problem of searching for an optimal solution is in some cases incomputable, because of the empirical nature of the data or of the methods of working with them, the exponential complexity of solving the problem by brute force, and so on, the need to use bionic methods of constructing technical systems becomes all the more obvious.

To achieve these goals, it is necessary to solve the following particular problems.

1. Generalization of experimental data on biological neurons, including humans.

2. Classification of biological neurons by functions and properties.

3. Determination of the basic mechanisms of operation of a biological neuron:

Threshold principles for limiting neuron potential,

Principles of inhibition and excitation and their time dependencies,

Mechanisms for generating pulse sequences and their limitations in terms of potential and threshold,

The influence of neuron inputs on its potential,

Memory mechanisms and dynamics of changes in the learning coefficient during the operation of a neuron as part of a network (training, retraining and untraining of a neuron),

The mechanism of delay in the arrival of impulses between neurons and its dependence on the weight and fatigue of the synapse,

Basic states of a bionic neuron for each class of neurons and mechanisms of transitions between them;

Determine the main types of inputs and outputs of a neuron, their fundamental functional differences, as well as their significance for the operation of the neuron as a whole and the impact on the principles of constructing artificial neural networks on such bionic neurons;

Highlight the main properties and functions of the elements of biological neurons: dendrites, axons, synapses, soma.

4. Mathematical modeling of the properties and functions of biological neurons.

5. Mathematical modeling of each individual class of neurons.

6. Development of tools for working with bionic neurons.

7. Construction of neural networks on these models and testing their performance for various classes of problems.

At the moment, a preliminary generalization of biological experimental data on neurons of the human brain has been carried out. A theoretical study of possible mathematical models of the selected classes of neurons was carried out.

The research carried out in this project is based on the use of models of the functioning of the basic elements of the human cerebral cortex, bionic neurons. Using them, neural networks implementing all the logical elements, a counter, a neuron state analyzer, a dominant filter and a robot orientation system in a maze have been created. This will further create a basis for the introduction of neurotechnologies into robotics.

1.1. Biological neuron

The main structural element of the nervous system is the nerve cell, or neuron. Through neurons, information is transmitted from one part of the nervous system to another, and information is exchanged between the nervous system and various parts of the body. Very complex information-processing processes take place in neurons. With their help, the body's responses (reflexes) to external and internal stimuli are formed.

A neuron can have different sizes and shapes, but schematically it is always easy to represent it as a cell with processes (Figure 1). It consists of a cell body (soma) containing the nucleus, and processes, which are divided into dendrites, through which nerve impulses arrive at the neuron, and an axon, along which a nerve impulse travels from the neuron to other cells.

Dendrites are processes of a neuron that conduct impulses to the body of the neuron. They are usually short, relatively wide, highly branched, and form many synapses with other nerve cells.

Each axon ends on the body or dendrites of other neurons at a contact called a synapse. A synapse is a specialized structure that ensures the transfer of excitation from one excitable structure to another. The term "synapse" was introduced by C. Sherrington and means "convergence", "connection", "clasp".

Figure 1 - Neuron structure

All stimuli entering the nervous system are transmitted to the neuron through certain sections of its membrane located in the region of synaptic contacts. In most nerve cells this transmission is carried out chemically, with the help of mediators. The response of a neuron to external stimulation is a change in the value of its membrane potential.

The more synapses on a nerve cell, the more various stimuli are perceived and, therefore, the wider the sphere of influence on its activity and the possibility of the nerve cell participating in various reactions of the body.

The effects that occur when a synapse is activated can be excitatory or inhibitory.

When several excitatory synapses are simultaneously activated, the total excitatory impulse of the neuron is the sum of the individual local excitatory impulses of each synapse. With the simultaneous occurrence of two different synaptic influences - excitatory and inhibitory - a mutual subtraction of their effects occurs. Ultimately, the reaction of a nerve cell is determined by the sum of all synaptic influences.

With the appearance of the action potential (AP), which, unlike local changes in membrane potential, is a propagating process, a nerve impulse begins to be conducted from the body of the nerve cell along the axon to another nerve cell or to a working organ; that is, the neuron performs its effector function.

The magnitude of the membrane potential is the main parameter that determines the value of the most important indicator of the functional state of a neuron: its excitability.

The excitability of a neuron is its ability to respond to synaptic input with an action potential. It depends on the ratio of two parameters - the membrane potential and the critical level of depolarization (threshold). Under normal operating conditions, the critical level of neuron depolarization is relatively constant, so the excitability of the neuron is determined mainly by the magnitude of the membrane potential.

The amount of depolarization of nerve cells depends linearly on the frequency of the stimulating impulses. The higher divisions of the brain, by sending impulses of various frequencies to the neurons of the underlying divisions, regulate their excitability and thereby exercise control over the body's responses.

1.2. Formal neuron

The neuron is an integral part of a neural network. Figure 2 shows its structure.

Figure 2 – Structure of a formal neuron

It consists of three types of elements: multipliers (synapses) w, an adder Σ, and a nonlinear converter f. Synapses provide the connections between neurons and multiply the input signal by a number characterizing the strength of the connection (the weight of the synapse). The adder sums the signals arriving via synaptic connections from other neurons with the external input signals. The nonlinear converter implements a nonlinear function of one argument, the output of the adder. This function is called the activation function or transfer function of the neuron. The neuron as a whole implements a scalar function of a vector argument. Mathematical model of a neuron:

s = Σ (i = 1…n) wi·xi + b,   y = f(s),   (1)

where

wi – weight of the synapse, i = 1…n;

b – bias value;

s – summation result (sum);

xi – component of the input vector (input signal), i = 1…n;

y – output signal of the neuron;

n – number of neuron inputs;

f – nonlinear transformation (activation function).

In general, the input signal, weighting coefficients and offset can take real values, but in many practical problems - only some fixed values. The output is determined by the type of activation function and can be either real or integer.

Synaptic connections with positive weights are called excitatory, and those with negative weights are called inhibitory.

The described computational element can be considered a simplified mathematical model of biological neurons. To emphasize the difference between biological and artificial neurons, the latter are sometimes called neuron-like elements or formal neurons.

The nonlinear converter responds to the input signal with an output signal, which is the output of the neuron. One of the most common activation functions of the nonlinear converter is the nonlinear activation function with saturation, the so-called logistic function or sigmoid function (an S-shaped function):

f(s) = 1 / (1 + e^(−a·s)),   (2)

where

s – argument of the function;

a – coefficient.

Figure 3 - Sigmoid (hyperbolic tangent)

From expression (2) it is obvious that the output value of the neuron lies in the range (0, 1).

The sigmoid function is differentiable along the entire x-axis, which is used in some learning algorithms. In addition, it has the property of amplifying weak signals better than large ones and preventing saturation from large signals since they correspond to argument regions where the sigmoid has a shallow slope.

A neural network is a collection of neuron-like elements connected in a certain way to each other and to the external environment using connections determined by weighting coefficients. Depending on the functions performed by neurons in the network, three types can be distinguished:

Input neurons to which a vector encoding an input effect or an image of the external environment is supplied; they usually do not carry out computational procedures, and information is transferred from input to output by changing their activation;

Output neurons, whose output values represent the outputs of the neural network; the transformations in them are carried out according to expression (1);

Intermediate neurons form the basis of neural networks; the transformations in them are also performed according to expression (1).

In most neural models, the type of a neuron is related to its location in the network. If a neuron has only output connections, it is an input neuron; if, conversely, it has only input connections, it is an output neuron. However, the output of a topologically internal neuron may also be considered part of the network output. During the operation of the network, the input vector is converted into an output vector, and some processing of information is carried out. The specific data transformation performed by the network is determined not only by the characteristics of the neuron-like elements, but also by the features of its architecture: the topology of the interneuron connections, the choice of particular subsets of neuron-like elements for the input and output of information, the methods of training the network, the presence or absence of competition between neurons, and the direction and methods of controlling and synchronizing the transfer of information between neurons.
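To make the three roles of neurons concrete, here is a minimal Python sketch of a two-layer network assembled from the neuron of expression (1): input neurons merely pass their activations on, while intermediate and output neurons apply the weighted sum and activation. All weights, sizes and names are illustrative.

```python
import math

def neuron(x, w, b):
    s = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-s))       # expression (1) with a logistic f

def layer(x, weights, biases):
    return [neuron(x, w, b) for w, b in zip(weights, biases)]

x = [0.2, 0.9]                               # input neurons simply pass the vector on
hidden = layer(x, weights=[[0.5, -0.4], [1.2, 0.3]], biases=[0.0, -0.1])
output = layer(hidden, weights=[[0.7, -1.1]], biases=[0.05])
print(output)                                # outputs of the output neurons
```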

The choice of neural network structure is carried out in accordance with the characteristics and complexity of the task. Optimal configurations already exist for solving certain types of problems. If the problem cannot be reduced to any of the known types, one has to solve the complex problem of synthesizing a new configuration.

1.3. Modeling of non-trivial (intelligent) adaptive behavior

Purposefulness. One of the distinctive features of animal behavior is purposefulness, the striving to achieve a certain goal. The goals of animal behavior are related to the need to satisfy needs. The body's basic need is the need for survival. As leading needs (subordinate to the basic one) we can distinguish the energy need (the need for nutrition), the need for security, and the need for reproduction, and also, as noted in the works of A.A. Zhdanov, the need for the accumulation of knowledge.

The desire to satisfy a need can be characterized by a motivation, and this can be done quantitatively. For example, if an animal has a nutritional need, then a motivation to satisfy this need can be introduced as follows: the greater the animal's feeling of hunger, the greater this motivation; when the animal finds food, eats it and satisfies its nutritional need, this motivation decreases; when the animal is satiated, this motivation goes to zero (we assume that the motivation is non-negative).

Attempts to model motivations and their role in adaptive behavior in several different aspects have been carried out by a number of authors. L.E. Tsitolovsky investigated a simple stochastic optimization scheme (minimizing motivations), leading to the satisfaction of needs, and also analyzed the role of motivations in the functioning of an individual neuron. M.S. Burtsev et al. proposed and analyzed a model of the evolutionary emergence of goal-directed adaptive behavior with special emphasis on the role of motivations in adaptive behavior. H. Balkenius made a brief analytical review of schemes and models of cognitive systems that take into account the motivational component.

Holistic adaptive behavior. When modeling animal behavior, it is natural to consider holistic adaptive behavior, which takes into account the general hierarchical structure of needs and goals: the individual needs and goals of the organism are subordinated to the basic need, the need for survival. Holistic adaptive behavior is analyzed in the theory of functional systems of P.K. Anokhin. A scheme for modeling holistic adaptive behavior was proposed in the "Animal" project by M.M. Bongard et al. In the Animal project, the modeling of holistic adaptive behavior is considered as a task close to the modeling of thinking; that work proposes a scheme for modeling thinking that includes elements of adaptive behavior (in the context of holistic adaptive behavior).

Internal model. Another concept that it is natural to use when modeling intelligent adaptive behavior is the "internal model". Indeed, if an animal can build its own internal model of the external environment and of its interaction with the external environment, then on the basis of such a model it can predict future events in the external environment and the results of its actions, and adequately use these forecasts in its adaptive behavior. Moreover, when making predictions, the animal can draw certain "logical conclusions" based on its model.

A person, naturally, also has his own models of situations and models that characterize his general ideas about the external world. Moreover, the general scientific picture of the world, created by the entire international scientific community, can also be considered as a set of models. That is, starting from the concept of an "internal model", we can try to move from studies of the "intelligence" of animals to the analysis of the most interesting forms of thinking: the thinking of a scientist, the thinking used in the scientific cognition of Nature. Note that the concept of an internal model has been emphasized by a number of authors. V.F. Turchin considers modeling, the formation by animals and humans of models of the environment on the basis of which foresight occurs, as an important component of the cognitive process. E. Jantsch notes that the emergence of the ability to build models of the external world was one of the stages of the self-organization of the biosphere. F. Heylighen and C. Joslyn specifically introduce the concept of an endo-model, an internal model formed by the analyzed object itself (an animal, a human, a robot, or any other cybernetic system), and distinguish it from the concept of an exo-model, a model of the object that is built by a researcher analyzing the behavior of the object in question. R. Sutton and A. Barto proposed and analyzed a simple neural network model of an "internal model". To make the discussion more concrete, we briefly outline the main ideas of that work. The following experiment with rats is simulated. There is a T-shaped maze with two chambers attached to its arms (Fig. 1): a red chamber is attached to the right arm of the maze, a green one to the left arm. The experiment consists of three stages. At the first, exploratory stage, the animal is placed at the entrance of the maze and allowed to move through the maze and explore it, without any reinforcement or punishment. At the second, associative stage, both chambers are detached from the maze and transferred to another room, where reward or punishment is given: in the red chamber the animal receives food, and in the green chamber an electric shock. At the third, testing stage, the chambers are returned and attached to the maze, the animal is placed at the entrance of the maze, and it is observed where it goes. The experiment demonstrates that during testing the animals predominantly move to the right. According to this experiment, the animal must build a model of the external environment and combine two independent factors: 1) turning right/left leads to the red/green chamber; 2) in the red/green chamber one can receive a reward/punishment. That is, the animal makes a "logical conclusion" of roughly the following kind: turning right leads to the red chamber; in the red chamber there is food; therefore, turning right leads to food.

This conclusion is similar to one of the basic formulas of logical inference:

(A ⊃ B) & (B ⊃ C) ⊃ (A ⊃ C)

(if B follows from A, and C follows from B, then C follows from A).

A neural network model was built to explain the behavior of animals in the described experiment. The diagram of this neural network is shown in Fig. 4.

This neural network contains 5 neurons and consists of two modules. The predictive module includes neurons 1-3, the action selection module includes neurons 4 and 5. From the external environment, the inputs of the neural network receive signals about the color of the chamber and about reinforcement ("Green", "Red", "Reinforcement"). At the network output, action commands are generated ("Right" or "Left"). The predictive module builds a simplified model of the external environment and predicts the state of the inputs from the external environment at upcoming moments of time. The action selection module generates action commands.

The scheme includes special model neurons developed on the basis of A.G. Klopf's concept of the goal-seeking neuron. These neurons are similar to ordinary neurons with modifiable Hebbian synapses, but in addition possess a form of short-term memory. A more detailed description of this neuron model is given in the literature.

Fig. 4. Diagram of the neural network that carries out the forecast. Neurons are shown as squares with the numbers 1-5 inside, synapses as circles, neuron outputs as bold arrows, directions of signal transmission between neurons as thin arrows, and environmental influences as dashed arrows.

When simulating the experiment, at the exploratory stage an associative memory was formed in the predictive module (by modifying the synapses of the connections between neurons 4, 5 and 1, 2), which stored the fact that by moving left/right the animal ends up in the green/red chamber. At the associative stage, recurrent connections were formed in the predictive module between neurons 1, 2 and neuron 3 (predicting that negative/positive reinforcement can be obtained in the green/red chamber), and the synapses at the inputs of neurons 4 and 5 were also modified, which ensured the preferential choice of movement to the right. At the testing stage it was confirmed that this model does indeed qualitatively correspond to the experiment described above.

Of course, Sutton's model is only an example of an approach to modeling the internal models on the basis of which animals make predictions about future events in the external environment and adequately use these predictions. However, it is intuitively felt that "internal models" can characterize very non-trivial knowledge of an animal about the external world and provide the cognitive abilities of animals. And, as noted above, by analyzing such models we could try to find connections between the cognitive abilities of animals and human knowledge of the external world, including the scientific cognition of Nature.

1.4. Bionic neuron

The bionic neuron model implemented in the project was developed by Professor V.B. Valtsev.

A neuron is an element of a neural network. Each neuron has inputs and outputs. There are several types of inputs: excitation, regulation, memory, prohibition, and inhibition (braking). The current state of the neuron is determined by its current potential and its current threshold. A neuron is capable of receiving and emitting impulses.

The current state of a neuron changes over time. If there is no supply of pulses to the inputs, then the value of the current potential tends to zero according to the exponential law:

P(t) = P(t−1)·e^(−α·Δt),   (3)

where

P(t) – current value of the potential;

P(t−1) – value of the potential at time t−1;

α – potential attenuation coefficient;

Δt – time elapsed since the potential was equal to P(t−1).

In this case, the value of the current threshold tends to some constant value over time, called the resting threshold. The rest threshold is a value greater than zero:

T(t) = T0 + (T(t−1) − T0)·e^(−α·Δt),   (4)

where

T(t) – current value of the threshold;

T(t−1) – value of the threshold at time t−1;

T0 – rest threshold;

α – threshold attenuation coefficient;

Δt – time elapsed since the threshold was equal to T(t−1).

The neuron potential is limited below by Pmin and above by Pmax (Pmin ≤ 0; Pmax > 0). The threshold is limited above by Tmax and below by Tmin (moreover, 0 ≥ Tmin > −Tmax). The limits on the potential and the threshold are taken into account when calculating the effect of a received impulse.
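The sketch below shows the passive decay of the potential and threshold, assuming the exponential laws (3) and (4) in the form reconstructed above and clipping to the stated limits; all numeric parameter values are illustrative assumptions.

```python
import math

# Illustrative parameters of one bionic neuron (all values are assumptions)
P_MIN, P_MAX = -5.0, 10.0
T_MIN, T_MAX = -2.0, 8.0
T0 = 1.0            # rest threshold
ALPHA = 0.3         # attenuation coefficient

def decay(P, T, dt):
    """Passive dynamics: no input pulses arrive during the interval dt."""
    P = P * math.exp(-ALPHA * dt)                 # formula (3): potential tends to 0
    T = T0 + (T - T0) * math.exp(-ALPHA * dt)     # formula (4): threshold tends to T0
    P = max(P_MIN, min(P_MAX, P))                 # potential limits
    T = max(T_MIN, min(T_MAX, T))                 # threshold limits
    return P, T

P, T = 6.0, 4.0
for step in range(5):
    P, T = decay(P, T, dt=1.0)
    print(f"step {step + 1}: P={P:.3f}  T={T:.3f}")
```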

A neuron can receive signals (impulses) using inputs. Each neuron input is characterized by a weight coefficient W (input weight). Impulses arriving at the input of a neuron change its current state. The effect of an impulse is determined by the type of input it received, the weight of this input, and the current state of the neuron. Figure 5 shows a model of a bionic neuron.

Figure 5 - Model of a bionic neuron

1 - excitation input

2 - regulation input

3 - memory input

4 - prohibition input

5 - braking input

6 - output (synapse) of the neuron

A single impulse arriving at the input of an excitation-type neuron increases the value of the neuron’s potential by a certain value:

P = P′ + H,   (5)

where

P – current value of the potential;

P′ – previous value of the potential;

H – magnitude of the change in potential; it depends on the potential and changes according to the law

H = W·f(P′),   (6)

where f(P) is a function with values in the range from zero to one, such that

f(P ≤ 0) = 1,

and f(P) tends to zero as P tends to Pmax. On the segment where P is greater than zero, the function f(P) can be defined, for example, as

f(P) = 1 − P / Pmax.   (7)

Thus, if the neuron is not excited (the potential is zero), an impulse increases the potential by an amount equal to the input weight. Long periodic impulse trains raise the potential in "steps", whose height decreases as the potential itself increases. The height of the steps becomes zero when the potential reaches its maximum value (Pmax). For the given specification of the function, the potential may exceed the maximum value only by a negligibly small amount.
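A sketch of the excitation input in Python, using the form of f(P) given in (7) above (unit for P ≤ 0, decaying linearly to zero at Pmax); the parameter values are illustrative.

```python
P_MAX = 10.0

def f_excite(P):
    """Saturation function of formula (7)."""
    if P <= 0.0:
        return 1.0
    return max(0.0, 1.0 - P / P_MAX)

def excitation_pulse(P, W):
    """Formulas (5)-(6): effect of one pulse at the excitation input with weight W."""
    return P + W * f_excite(P)

P = 0.0
for _ in range(8):
    P = excitation_pulse(P, W=2.0)
    print(round(P, 3))     # the "steps" get smaller as P approaches P_MAX
```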

If pulses are applied to the excitation input of a neuron in sequence, so that either their frequency or the weight of the input compensates for the attenuation of the potential, then the potential increases stepwise, as shown in Figure 6. The vertical dotted lines show the output impulses of the neuron.

Figure 6 - Excitation of a neuron

The inhibition (braking) input behaves in a similar way; however, its role is to reduce the potential by the amount H, which is calculated analogously:

P = P′ − H,   (8)

where

P′ – previous value of the potential;

H – a quantity that changes according to the law

H = W·f(−P′),   (9)

where W is the weight of the synapse through which the impulse arrived. In this case the argument is taken with a minus sign, and Pmin acts as the limiter:

f(P) = 1 − P / |Pmin|.   (10)

If pulses are applied in sequence to the inhibition input of a neuron, so that either their frequency or the weight of the input compensates for the attenuation of the potential (with the opposite sign), then the potential decreases stepwise, as shown in Figure 7. The output impulse is not shown in Figure 7, since there is none: the neuron potential is below the threshold.

Figure 7 - Neuron inhibition

The threshold values are changed by impulses arriving at the regulation and prohibition inputs. A pulse received at the regulation input reduces the threshold by the amount H, which is calculated analogously using formulas (8)-(10).

Accordingly, a pulse at the prohibition input increases the threshold by the amount H, calculated analogously using formulas (5)-(7).

The memory input works in a special way. Similar to excitation, it increases the potential, but the increase in potential now depends not only on the weight of the input, but also on the current state of the learning coefficient. The learning coefficient, unlike the weight, changes its value dynamically during the operation of the neural network. It can take values ​​in the range from 0 to 1. The potential increment is calculated using the formula:

H = µ·W·f(|P′|),   (11)

where

µ – learning coefficient;

W – weight of the synapse through which the impulse arrived;

f(P) – function calculated by formula (7).

If µ = 0, the input is considered untrained: in this case, impulses at this input have no effect on the state of the neuron. A maximally trained input (µ = 1) works in the same way as an excitation input with weight W, until the value of µ changes (decreases) again.

Training, retraining, untraining are mechanisms that regulate the value of µ and, as a consequence, the operation of the neuron’s memory inputs.

Learning of an input is an increase in µ by some constant value ∆µ+ (obviously less than unity), the so-called learning ability. This value is unchanged and fixed at ∆µ+ = 0.2. Learning occurs when the following conditions are met:

1) a signal has arrived at this input;

2) the signal at the memory input was supported by a signal at the excitation input (the excitation signal must arrive no later than after a time ∆T);

3) the threshold value at this moment was less than T0, the rest threshold (which is possible only in the presence of regulatory impulses).

Unlearning, a decrease in µ, occurs in cases where the signal received at the memory input was not reinforced by a subsequent signal at the excitation input, or was not accompanied by regulatory impulses (in this case T ≥ T0). In this situation the value of µ decreases by ∆µ−.

Thus, the memory input differs from the excitation input in the ability to change the significance of its contribution to the total potential depending on the nature of the impulse.
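A sketch of the memory input and of the learning/unlearning of its coefficient µ, following formula (11) and the conditions listed above; the value of Δµ− and the boolean condition flags are illustrative assumptions.

```python
P_MAX = 10.0
DELTA_MU_PLUS = 0.2      # learning step, as stated in the text
DELTA_MU_MINUS = 0.1     # unlearning step (illustrative value)

def f_excite(P):
    return 1.0 if P <= 0.0 else max(0.0, 1.0 - P / P_MAX)

def memory_pulse(P, W, mu):
    """Formula (11): contribution of a pulse arriving at the memory input."""
    return P + mu * W * f_excite(abs(P))

def update_mu(mu, reinforced_by_excitation, threshold_below_rest):
    """Learning raises mu, unlearning lowers it; mu stays in [0, 1]."""
    if reinforced_by_excitation and threshold_below_rest:
        mu = min(1.0, mu + DELTA_MU_PLUS)
    else:
        mu = max(0.0, mu - DELTA_MU_MINUS)
    return mu

mu, P = 0.0, 0.0
mu = update_mu(mu, reinforced_by_excitation=True, threshold_below_rest=True)
print(mu, memory_pulse(P, W=2.0, mu=mu))   # an untrained input (mu = 0) would change nothing
```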

In addition to the weight W, a characteristic common to all types of inputs, and the particular characteristics of Memory-type inputs (µ, ∆µ+, ∆µ−), each input is also characterized by a Delay value.

Delay determines the delay in the arrival of impulses from one neuron to another.

In the model this is implemented as follows: each input remembers the impulse received on it, but its effect is calculated only after a time Delay. For most network functions and processes the Delay parameter is not needed, so its default value is set to zero.

An output pulse is generated if the value ∆Ω is positive:

∆Ω = P – T,

That is, the potential has exceeded the threshold. The pulse generation frequency depends linearly on ∆Ω:

w = w’ + k∆Ω, (12)

Where

w – pulse frequency

w ’ – minimum frequency

k – proportionality coefficient

Limitations on the pulse frequency follow from the upper limit on the potential and the lower limit on the threshold ( Pmax, Tmin).
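A sketch of the output pulse generation rule (12): if the potential exceeds the threshold, the neuron emits pulses at a frequency that grows linearly with the excess; the values of w′ and k are illustrative assumptions.

```python
W_MIN = 1.0     # minimum pulse frequency w'
K = 0.5         # proportionality coefficient k

def pulse_frequency(P, T):
    """Formula (12): firing rate once the potential exceeds the threshold."""
    delta = P - T
    if delta <= 0.0:
        return 0.0            # below threshold: no pulses are generated
    return W_MIN + K * delta

for P in (0.5, 1.5, 3.0, 6.0):
    print(P, pulse_frequency(P, T=1.0))
```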

The mathematical model was compiled based on knowledge about the operation of a real biological neuron, with some accepted simplifications.

Since the network topology for the bionic model is not fixed, then despite the time spent on developing a structure, it is possible to build a more optimal and flexible system, with the ability to add new modules that expand the capabilities of the network.

Also, due to the high level of formalization of the bionic neuron, it is possible to study the network’s functioning algorithm and guarantee the system’s response to any influences, which is not always possible in formal neural networks.

The significant difference between the bionic neuron and formal models gives reason to believe that further research on this model, and the construction of neural networks based on it, will lead to wider use of neurotechnology in various areas of human activity.

2. DEVELOPMENT OF SIMPLE BIONIC NEURAL NETWORKS

2.1. Logic elements

To fully design neural networks of any type, ready-made basic blocks are required. In particular, such blocks are simple logical functions, various counters, analyzers, and simple functions. This section is devoted to the design of just such basic blocks, which in the future will become the basis of complex networks. At the same time, we are testing the capabilities of the bionic neuron and networks based on it.

2.2. Logic element OR


Figure 8 - Logic element OR

Neurons 1 and 2 are input. Neurons 3 and 5 are needed to return the network to its original state after the passage of an impulse. Neuron 4 performs the OR function and is also an output neuron.

The OR logic element produces a signal at the output in cases where at least one signal is received at the inputs. If the generators excite neurons 1, or 2, or both at once, then neuron 4 is excited. It sends a signal to the output, after which neurons 3 and 5 return the network to its original state.

2.3. Logic element AND

Logic element &. A neuron generating output impulses will fire only if incoming signals arrive at it simultaneously.


Figure 9 - Logic element AND

The AND element is externally an exact copy of the OR element, but the synapses from neurons 1 and 2 to neuron 4 are configured so that neuron 4 will fire only when both signals arrive at the same time.

2.4. Logic element XOR

The XOR logic element generates a pulse only if it receives a pulse from exactly one of its inputs. This is achieved by subtracting the AND element from the OR element: both elements are connected in parallel, but if AND is triggered, the signal does not pass; if OR fired and AND did not, the signal passes.


Figure 10 - Logic element XOR

Neurons 1 and 2 perform the functions of input signal generators. Neurons 3 and 5 are input neurons. Neuron 4 is the AND element. Interneuron 7 is a blocking neuron: it blocks the output if signals arrive simultaneously from both generators. Neuron 6 is the output neuron.

There are three options for the development of events.

1. Neurons 1 and 2 do not generate signals. In this case, the network is not active and there will be no signal at the output neuron.

2. Both neurons 1 and 2 generate signals and feed them into the network. As a result, neurons 3 and 5 will be excited, which will excite the output neuron. Together, generators 1 and 2 will be able to excite neuron 4, which performs the AND function. And then neuron 7, under the influence of neuron 4, will inhibit output neuron 6. Thus, there will be no output signal. All other synapses are needed to return the neurons of the network to their original state after the signal has passed.

3. The signal comes from only one neuron, for example, from neuron 1. After entering the network, the signal will excite neuron 3. But the synapses are configured in such a way that neuron 3 alone cannot excite neuron 4, so blocking of output neuron 6 will not occur, and the neuron 3 will transmit its signal to the network output.

This construction is not the only correct one; the problem can be solved in many ways. For example, the network can be returned to its original state through an exponential decline in the excitability of the neurons rather than by introducing additional synapses.
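For comparison, the same three logical functions can also be obtained from ordinary formal (threshold) neurons rather than from the spiking bionic elements described above. The sketch below is that classical analogue, with illustrative weights and thresholds; XOR is built by letting the AND element block the OR element, mirroring the construction in this section.

```python
def threshold_neuron(inputs, weights, threshold):
    """Classical formal neuron: fires (1) if the weighted sum exceeds the threshold."""
    s = sum(w * x for w, x in zip(weights, inputs))
    return 1 if s > threshold else 0

def OR(a, b):
    return threshold_neuron([a, b], [1.0, 1.0], threshold=0.5)

def AND(a, b):
    return threshold_neuron([a, b], [1.0, 1.0], threshold=1.5)

def XOR(a, b):
    # OR passes the signal unless the AND element blocks it
    return threshold_neuron([OR(a, b), AND(a, b)], [1.0, -2.0], threshold=0.5)

for a in (0, 1):
    for b in (0, 1):
        print(a, b, OR(a, b), AND(a, b), XOR(a, b))
```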

2.5. Logic element NOT


Figure 11 - Logic element NOT

Neuron 1 is the input neuron. Neuron 2 is always excited and continuously outputs a signal. If neuron 1 is excited, neuron 2 is blocked and the output signal disappears.

2.6. Parallel to Serial Converter


Figure 12 - Logic element that converts several simultaneous input signals into a time sequence

Neuron 1 is the output neuron. Neurons 2 and 3 bring the network to its original state after the signals pass through. Neuron 4 generates two simultaneous signals. Due to the different delays at the synapses, they are separated in time, after which they generate successive signals on the output neuron.

2.7. Decimal counter


Figure 13 - Decimal counter of 6 neurons, two neurons for each digit

The network consists of neuron 3, performing the function of a generator, and blocks of two neurons connected in series. Each such block adds one order of magnitude to the maximum number the counter can hold. Neuron 1 receives impulses from neuron 3 and fires upon receipt of the 10th impulse. Then neuron 2 inhibits neuron 1 back to its initial state, and one impulse is transmitted to neuron 4.

2.8 Network reading the state of a neuron


Figure 14 - Network reading the state of a neuron

The network is designed on the counter principle. Neuron 1 discretely lowers the threshold of the neuron under study and simultaneously sends one impulse to the counter. Thus, by the time the threshold of neuron 2 drops below its level of excitation, the counter will have recorded how many times the threshold has been lowered. After firing, neuron 2 inhibits neuron 1 and the network stops.

2.9. Neuron state analyzer


Figure 15 - Network that determines, based on the level of excitability, which of five given ranges the neuron under study belongs to

Essentially, the network consists of five AND elements (one of them is highlighted in red), each of which consists of 4 neurons. At one input of each element is the neuron under study; at the second, a reference neuron. Initially, the thresholds of the neuron under study and of all reference neurons have the maximum value. Using a special generator neuron, we begin to slowly lower the thresholds of these neurons. At a certain moment, the neuron under study fires. In this case, the impulse passes through the AND element whose reference neuron has also fired. Thus, by configuring the output synapses of each AND element differently, we can find out which of the five specified ranges the neuron under study belongs to.

3. DEVELOPMENT OF A MODEL OF ADAPTIVE BEHAVIOR IN THE MAZE

The behavior of animals performing various tasks in mazes is excellent material for studying the functioning of the central nervous system during orientation and the processes of adaptation to changes in experimental conditions. Accordingly, modeling the behavior of intelligent systems solving similar problems provides rich food for thought about the adequacy of our models to the real work of biological neural networks. It therefore seems quite natural to try to solve the problem of orientation in a maze when searching for "food" (a need model), taking into account the analysis of the sections of the maze already traversed. For this purpose, a maze model, a need model and a bionic neural network were developed and implemented in software, and the operation of the neural network in solving the problem was studied.

3.1 Environment model (labyrinth)

The model object called "beetle" has the following functions:

1.The ability to move in four directions through a labyrinth. During one cycle of operation of the control neural network, the “beetle” moves one step - one link of the labyrinth.

2. A need for "food". To satisfy it, the "beetle" moves in a direction that is not prohibited (not a dead end) and that brings it closer to the food. If two directions are equivalent (the same distance), one of them is chosen at random.

3. The maze is a rectangular lattice, some of whose links are removed at random. Thus, movement along sections of the lattice is not possible everywhere, which is equivalent to a simple rectangular maze with the number of links (sections) (Figure 16) equal to 2·N·(N−1) − k, where N is the number of intersections along one of the axes and k is the number of randomly removed links (a short sketch of this count is given after Figure 16).

Figure 16 - Maze 15 by 15. Food is indicated in yellow
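A one-line check of the link count used above: for an N-by-N rectangular lattice the number of links is 2·N·(N−1), reduced by the k links removed at random (Python sketch, illustrative values).

```python
def maze_links(n, k):
    """Number of passable links in an n-by-n lattice with k links removed."""
    return 2 * n * (n - 1) - k

print(maze_links(15, 0))    # full 15-by-15 lattice: 420 links
print(maze_links(15, 50))   # the same lattice with 50 links removed at random
```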

3.4. Neural network model


Figure 17 - The "beetle" neural network


This network was reduced as much as possible and optimized to solve the model problem. The network consists of four mutually inhibiting neurons. Each neuron is responsible for choosing one direction of movement within the maze: up, left, down or right. Each neuron has an excitation input, to which impulses are supplied in a number corresponding to the degree of need (the proximity of food) from the given side. Each neuron also has two inhibitory inputs. One input is strong, used for prohibition if there is no passage on that side. The second input is weak; pulses are sent to it only if the trace of the "beetle" is visible from that side, that is, if the "beetle" has already been there. The number of pulses supplied to the inhibition input associated with the "beetle" trace is inversely proportional to the age of this trace. Each neuron has a synapse closed onto itself; this is necessary for the self-excitation of the neuron when its level of excitation is read out. Each neuron inhibits all the others, so that as a result of the network's operation the most highly excited neuron remains the only active one and determines the direction of movement of the "beetle".
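The following is a rate-based simplification of the "beetle" control network, not the spiking implementation itself: each of the four direction neurons accumulates excitation proportional to food proximity, is strongly suppressed when the direction is walled off, is weakly suppressed by a fresh trail, and the most excited neuron wins. All coefficients and names are illustrative assumptions.

```python
def choose_direction(proximity, blocked, trail_age):
    """proximity: dict direction -> closeness to food (larger is better)
    blocked:   dict direction -> True if there is no passage
    trail_age: dict direction -> age of the beetle's trail (None if no trail)"""
    STRONG_INHIBITION = 1000.0   # "prohibition" for walls (illustrative)
    TRAIL_WEIGHT = 5.0           # weak inhibition for a visited link (illustrative)

    excitation = {}
    for d in ("up", "left", "down", "right"):
        e = proximity[d]
        if blocked[d]:
            e -= STRONG_INHIBITION
        if trail_age[d] is not None:
            e -= TRAIL_WEIGHT / (1 + trail_age[d])   # fresher trail, stronger inhibition
        excitation[d] = e
    # mutual inhibition leaves only the most excited neuron active
    return max(excitation, key=excitation.get)

print(choose_direction(
    proximity={"up": 3.0, "left": 4.0, "down": 1.0, "right": 4.0},
    blocked={"up": False, "left": True, "down": False, "right": False},
    trail_age={"up": None, "left": None, "down": 2, "right": 0},
))   # "up": left is walled off, right is inhibited by a fresh trail
```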

CONCLUSION

As part of the first year of the project, models were developed: simple logical elements, a decimal counter, a neuron state analyzer, and an orientation control system in the external environment using bionic neurons.

During the modeling process it was established:

a) this bionic neuron model is suitable for constructing heterogeneous neural networks.

b) the capabilities of this neuron model are not limited to the created neural networks.

c) this project is the basis for the transition to the next, more complex model of a bionic neuron.

The developed implementation of the orientation process is planned to be used for further research and for equipping this process with new functions, which may allow this area to be developed further.

BIBLIOGRAPHY

1. Nicholls J., Martin R., Wallace B., Fuchs P. From Neuron to Brain. – M.: Editorial URSS, 2003.

2. Kruglov V.V., Borisov V.V. Artificial neural networks. Theory and practice - M.: Hotline-Telecom Publishing House, 2002.

3.Berkinblit M.B. Neural networks - M.: MIROS Publishing House, 1993.

4. McCulloch W., Pitts W. A logical calculus of the ideas immanent in nervous activity // Neurocomputer. – 1992. – No. 3/4. – P. 40-50.

5. Zhdanov A. A. Method of autonomous adaptive control // Proceedings of the Academy of Sciences. Theory and control systems. 1999. No. 5. P. 127-134.

6. Tsitolovsky L.E. A model of motivation with chaotic neuronal dynamics // Journ. of Biological Systems, 1997. V. 5. N. 2. P. 301-323.

7. Burtsev M.S., Gusarev R.V., Redko V.G. A model of the evolutionary emergence of goal-directed adaptive behavior. 1. The case of two needs // Preprint of the Keldysh Institute of Applied Mathematics, RAS, 2000, No. 43.

8 . Balkenius C. The roots of motivations. // In J.-A. Mayer, H. L. Roitblat and S. W. Wilson (eds.), From Animals to Animats II, MA: MIT Press., 1993. http://www.lucs.lu.se/People/Christian.Balkenius/Abstracts/ROM.html

9. Anokhin P.K. Systemic mechanisms of higher nervous activity. – M.: Nauka, 1979. 453 p.; Anokhin P.K. Essays on the physiology of functional systems. – M.: Meditsina, 1975; Anokhin P.K. Fundamental questions of the general theory of functional systems // Principles of the systemic organization of functions. – M.: Nauka, 1973.

10. Shvyrkov V.B. The theory of functional systems as a methodological basis for the neurophysiology of behavior // Advances in Physiological Sciences. 1978. V. 9. No. 1.

11. Bongard M.M., Losev I.S., Smirnov M.S. The "Animal" project: a model of the organization of behavior // Modeling of learning and behavior. – M.: Nauka, 1975. See also:

http://mbur.narod.ru/misc/bongard.htm

12. Bongard M.M., Losev I.S., Maksimov V.V., Smirnov M.S. A formal language for describing situations using the concept of connection // Modeling of learning and behavior. – M.: Nauka, 1975.

13. Vaintzweig M.N., Polyakova M.P. On modeling thinking. Paper presented at COP 2002.

14. Turchin V.F. The Phenomenon of Science. A cybernetic approach to evolution. M.: Nauka, 1993. 295 p. (1st ed.); M.: ETS, 2000. 368 p. (2nd ed.). See also: http://www.refal.net/turchin/phenomenon/ http://www.refal.org/turchin/phenomenon/

15. Jantsch E. The self-organizing universe. Pergamon Press: Oxford etc, 1980. 340 p.

16. Heylighen F. & Joslyn C. (2001): " Cybernetics and Second Order Cybernetics ", in: R. A. Meyers (ed.), Encyclopedia of Physical Science & Technology, Vol. 4 (3rd ed.), (Academic Press, New York), pp. 155-170.

17. Sutton, R.S., & Barto, A.G. (1981). An adaptive network that constructs and uses an internal model of its world , Cognition and Brain Theory 4:217-246.

18. Kleene S. Mathematical logic. M.: Mir, 1973. 480 p.

19 . Klopf A.H. The hedonistic neuron: a theory of memory, learning, and intelligence. Hemisphere publishing corporation, Washington etc, 1982. 140 p.

20. Sutton, R.S., & Barto, A.G. (1981). Toward a modern theory of adaptive networks: Expectation and prediction , Psychological Review 88:135-140.

21. Valtsev V.B., Grigoriev I.R., Lavrov V.V., Cherkashin E.A. Heterogeneous networks and problems of modeling higher brain functions // Neuroinformatics: collected papers. M., 2000. P. 52-56.

22. I. S. Losev, V. V. Maksimov. On the problem of generalization of initial situations. // Modeling of learning and behavior. – M.: Nauka, 1975.

In the previous chapter we became familiar with such concepts as artificial intelligence, machine learning and artificial neural networks.

In this chapter, I will describe in detail the artificial neuron model, talk about approaches to training the network, and also describe some well-known types of artificial neural networks that we will study in the following chapters.

Simplification

In the last chapter I constantly talked about some serious simplifications. The reason for the simplifications is that no modern computer can quickly simulate complex systems such as our brain. In addition, as I already said, our brain is full of various biological mechanisms that are not related to information processing.

We need a model of how the input signal is converted into the output signal we need. Everything else does not concern us. Let's start simplifying.

Biological structure → diagram

In the previous chapter, you realized how complex biological neural networks and biological neurons are. Instead of drawing neurons as tentacled monsters, let's just draw diagrams.

Generally speaking, there are several ways to graphically represent neural networks and neurons. Here we will depict artificial neurons as circles.

Instead of a complex interweaving of inputs and outputs, we will use arrows indicating the direction of signal movement.

Thus, an artificial neural network can be represented as a collection of circles (artificial neurons) connected by arrows.

Electrical signals → numbers

In a real biological neural network, an electrical signal is transmitted from the network inputs to the outputs. It may change as it passes through the neural network.

An electrical signal will always be an electrical signal. Conceptually, nothing changes. But what then changes? The magnitude of this electrical signal changes (stronger/weaker). And any value can always be expressed as a number (more/less).

In our artificial neural network model, we do not need to implement the behavior of the electrical signal at all, since nothing will depend on its implementation anyway.

We will supply some numbers to the network inputs, symbolizing the magnitude of the electrical signal if it existed. These numbers will move through the network and change in some way. At the output of the network we will receive some resulting number, which is the response of the network.

For convenience, we will still call our numbers circulating in the network signals.

Synapses → connection weights

Let us recall the picture from the first chapter, in which the connections between neurons - synapses - were depicted in color. Synapses can strengthen or weaken the electrical signal passing through them.

Let's characterize each such connection with a certain number, called the weight of this connection. The signal passing through a given connection is multiplied by the weight of the corresponding connection.

This is a key point in the concept of artificial neural networks, so I will explain it in more detail. Look at the picture below. Each black arrow (connection) in this picture now corresponds to a certain number \(w_i\) (the weight of the connection). When a signal passes through a connection, its magnitude is multiplied by that connection's weight.

In the figure above, not every connection is labeled with a weight simply because there is no room for the labels. In reality, each \(i\)-th connection has its own weight \(w_i\).

Artificial Neuron

We now move on to consider the internal structure of an artificial neuron and how it transforms the signal arriving at its inputs.

The figure below shows a complete model of an artificial neuron.

Don't be alarmed, there is nothing complicated here. Let's look at everything in detail from left to right.

Inputs, weights and adder

Each neuron, including an artificial one, must have inputs through which it receives a signal. We have already introduced the concept of weights by which the signals passing through a connection are multiplied. In the picture above, the weights are shown as circles.

The signals received at the inputs are multiplied by their weights. The signal of the first input ​\(x_1 \) ​ is multiplied by the weight ​\(w_1 \) ​ corresponding to this input. As a result, we get ​\(x_1w_1 \) ​. And so on until the ​\(n\) ​th input. As a result, at the last input we get ​\(x_nw_n \) ​.

Now all products are transferred to the adder. Just based on its name, you can understand what it does. It simply sums all the input signals multiplied by the corresponding weights:

\[ x_1w_1+x_2w_2+\cdots+x_nw_n = \sum\limits^n_{i=1}x_iw_i \]

Mathematical help


When it is necessary to briefly write down a large expression consisting of a sum of repeating/same-type terms, the sigma sign is used.

Let's consider the simplest recording option:

\[ \sum\limits^5_{i=1}i=1+2+3+4+5 \]

Thus, below the sigma we give the counter variable \(i\) a starting value, which increases until it reaches the upper limit (5 in the example above).

The upper limit can also be variable. Let me give you an example of such a case.

Suppose we have \(n\) stores, each with its own number from 1 to \(n\). Each store makes a profit. Take some (it doesn't matter which) \(i\)-th store; its profit is \(p_i\). The total profit \(P\) of all the stores is then:

\[ P = p_1+p_2+\cdots+p_i+\cdots+p_n \]

As you can see, all terms of this sum are of the same type. Then they can be briefly written as follows:

\[ P=\sum\limits^n_{i=1}p_i \]

In words: "Sum up the profits of all stores, starting with the first and ending with the \(n\)-th." As a formula it is much shorter, more convenient and more elegant.

The result of the adder is a number called a weighted sum.

Weighted sum (\(net\)) - the sum of the input signals multiplied by their corresponding weights.

\[ net=\sum\limits^n_{i=1}x_iw_i \]

The role of the adder is obvious - it aggregates all the input signals (of which there can be many) into a single number, the weighted sum, which characterizes the signal received by the neuron as a whole. The weighted sum can also be thought of as the degree of overall excitation of the neuron.
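To make the adder's job concrete, here is a minimal Python sketch of a weighted sum; the function and variable names are my own illustrations, not taken from the text.

```python
def weighted_sum(inputs, weights):
    """Return net = x_1*w_1 + x_2*w_2 + ... + x_n*w_n."""
    assert len(inputs) == len(weights), "each input needs its own weight"
    return sum(x * w for x, w in zip(inputs, weights))

# Three inputs and three illustrative weights
print(weighted_sum([1, 0, 1], [0.5, 2.0, 1.5]))  # 0.5 + 0.0 + 1.5 = 2.0
```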

Example

To understand the role of the last component of an artificial neuron - the activation function - I will give an analogy.

Let's look at a single artificial neuron. Its task is to decide whether to go on vacation to the sea. To do this, we feed various data to its inputs. Suppose our neuron has 4 inputs:

  1. The cost of the trip
  2. The weather at sea
  3. The current situation at work
  4. Whether there will be a snack bar on the beach

We will characterize all these parameters as 0 or 1. Accordingly, if the weather at sea is good, then we apply 1 to this input. And so with all other parameters.

If a neuron has four inputs, then there must be four weights. In our example, weighting coefficients can be thought of as indicators of the importance of each input, influencing the overall decision of the neuron. We distribute the input weights as follows:

It is easy to see that the factors of cost and weather at sea (the first two inputs) play a very important role. They will also play a decisive role when the neuron makes a decision.

Let us supply the following signals to the inputs of our neuron:

We multiply the weights of the inputs by the signals of the corresponding inputs:

The weighted sum for such a set of input signals is 6:

\[ net=\sum\limits^4_{i=1}x_iw_i = 5 + 0 + 0 + 1 = 6 \]
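Since the figure with the weights is not reproduced here, the following sketch simply assumes a set of weights consistent with the sums stated in the text (6 for inputs 1001 and 2 for inputs 0011); the second and third weights are guesses.

```python
# Assumed weights (the figure with the weights is not reproduced here):
# cost = 5, weather = 4, work = 1, snack bar = 1 -- the 4 and the first 1 are guesses,
# chosen only so that the sums 6 (inputs 1001) and 2 (inputs 0011) from the text hold.
weights = [5, 4, 1, 1]
inputs = [1, 0, 0, 1]  # affordable trip, bad weather, busy at work, snack bar present

net = sum(x * w for x, w in zip(inputs, weights))
print(net)  # 5 + 0 + 0 + 1 = 6
```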

This is where the activation function comes into play.

Activation function

It would be rather pointless to simply output the weighted sum. The neuron must somehow process it and produce an adequate output signal. This is exactly what the activation function is for.

It converts the weighted sum into a certain number, which is the output of the neuron (we denote the output of the neuron by the variable ​\(out \) ​).

Different types of artificial neurons use a variety of activation functions. In general, they are denoted by the symbol \(\phi(net)\). Writing the weighted sum in parentheses means that the activation function takes the weighted sum as its parameter.

Activation function (\(\phi(net)\)) - a function that takes the weighted sum as its argument. The value of this function is the output of the neuron (\(out\)).

Single jump function

The simplest type of activation function. The output of the neuron can only be 0 or 1. If the weighted sum is greater than or equal to a certain threshold \(b\), the output of the neuron is 1; otherwise it is 0.

How can it be used? Let's assume that we go to the sea only when the weighted sum is greater than or equal to 5. This means our threshold is 5:

In our example, the weighted sum was 6, which means the output signal of our neuron is 1. So, we are going to the sea.

However, if the weather at sea were bad and the trip very expensive, but there was a snack bar and the situation at work was normal (inputs: 0011), then the weighted sum would be 2, and the output of the neuron would be 0. So we're not going anywhere.

Basically, the neuron looks at the weighted sum and, if it reaches the threshold, produces an output equal to 1.

Graphically, this activation function can be depicted as follows.

The horizontal axis shows the values of the weighted sum. The vertical axis shows the values of the output signal. As is easy to see, only two output values are possible: 0 or 1. Moreover, the output is 0 everywhere from minus infinity up to a certain value of the weighted sum, called the threshold. If the weighted sum is equal to or greater than the threshold, the function returns 1. Everything is extremely simple.

Now let's write this activation function mathematically. You have almost certainly come across the concept of a piecewise-defined function: a function given by several rules, each applying under its own condition. Written as a piecewise function, the single jump function looks like this:

\[ out(net) = \begin{cases} 0, & net < b \\ 1, & net \geq b \end{cases} \]

There is nothing complicated about this notation. The output of the neuron (\(out\)) depends on the weighted sum (\(net\)) as follows: if \(net\) (the weighted sum) is less than some threshold (\(b\)), then \(out\) (the neuron's output) is 0; and if \(net\) is greater than or equal to the threshold \(b\), then \(out\) is 1.
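As a small sketch, the single jump function from the formula above could be written like this in Python, with the threshold of 5 borrowed from the trip example:

```python
def step(net, b=5):
    """Single jump activation: 1 if the weighted sum reaches the threshold b, otherwise 0."""
    return 1 if net >= b else 0

print(step(6))  # 1 -> we go to the sea
print(step(2))  # 0 -> we stay home
```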

Sigmoid function

In fact, there is a whole family of sigmoid functions, some of which are used as activation functions in artificial neurons.

All these functions have some very useful properties, for which they are used in neural networks. These properties will become apparent once you see graphs of these functions.

So, the sigmoid most commonly used in neural networks is the logistic function.

The graph of this function looks quite simple. If you look closely, you can see some resemblance to the English letter ​\(S \) ​, which is where the name of the family of these functions comes from.

And this is how it is written analytically:

\[ out(net)=\frac{1}{1+\exp(-a \cdot net)} \]

What is the parameter ​\(a \) ​? This is some number that characterizes the degree of steepness of the function. Below are logistic functions with different parameters ​\(a \) ​.

Let's remember our artificial neuron, which determines whether it is necessary to go to the sea. In the case of the single jump function, everything was obvious. We either go to the sea (1) or not (0).

Here the situation is closer to reality. We are not entirely sure (especially if you are paranoid) whether the trip is worth it. Using the logistic function as the activation function then gives you a number between 0 and 1. The larger the weighted sum, the closer the output is to 1 (though it will never be exactly equal to 1). Conversely, the smaller the weighted sum, the closer the neuron's output is to 0.

For example, suppose the output of our neuron is 0.8. This means it believes that going to the sea is still worth it. If its output were 0.2, it would be almost certainly against going to the sea.

What remarkable properties does the logistic function have?

  • it is a “compressive” function, that is, regardless of the argument (weighted sum), the output signal will always be in the range from 0 to 1
  • it is more flexible than the single jump function - its result can be not only 0 and 1, but any number in between
  • at all points it has a derivative, and this derivative can be expressed through the same function

It is because of these properties that the logistic function is most often used as an activation function in artificial neurons.
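Here is a minimal sketch of the logistic function with the steepness parameter \(a\); the sample values are only illustrative.

```python
import math

def logistic(net, a=1.0):
    """Logistic (sigmoid) activation; the output always lies strictly between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-a * net))

for net in (-5, 0, 5):
    print(net, round(logistic(net), 3))  # -5 -> 0.007, 0 -> 0.5, 5 -> 0.993
```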

Hyperbolic tangent

However, there is another sigmoid - the hyperbolic tangent. It is used as an activation function by biologists to create a more realistic model of a nerve cell.

This function allows you to obtain output values of different signs (for example, from -1 to 1), which can be useful for a number of networks.

The function is written as follows:

\[ out(net) = \tanh\left(\frac{net}{a}\right) \]

In the above formula, the parameter ​\(a \) ​ also determines the degree of steepness of the graph of this function.

And this is what the graph of this function looks like.

As you can see, it looks like a graph of a logistic function. The hyperbolic tangent has all the useful properties that the logistic function has.
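And an analogous sketch for the hyperbolic tangent written as in the formula above; again the sample values are only illustrative.

```python
import math

def tanh_activation(net, a=1.0):
    """Hyperbolic tangent activation; the output lies strictly between -1 and 1."""
    return math.tanh(net / a)

for net in (-3, 0, 3):
    print(net, round(tanh_activation(net), 3))  # -3 -> -0.995, 0 -> 0.0, 3 -> 0.995
```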

What have we learned?

Now you have a complete picture of the internal structure of an artificial neuron. Let me once again give a short description of how it works.

A neuron has inputs. They receive signals in the form of numbers. Each input has its own weight (also a number). The input signals are multiplied by the corresponding weights. We get a set of “weighted” input signals.

The adder adds up all the weighted input signals into the weighted sum. The activation function then converts the weighted sum into the neuron's output.

Let us now formulate the shortest possible description of a neuron's operation - its mathematical model:

Mathematical model of an artificial neuron with \(n\) inputs:

\[ out=\phi\left(\sum\limits^n_{i=1}x_iw_i\right) \]

where
\(\phi\) - activation function;
\(\sum\limits^n_{i=1}x_iw_i\) - weighted sum: the sum of the \(n\) products of the input signals and their corresponding weights.
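Putting the pieces together, the whole artificial neuron can be sketched as a weighted sum followed by an activation function; this is only an illustration, here with a logistic activation and made-up numbers.

```python
import math

def neuron(inputs, weights, phi):
    """Artificial neuron: out = phi(sum of x_i * w_i)."""
    net = sum(x * w for x, w in zip(inputs, weights))
    return phi(net)

logistic = lambda net: 1.0 / (1.0 + math.exp(-net))
print(neuron([1, 0, 0, 1], [5, 4, 1, 1], logistic))  # about 0.998 for net = 6
```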

Types of ANN

We have figured out the structure of an artificial neuron. Artificial neural networks consist of a collection of artificial neurons. A logical question arises - how to place/connect these same artificial neurons to each other?

As a rule, most neural networks have a so-called input layer, which performs only one task - distributing input signals to other neurons. The neurons in this layer do not perform any calculations.

Single-layer neural networks

In single-layer neural networks, signals from the input layer are immediately fed to the output layer. It performs the necessary calculations, the results of which are immediately sent to the outputs.

A single-layer neural network looks like this:

In this picture, the input layer is indicated by circles (it is not considered a neural network layer), and on the right is a layer of ordinary neurons.

Neurons are connected to each other by arrows. Above the arrows are the weights of the corresponding connections (weighting coefficients).

Single-layer neural network - a network in which signals from the input layer are fed directly to the output layer, which converts the signal and immediately produces a response.
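A single-layer network is then just several such neurons sharing the same inputs. A minimal sketch, assuming a logistic activation and illustrative weights (one row of weights per output neuron):

```python
import math

def logistic(net):
    return 1.0 / (1.0 + math.exp(-net))

def single_layer(inputs, weight_rows):
    """One computational layer: every output neuron has its own row of weights."""
    return [logistic(sum(x * w for x, w in zip(inputs, row))) for row in weight_rows]

# 3 input signals feeding 2 output neurons (weights are illustrative)
print(single_layer([1.0, 0.5, -1.0], [[0.2, 0.8, 0.5], [1.0, -0.3, 0.1]]))
```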

Multilayer neural networks

Such networks, in addition to the input and output layers of neurons, are also characterized by a hidden layer (layers). Their location is easy to understand - these layers are located between the input and output layers.

This structure of neural networks copies the multilayer structure of certain parts of the brain.

The hidden layer did not get its name by chance. Methods for training hidden-layer neurons were developed only relatively recently; before that, only single-layer neural networks were used.

Multilayer neural networks have much greater capabilities than single-layer ones.

The work of hidden layers of neurons can be compared to the work of a large factory. The product (output signal) at the plant is assembled in stages. After each machine some intermediate result is obtained. Hidden layers also transform input signals into some intermediate results.

Multilayer neural network - a neural network consisting of an input layer, an output layer, and one or more hidden layers of neurons located between them.
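A multilayer network chains such layers, the output of one layer serving as the input of the next. A minimal sketch under the same illustrative assumptions:

```python
import math

def logistic(net):
    return 1.0 / (1.0 + math.exp(-net))

def layer(inputs, weight_rows):
    return [logistic(sum(x * w for x, w in zip(inputs, row))) for row in weight_rows]

def multilayer(inputs, layers):
    """Pass the signal through every layer in turn; each layer is a list of weight rows."""
    signal = inputs
    for weight_rows in layers:
        signal = layer(signal, weight_rows)
    return signal

# 2 inputs -> hidden layer of 3 neurons -> output layer of 1 neuron (weights are illustrative)
hidden = [[0.5, -0.4], [0.9, 0.2], [-0.3, 0.8]]
output = [[1.0, -1.0, 0.5]]
print(multilayer([1.0, 0.0], [hidden, output]))
```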

Feedforward networks

You can notice one very interesting detail in the pictures of neural networks in the examples above.

In all examples, the arrows strictly go from left to right, that is, the signal in such networks goes strictly from the input layer to the output layer.

Feedforward neural networks - artificial neural networks in which the signal propagates strictly from the input layer to the output layer, never in the opposite direction.

Such networks are widely used and quite successfully solve a certain class of problems: forecasting, clustering and recognition.

However, no one forbids the signal to go in the opposite direction.

Feedback networks

In networks of this type, the signal can also go in the opposite direction. What's the advantage?

The fact is that in feedforward networks, the output of the network is determined by the input signal and weighting coefficients for artificial neurons.

And in networks with feedback, the outputs of neurons can return to the inputs. This means that the output of a neuron is determined not only by its weights and input signal, but also by previous outputs (since they returned to the inputs again).

The ability of signals to circulate in a network opens up new, amazing possibilities for neural networks. Using such networks, you can create neural networks that restore or complement signals. In other words, such neural networks have the properties of short-term memory (like a person’s).

Recurrent neural networks (networks with feedback) - artificial neural networks in which the output of a neuron can be fed back to its input. More generally, this means the ability to propagate a signal from the outputs back to the inputs.
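To illustrate the idea of feedback, here is a toy sketch of a single neuron whose previous output is fed back as an extra contribution on the next step; the weights and the feedback scheme are my own illustration, not a specific architecture from the text.

```python
import math

def logistic(net):
    return 1.0 / (1.0 + math.exp(-net))

def run_recurrent(signals, w_in=1.5, w_back=0.8):
    """One neuron with feedback: out(t) = phi(w_in * x(t) + w_back * out(t-1))."""
    out = 0.0  # the previous output starts at zero
    for x in signals:
        out = logistic(w_in * x + w_back * out)
        print(round(out, 3))
    return out

run_recurrent([1, 0, 0, 0])  # the effect of the first input lingers in the later steps
```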

Neural network training

Now let's look at the issue of training a neural network in a little more detail. What it is? And how does this happen?

What is network training?

An artificial neural network is a collection of artificial neurons. Now let's take, for example, 100 neurons and connect them to each other. It is clear that when we apply a signal to the input, we will get something meaningless at the output.

This means we need to change some network parameters until the input signal is converted into the output we need.

What can we change in a neural network?

Changing the total number of artificial neurons is pointless for two reasons. First, increasing the number of computing elements only makes the system heavier and more redundant. Second, if you gather 1000 fools instead of 100, they still won't be able to answer the question correctly.

The adder cannot be changed, since it performs one strictly defined function - adding. If we replace it with something or remove it altogether, then it will no longer be an artificial neuron at all.

If we change the activation function of each neuron, we will get a neural network that is too heterogeneous and uncontrollable. In addition, in most cases, neurons in neural networks are of the same type. That is, they all have the same activation function.

There is only one option left - changing the connection weights.

Neural network training - the search for a set of weights such that the input signal, after passing through the network, is converted into the output we need.

This understanding of the term "neural network training" also matches biological neural networks. Our brain is made up of a huge number of interconnected neural networks, each of which consists of neurons of the same type (with the same activation function). We learn by changing synapses - the elements that strengthen or weaken the input signal.

However, there is one more important point. If you train a network using only one input signal, it will simply "remember the correct answer." From the outside it will look as if it has "learned" very quickly. But as soon as you give it a slightly modified signal, expecting to see the correct answer, the network will produce nonsense.

Indeed, why would we need a network that detects a face in only one photo? We expect the network to be able to generalize certain features and recognize faces in other photographs as well.

It is for this purpose that training sets are created.

Training set - a finite set of input signals (sometimes paired with the correct output signals) on which the network is trained.

After the network is trained, that is, when the network produces correct results for all input signals from the training set, it can be used in practice.

However, before sending a freshly trained neural network into battle, the quality of its work is often assessed on a so-called test set.

Test set - a finite set of input signals (sometimes paired with the correct output signals) used to assess the quality of the network.

We understood what “network training” is – choosing the right set of weights. Now the question arises - how can you train a network? In the most general case, there are two approaches leading to different results: supervised learning and unsupervised learning.

Supervised learning

The essence of this approach is that you provide a signal as an input, look at the network’s response, and then compare it with a ready-made, correct response.

An important point: do not confuse the correct answers with a known solution algorithm! You can trace a face in a photo with your finger (the correct answer), but you will not be able to explain how you did it (a known algorithm). The situation here is the same.

Then, using special algorithms, you change the weights of the neural network connections and again give it an input signal. You compare its answer with the correct one and repeat this process until the network begins to respond with acceptable accuracy (as I said in Chapter 1, the network cannot give unambiguously accurate answers).

Supervised learning - a type of network training in which the weights are changed so that the network's answers differ as little as possible from the prepared correct answers.
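As an illustration only (the actual training algorithms are covered in later chapters), here is a sketch of one simple correction rule: nudge every weight in proportion to the error between the correct answer and the network's answer. This delta-rule-like update is an illustrative choice, not the specific procedure the text refers to.

```python
import math

def logistic(net):
    return 1.0 / (1.0 + math.exp(-net))

def train_step(weights, inputs, correct, rate=0.5):
    """One supervised step: compare the answer with the correct one and nudge the weights."""
    out = logistic(sum(x * w for x, w in zip(inputs, weights)))
    error = correct - out
    new_weights = [w + rate * error * x for w, x in zip(weights, inputs)]
    return new_weights, error

weights = [0.0, 0.0]
for _ in range(20):  # repeat until the answer is acceptably close
    weights, error = train_step(weights, [1.0, 0.5], correct=1.0)
print(weights, round(error, 3))
```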

Where can I get the correct answers?

If we want the network to recognize faces, we can create a training set of 1000 photos (input signals) and independently select faces from it (correct answers).

If we want the network to predict price rises and falls, then the training set must be built from historical data. As input signals you can take particular days, the general state of the market and other parameters, and as the correct answers - the rise or fall of the price on those days.

It is worth noting that the teacher, of course, is not necessarily a person. The fact is that sometimes the network has to be trained for hours and days, making thousands and tens of thousands of attempts. In 99% of cases, this role is performed by a computer, or more precisely, a special computer program.

Unsupervised learning

Unsupervised learning is used when we do not have the correct answers to the input signals. In this case, the entire training set consists of a set of input signals.

What happens when a network is trained this way? It turns out that with such "training" the network begins to distinguish classes of signals supplied to its input. In short, the network begins to perform clustering.

For example, you show the network candy, pastries and cakes. You do not regulate its operation in any way; you simply feed data about each object to its inputs. Over time, the network begins to produce signals of three different types, corresponding to the objects at the input.

Unsupervised learning - a type of network training in which the network independently classifies the input signals. The correct (reference) output signals are not shown to it.

Conclusions

In this chapter, you learned about the structure of an artificial neuron and gained a thorough understanding of how it works (including its mathematical model).

Moreover, you now know about various types of artificial neural networks: single-layer and multilayer, as well as feedforward networks and networks with feedback.

You also learned about supervised and unsupervised network learning.

You already know the necessary theory. Subsequent chapters cover specific types of neural networks, specific algorithms for their training, and programming practice.

Questions and tasks

You should know the material in this chapter very well, since it contains basic theoretical information on artificial neural networks. Be sure to achieve confident and correct answers to all the questions and tasks below.

Describe the simplifications of ANNs compared to biological neural networks.

1. The complex and intricate structure of biological neural networks is simplified and represented in the form of diagrams. Only the signal processing model is left.

2. The nature of the electrical signals in a neural network is always the same; the only thing that differs is their magnitude. So we remove the electrical signals and instead use numbers indicating the magnitude of the transmitted signal.

The activation function is often denoted by ​\(\phi(net) \) ​.

Write down a mathematical model of an artificial neuron.

An artificial neuron with ​\(n \) ​ inputs converts an input signal (number) into an output signal (number) as follows:

\[ out=\phi\left(\sum\limits^n_{i=1}x_iw_i\right) \]

What is the difference between single-layer and multi-layer neural networks?

Single-layer neural networks consist of a single computational layer of neurons. The input layer sends signals directly to the output layer, which converts the signal and immediately produces the result.

Multilayer neural networks, in addition to input and output layers, also have hidden layers. These hidden layers carry out some internal intermediate transformations, similar to the stages of production of products in a factory.

What is the difference between feedforward networks and feedback networks?

Feedforward networks allow the signal to pass in only one direction - from inputs to outputs. Networks with feedback do not have these restrictions, and the outputs of neurons can be fed back into the inputs.

What is a training set? What is its meaning?

Before using the network in practice (for example, to solve current problems for which you do not have answers), you need to assemble a collection of problems with ready-made answers on which to train the network. This collection is called the training set.

If you collect too small a set of input and output signals, the network will simply remember the answers and the learning goal will not be achieved.

What is meant by network training?

Network training is the process of changing the weighting coefficients of the artificial neurons of the network in order to select a combination of them that converts the input signal into the correct output.

What is supervised and unsupervised learning?

When training a network with a teacher, signals are given to its inputs, and then its output is compared with a previously known correct output. This process is repeated until the required accuracy of answers is achieved.

If the network is supplied only with input signals, without comparing its responses to ready-made outputs, it begins to independently classify these input signals. In other words, it performs clustering of the input signals. This type of learning is called unsupervised learning.

A neuron is an information processing unit in a neural network. The figure below shows a model of the neuron underlying artificial neural networks.

There are three main elements in this neuron model: a set of synapses (connections), each characterized by its own weight; an adder; and an activation function.

The neuron model imitates, to a first approximation, the properties of a biological neuron. The input of an artificial neuron receives a number of signals, each of which is the output of another neuron. Each input is multiplied by a corresponding weight proportional to the synaptic strength, and all products are summed to determine the neuron's activation level.

Although network paradigms are very diverse, almost all of them are based on this neuron model. Here, a set of input signals, designated \(x_1, x_2, \ldots, x_n\), is supplied to the artificial neuron. These input signals, collectively denoted by the vector \(X\), correspond to the signals arriving at the synapses of a biological neuron. Each signal is multiplied by the corresponding weight \(w_1, w_2, \ldots, w_n\) and goes to the summing block, designated \(\Sigma\). Each weight corresponds to the "strength" of one biological synaptic connection. The set of weights is collectively denoted by the vector \(W\). The summing block, corresponding to the body of the biological cell, adds the weighted inputs algebraically, producing the output \(net\). This weighted sum then enters the activation function, which determines the final signal of excitation or inhibition at the neuron's output. This signal arrives at the synapses of the following neurons, and so on.

The considered simple model of a neuron ignores many properties of its biological counterpart. For example, it does not take into account time delays that affect the dynamics of the system. Input signals immediately generate an output signal. And, more importantly, this neuron model does not take into account the effects of the frequency modulation function or the synchronizing function of the biological neuron, which some researchers consider crucial.

Despite these limitations, networks built from this neuron model exhibit properties that closely resemble a biological system. Only time and research will be able to answer the question of whether such coincidences are accidental or a consequence of the fact that the most important features of the biological prototype are correctly captured in this neuron model.