Neural networks presentation

Contents

Slide 2

Table of contents

The basic concepts of neural networks
Artificial neural networks.
The structure of an artificial neuron.
Activation functions.
Basic paradigms of neural networks.
Fundamentals of learning and training samples.
Using neural networks in practice
Single layer neural networks
Rosenblatt's single layer perceptron.
Learning single layer neural networks.
Associative memory and its realization on single layer neural networks.
Using single layer neural networks for pattern recognition and time series forecasting.
Multilayer perceptrons
The structure of multilayer perceptrons.
Back propagation of error.
Using multilayer perceptrons for pattern recognition and time series forecasting.

Slide 3

Self-organizing maps
The principle of unsupervised learning.
Kohonen self-organizing maps.
Learning Kohonen networks.
Practical use of Kohonen networks.
Recurrent neural networks
Neural networks with feedback.
Hopfield neural network.
Hamming neural network.
Training Hopfield and Hamming neural networks.
Practical use of Hopfield and Hamming neural networks.
Training and Testing
Training error and testing error.

Slide 4

References

David Kriesel. A Brief Introduction to Neural Networks // http://www.dkriesel.com/en/science/neural_networks
Raul Rojas. Neural Networks. A Systematic Introduction // http://www.inf.fu-berlin.de/inst/ag-ki/rojas_home/documents/1996/NeuralNetworks/neuron.pdf
L.P.J. Veelenturf. Analysis and Application of Artificial Neural Networks // http://www.ru.lv/~peter/zinatne/ebooks/Analysis%20and%20Applications%20of%20Artificial%20Neural%20Networks.pdf
Artificial Neural Networks – Methodological Advances and Biomedical Applications // InTech.ORG

Slide 5

The basic concepts of neural networks

Slide 6

Questions for motivation discussion

What tasks are machines good at doing that humans are not?
What tasks are humans good at doing that machines are not?
What tasks are both good at?
What does it mean to learn?
How is learning related to intelligence?
What does it mean to be intelligent?
Do you believe a machine will ever be intelligent?
If a computer were intelligent, how would you know?

Slide 7

Types of learning

Knowledge acquisition from an expert.
Knowledge acquisition from data:
Supervised learning – the system is supplied with a set of training examples consisting of inputs and corresponding outputs, and is required to discover the relation or mapping between them.
Unsupervised learning – the system is supplied with a set of training examples consisting only of inputs. It is required to discover what the appropriate outputs should be.

Slide 8

Artificial Neural Network

An extremely simplified model of the human brain.
Transforms inputs into the best outputs (some neural networks are universal function approximators).

Slide 9

Artificial Neural Networks

The development of neural networks dates back to the early 1940s.
The field experienced an upsurge in popularity in the late 1980s due to the discovery of new techniques for NN training.
Some NNs are models of biological neural networks and some are not, but historically, much of the inspiration for the field of NNs came from the desire to produce artificial systems capable of sophisticated, perhaps intelligent, computations similar to those that the human brain routinely performs, and thereby possibly to enhance our understanding of the human brain.
Most NNs have some sort of training rule. In other words, NNs learn from examples (as children learn to recognize dogs from examples of dogs) and exhibit some capability for generalization beyond the training data.

Slide 10

ANN vs Computers

Computers have to be explicitly programmed:
Analyze the problem to be solved.
Write the code in a programming language.
Neural networks learn from examples:
No requirement for an explicit description of the problem.
No need for a programmer.
The neural computer adapts itself during a training period, based on examples of similar problems, even without a desired solution to each problem. After sufficient training the neural computer is able to relate the problem data to the solutions, inputs to outputs, and it is then able to offer a viable solution to a brand new problem.

Slide 11

ANN vs Computers

Digital Computers
Deductive reasoning. We apply known rules to input data to produce output.
Computation is centralized, synchronous, and serial.
Memory is literally stored, and location addressable.
Not fault tolerant. One transistor goes and it no longer works.
Exact.
Static connectivity.
Applicable if well-defined rules are accessible, with precise input data.

Neural Networks
Inductive reasoning. We use given input and output data (training examples) to infer the rules.
Computation is collective, asynchronous, and parallel.
Memory is distributed, internalized, short term, and content addressable.
Fault tolerant, with redundancy and sharing of responsibilities.
Inexact.
Dynamic connectivity.
Applicable if rules are unknown or complicated, or if data are noisy or partial.

Slide 12

Biological neuron

Slide 13

Biological neuron

Many “neurons” co-operate to perform the desired function

Basic elements:
Axon
Dendrite
Synapse

Slide 14

Artificial Neuron Structure

The output of a neuron is a function of the weighted sum of the inputs plus a bias.
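Written as a formula (a standard statement of this definition, with inputs x_i, weights w_i, bias b and activation function f):

y = f\left( \sum_{i=1}^{n} w_i x_i + b \right)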

Slide 15

Common activation functions
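The slide's figure is not reproduced in this text export. As an illustration, a few activation functions commonly shown in such overviews, written in Python:

import math

def binary_threshold(s, theta=0.0):
    # Step function: 1 if the weighted sum reaches the threshold, else 0.
    return 1.0 if s >= theta else 0.0

def sigmoid(s):
    # Logistic function, output in (0, 1).
    return 1.0 / (1.0 + math.exp(-s))

def tanh(s):
    # Hyperbolic tangent, output in (-1, 1).
    return math.tanh(s)

def sign(s):
    # Bipolar threshold, used later for associative memories.
    return 1.0 if s >= 0 else -1.0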

Slide 17

Examples of ANN topologies

Single layer ANN

Multilayer ANN

ANN with one recurrent layer

Slide 18

Fundamentals of learning and training samples

The weights in a neural network are the most important factor in determining its function.
A training set is a set of training patterns, which we use to train our neural net.
Training is the act of presenting the network with some sample data and modifying the weights to better approximate the desired function.

Slide 19

Fundamentals of learning and training samples

There are two main types of training:
Supervised Training
Supplies the neural network with inputs and the correct outputs (results).
We can estimate an error vector for a given input.
The response of the network to the inputs is measured. The weights are modified to reduce the difference between the actual and desired outputs.
Unsupervised Training
The training set only consists of input patterns.
The neural network adjusts its own weights so that similar inputs cause similar outputs. The network identifies the patterns and differences in the inputs without any external assistance.

Slide 20

Fundamentals of learning and training samples

A training pattern is an input vector p with the components x1, x2, . . . , xn whose desired output is known.
By entering the training pattern into the network we receive an output that can be compared with the desired output.
The set of training patterns is called P. It contains a finite number of ordered pairs (p, t) of training patterns with the corresponding desired output t.

Slide 21

Fundamentals of learning and training samples

Teaching input. Let j be an output neuron. The teaching input tj is the desired and correct value that j should output after the input of a certain training pattern.
Analogously to the vector p, the teaching inputs t1, t2, . . . , tn of the neurons can also be combined into a vector t. This vector always refers to a specific training pattern p and is contained in the set P of the training patterns.

Slide 22

Fundamentals of learning and training samples

Error vector. For several output neurons Ω1, Ω2, . . . , ΩO the difference between the output vector and the teaching input under a training input p is referred to as the error vector.
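The formula itself is not included in this export; with output vector y and teaching input t, the error vector for pattern p is commonly written as

E_p = t - y = \left( t_{\Omega_1} - y_{\Omega_1},\; t_{\Omega_2} - y_{\Omega_2},\; \ldots,\; t_{\Omega_O} - y_{\Omega_O} \right)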

Slide 23

Fundamentals of learning

Let P be the set of training patterns. In the learning procedure we perform a finite number of iterations or epochs.
Epoch – a single presentation of the entire data set to the neural network. Typically many epochs are required to train the neural network.
Iteration – the process of providing the network with a single input and updating the network's weights.

Slide 24

General learning procedure

Let P be the set of n training patterns pn.
For i = 1 to n
begin
Calculate the NN output vector yi for the training pattern pi.
Compare yi with the desired output ti. Then calculate the output error and modify the weights.
end
If the total error for the training set P is greater than some threshold, go to step 2.
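A minimal sketch of this procedure in Python. The callables `forward` and `update` stand in for the network-specific output computation and learning rule, and the error threshold and epoch limit are illustrative assumptions:

def train(weights, patterns, targets, forward, update, max_error=0.01, max_epochs=1000):
    """Generic epoch loop: present every pattern, adapt the weights,
    and repeat until the total error over the training set is small enough."""
    for epoch in range(max_epochs):
        total_error = 0.0
        for p, t in zip(patterns, targets):
            y = forward(weights, p)                     # compute the network output
            total_error += sum((ti - yi) ** 2 for ti, yi in zip(t, y))
            weights = update(weights, p, y, t)          # modify weights for this pair
        if total_error <= max_error:                    # stopping criterion
            break
    return weights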

Slide 25

Using training samples

We have to divide the set of training samples into two subsets:
one training set actually used for training;
one verification set to test the progress of learning.
The usual split is 70% for training data and 30% for verification data (chosen randomly).
We can finish the training process when the network provides good results on the training data as well as on the verification data.

Slide 26

Learning curve

The learning curve indicates the progress of the error, which can be determined in various ways. This curve can indicate whether the network is progressing or not.

Slide 27

Error measurement

Let Ω be an output neuron and O be the set of output neurons.
The specific error Errp is based on a single training sample.

The total error Err is based on all training samples.
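The formulas are not reproduced on the slide; the usual squared-error definitions are

\mathrm{Err}_p = \frac{1}{2} \sum_{\Omega \in O} \left( t_{\Omega} - y_{\Omega} \right)^2, \qquad \mathrm{Err} = \sum_{p \in P} \mathrm{Err}_p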

Slide 28

When do we stop learning?

Generally, the training process is stopped when the user in front of the learning computer "thinks" the error is small enough.

Slide 29

Using neural networks in practice (discussion)

Classification
In marketing: consumer spending pattern classification
In defence: radar and sonar image classification
In medicine: ultrasound and electrocardiogram image classification, EEGs, medical diagnosis
Recognition and identification
In general computing and telecommunications: speech, vision and handwriting recognition
In finance: signature verification and bank note verification
Assessment
In engineering: product inspection, monitoring and control
In defence: target tracking
In security: motion detection, surveillance image analysis and fingerprint matching
Forecasting and prediction
In finance: foreign exchange rate and stock market forecasting
In agriculture: crop yield forecasting
In marketing: sales forecasting
In meteorology: weather prediction

Slide 30

Single layer neural networks

Slide 31

Single layer network with binary threshold activation function

Matrix form
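The slide's formulas are not reproduced in this export. In one common notation, with w_ij the weight from input i to output neuron j, T_j the threshold of neuron j, and f the binary threshold function, the direct and matrix forms are

y_j = f\left( \sum_{i=1}^{n} w_{ij} x_i - T_j \right), \qquad \mathbf{y} = f\left( W^{\mathsf{T}} \mathbf{x} - \mathbf{T} \right)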

Slide 32

Single layer network with binary threshold activation function

Slide 33

Practice with single layer neural networks

Performing calculations in single layer neural networks using the direct and matrix forms. Using various activation functions.
Using single layer neural networks with the binary threshold activation function as a linear classifier. Adjusting the linear classifier based on training samples.

Slide 34

Hebbian learning rule

Introduced by Donald Hebb in his 1949 book “The Organization of Behavior”.
Describes a basic mechanism for synaptic plasticity.
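The rule's formula is not included in the export; its classic statement, with learning rate α, presynaptic activity x_i and postsynaptic activity y_j, is

\Delta w_{ij} = \alpha \, x_i \, y_j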

Slide 35

Hebbian learning rule (matrix form)

Slide 36

Practice with the Hebbian learning rule
Constructing a neural network based on the Hebbian learning rule for modeling the OR logical operator.
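A minimal sketch of this exercise in Python, assuming a bipolar (-1/+1) encoding of the OR truth table and a bias input fixed at +1; these encoding choices are my assumptions, not taken from the slide:

# Bipolar OR truth table: inputs and desired outputs.
patterns = [(-1, -1), (-1, 1), (1, -1), (1, 1)]
targets  = [-1, 1, 1, 1]

# Hebbian learning: accumulate input * output products (bias input fixed at +1).
w = [0.0, 0.0, 0.0]                      # [bias weight, w1, w2]
for (x1, x2), t in zip(patterns, targets):
    for i, xi in enumerate((1, x1, x2)):
        w[i] += xi * t

def predict(x1, x2):
    s = w[0] + w[1] * x1 + w[2] * x2
    return 1 if s >= 0 else -1           # sign activation

for (x1, x2), t in zip(patterns, targets):
    print((x1, x2), predict(x1, x2), t)  # the learned network reproduces OR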

Slide 37

Delta rule (Widrow-Hoff rule)

The delta rule is a gradient descent learning rule for updating the weights of the inputs to artificial neurons in a single-layer neural network.
The goal is to minimize the error between the actual outputs and the target outputs in the training data.
For each (input/output) training pair, the delta rule determines the direction in which wij needs to be adjusted to reduce the error for that training pair.
Derivatives are used for teaching.
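The update formula is not shown in this export; the standard Widrow-Hoff form, with learning rate α, target t_j, actual (linear) output y_j and input x_i, is

\Delta w_{ij} = \alpha \, (t_j - y_j) \, x_i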

Slide 38

Delta rule (Widrow-Hoff rule)

ADALINE (ADAptive LINear Element) network

Slide 39

Delta rule (Widrow-Hoff rule)

Gradient descent method: find the steepest way down the slope from where you are, and take a step in that direction.

Slide 40

Delta rule algorithm

Define the training speed α (0 < α < 1) and the desired minimal error Em.
Initialize the weights with some small random values.
Take an input pattern and calculate the output vector.
Modify the weights and bias according to the delta rule.
Do steps 3-4 until E < Em.
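A compact Python sketch of these steps for a single linear output neuron (ADALINE). The learning rate, error threshold and random initialization range are illustrative assumptions:

import random

def train_adaline(patterns, targets, alpha=0.05, e_min=0.01, max_epochs=500):
    """Delta-rule training of a single linear neuron y = w.x + b."""
    n = len(patterns[0])
    w = [random.uniform(-0.1, 0.1) for _ in range(n)]    # small random weights
    b = random.uniform(-0.1, 0.1)                         # bias
    for _ in range(max_epochs):
        total_error = 0.0
        for x, t in zip(patterns, targets):
            y = sum(wi * xi for wi, xi in zip(w, x)) + b  # linear output
            err = t - y
            total_error += 0.5 * err ** 2
            # Widrow-Hoff update: move each weight along the negative gradient.
            w = [wi + alpha * err * xi for wi, xi in zip(w, x)]
            b = b + alpha * err
        if total_error < e_min:                           # stop when the error is small
            break
    return w, b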

Slide 41

Linear classifiers

Slide 42

Practice with the delta rule

Constructing an ADALINE neural network (a linear classifier with minimal error) based on given training patterns.

Slide 43

Rosenblatt's single layer perceptron

The perceptron is an algorithm for supervised classification of an input into one of several possible non-binary outputs.
It is a type of linear classifier.
It was invented in 1957 by Frank Rosenblatt as a machine for image recognition.

Slide 44

Rosenblatt's single layer perceptron

Learning rule

Slide 45

Rosenblatt's learning algorithm

Initialise the weights and the threshold. Weights may be initialised to 0 or to a small random value.
Take an input pattern x from X and calculate the output vector y from Y.
If yj = tj then wij does not change.
If yj ≠ tj then wij(t+1) = wij(t) + α xi tj.
Do steps 2-4 until yj = tj for the whole training set.
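A minimal Python sketch of this algorithm for a single output neuron, assuming bipolar (-1/+1) outputs and targets so that the slide's update both increases and decreases weights; the threshold is treated as a weight on a constant input of -1 (an assumption for illustration):

def train_perceptron(patterns, targets, alpha=1.0, max_epochs=100):
    """Rosenblatt learning for one output neuron with a bipolar threshold output."""
    n = len(patterns[0])
    w = [0.0] * n
    theta = 0.0                                    # threshold
    for _ in range(max_epochs):
        all_correct = True
        for x, t in zip(patterns, targets):
            s = sum(wi * xi for wi, xi in zip(w, x)) - theta
            y = 1 if s >= 0 else -1
            if y != t:                             # only wrong answers change the weights
                all_correct = False
                w = [wi + alpha * xi * t for wi, xi in zip(w, x)]
                theta = theta - alpha * t          # threshold adapts like a weight on input -1
        if all_correct:                            # stop once the whole set is classified
            break
    return w, theta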

Slide 46

Rosenblatt's single layer perceptron

It was quickly proved that perceptrons could not be trained to recognize many classes of patterns.
The perceptron is a linear classifier. For example, it is impossible for this class of networks to learn an XOR function.

Slide 47

Practice with Rosenblatt's perceptron

Constructing a linear classifier (Rosenblatt's perceptron) based on given training patterns.

Slide 48

Associative memory

Associative memory (computer science) – a data-storage device in which a location is identified by its informational content rather than by names, addresses, or relative positions, and from which the data may be retrieved. This memory enables one to retrieve a piece of data from only a tiny sample of itself.
Associative memory (psychology) – recalling a previously experienced item by thinking of something that is linked with it, thus invoking the association.

Slide 49

Associative memory

Autoassociative memories are capable of retrieving a piece of data upon presentation of only partial information from that piece of data.
Heteroassociative memories can recall an associated piece of data from one category upon presentation of data from another category.

Slide 50

Autoassociative memory based on the sign activation function

Neural network structure:
Number of neurons in the input layer = number of neurons in the output layer

Activation function

Learning rule
(adapted Hebbian rule)

Example:
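The slide's worked example is a figure that is not reproduced in this export. A minimal Python sketch of the construction described above, assuming bipolar (-1/+1) pattern vectors and the adapted Hebbian rule W = sum over stored patterns of x xᵀ; the 5-component demo pattern is illustrative only:

def train_autoassociative(patterns):
    """Adapted Hebbian rule: W = sum of x x^T over the stored patterns."""
    n = len(patterns[0])
    W = [[0.0] * n for _ in range(n)]
    for x in patterns:
        for i in range(n):
            for j in range(n):
                W[i][j] += x[i] * x[j]
    return W

def recall(W, x):
    """One pass through the network with the sign activation function."""
    out = []
    for row in W:
        s = sum(wij * xj for wij, xj in zip(row, x))
        out.append(1 if s >= 0 else -1)
    return out

# Store one bipolar pattern and recall it from a noisy copy.
stored = [[1, -1, 1, -1, 1]]
W = train_autoassociative(stored)
noisy = [1, -1, -1, -1, 1]           # one flipped component
print(recall(W, noisy))              # -> [1, -1, 1, -1, 1]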

Slide 51

Practice with autoassociative memory

Realization of the associative memory based on the sign activation function.
Working with multiple patterns.
Recognition of the original and noisy patterns.
Investigation of the properties and constraints of the associative memory based on the sign activation function.

Slide 52

Using single layer neural networks for time series forecasting

A time series is a sequence of data points, measured typically at points in time spaced at uniform time intervals.

Slide 53

Using single layer neural networks for time series forecasting

Training samples
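The slide's table of training samples is not reproduced. A hedged sketch of the usual sliding-window construction in Python, which turns a series into (inputs, target) pairs; the window length of 4 and the toy series are assumptions for illustration:

def make_training_samples(series, window=4):
    """Each sample: `window` consecutive values as inputs,
    the next value as the desired output."""
    samples = []
    for i in range(len(series) - window):
        inputs = series[i:i + window]
        target = series[i + window]
        samples.append((inputs, target))
    return samples

# Toy series: predict x(t+1) from the previous 4 points.
series = [1.0, 1.2, 1.1, 1.4, 1.3, 1.6, 1.5, 1.8]
for inputs, target in make_training_samples(series):
    print(inputs, "->", target)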

Slide 54

Practice with time series forecasting

Using ADALINE neural networks for currency forecasting:
Creating the training set from the raw data (www.val.ru).
Learning the ADALINE.
Training the ADALINE network using the delta rule and estimating the error.

Slide 55

Multilayer perceptron

Slide 56

Multilayer perceptron

A multilayer perceptron (MLP) is a feedforward artificial neural network model that maps sets of input data onto a set of appropriate outputs.
It consists of multiple layers (input, output, one or several hidden layers) of nodes in a directed graph, with each layer fully connected to the next one.
Its neurons have a nonlinear activation function.
It utilizes a supervised learning technique called backpropagation of error.

Typical structure

Slide 57

Multilayer perceptron

Structure (2 hidden layers)

Calculating the output Y for the input vector X
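The slide's formulas are not included in this export. In one standard layer-by-layer notation, with W^(k) and T^(k) the weights and thresholds of layer k and f the activation function, the forward computation for two hidden layers plus the output layer can be written as

\mathbf{y}^{(0)} = \mathbf{x}, \qquad \mathbf{y}^{(k)} = f\left( W^{(k)} \mathbf{y}^{(k-1)} - \mathbf{T}^{(k)} \right), \quad k = 1, 2, 3, \qquad \mathbf{Y} = \mathbf{y}^{(3)}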

Slide 58

Multilayer perceptron

The activation function is not a threshold
Usually a sigmoid function
Function approximator
Not limited to linear problems
Information flows in one direction
The outputs of one layer act as inputs to the next layer

Slide 59

Classification ability

A single layer network can only find a linear discriminant function.
It can divide the input space by means of a hyperplane (a straight line in two-dimensional space).

Slide 60

Classification ability

Universal Function Approximation Theorem
An MLP with one hidden layer can approximate arbitrarily closely every continuous function that maps intervals of real numbers to some output interval of real numbers,
f: [0,1]^n -> [0,1],
with 2n+1 neurons in the hidden layer.
It can form single convex decision regions.
One hidden layer is sufficient for the large majority of problems.

Slide 61

Classification ability

Any function can be approximated to arbitrary accuracy by a network with two hidden layers.
An MLP with two hidden layers can classify sets of any form. It can form arbitrary disjoint decision regions.

Slide 62

Backpropagation algorithm

D. Rumelhart, G. Hinton, R. Williams (1986)
The most common method of obtaining the weights in a multilayer perceptron.
A form of supervised training.
The basic backpropagation algorithm is based on minimizing the error of the network using the derivatives of the error function.
Backpropagation of error generalizes the delta rule.

Slide 63

Basic steps

Forward propagation of a training pattern's input through the neural network in order to generate the propagation's output activations.
Backward propagation of the output's error through the neural network, using the training pattern's target, in order to generate the deltas of all output and hidden neurons.

Slide 64

Backpropagation

Slide 65

Backpropagation

We use the gradient descent method to minimize the error.
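The update formula itself is not reproduced on the slide; the standard gradient descent step, with learning rate α, is

\Delta w_{ij} = -\alpha \frac{\partial E}{\partial w_{ij}}, \qquad \Delta T_j = -\alpha \frac{\partial E}{\partial T_j}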

Slide 66

Backpropagation

Theorem. For any hidden layer i of the neural network, the error of neuron i is calculated recursively through the errors of the neurons of the next layer j,
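The recursion formula is not included in this export; one standard form, writing γ for the neuron error signals (my notation, an assumption), is

\gamma_i = \sum_{j=1}^{m} \gamma_j \, f'(S_j) \, w_{ij}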

where m – number of neurons in the next layer j
wij – weights between neuron i and neurons in the next layer j
Sj – weighted sum for the neuron j in next layer.
Proof

Slide 67

Backpropagation

Theorem. The derivatives of the error E with respect to the weights w and biases T can be calculated in the following way.
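The formulas are not included in this export. With the error signal γ from the previous slide, y_i the output of neuron i in the preceding layer, and assuming the convention S_j = Σ_i w_ij y_i - T_j (an assumption about the sign of the bias), a standard statement is

\frac{\partial E}{\partial w_{ij}} = \gamma_j \, f'(S_j) \, y_i, \qquad \frac{\partial E}{\partial T_j} = -\gamma_j \, f'(S_j)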

Proof

Slide 68

Backpropagation

Backpropagation rule

Slide 69

Backpropagation algorithm

Define the training speed α (0 < α < 1) and the desired minimal error Em.
Initialize the weights and biases randomly.
Take all input patterns x from X in sequence.
Calculate the output vector y in the following way.
Apply the backpropagation scheme in the following way.
Modify the weights and biases in the following way.

Slide 70

Backpropagation algorithm

4. Calculate the overall error E for all patterns.
5. If E > Em then go to step 3.
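A compact numeric sketch of steps 1-5 in Python for a network with one hidden layer, using sigmoid activations and the sum-of-squares error; the layer sizes, XOR data, learning rate and error threshold are illustrative assumptions, not taken from the slides:

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(s):
    return 1.0 / (1.0 + np.exp(-s))

# Step 2: random initialization (2 inputs, 3 hidden neurons, 1 output).
W1 = rng.normal(0.0, 0.5, (3, 2)); b1 = np.zeros(3)
W2 = rng.normal(0.0, 0.5, (1, 3)); b2 = np.zeros(1)

# Toy training set: XOR, a classic non-linearly-separable problem for an MLP.
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
T = np.array([[0], [1], [1], [0]], dtype=float)

alpha, E_min = 0.5, 0.01
for epoch in range(20000):
    E = 0.0
    for x, t in zip(X, T):
        # Step 3a: forward pass.
        h = sigmoid(W1 @ x + b1)
        y = sigmoid(W2 @ h + b2)
        # Step 3b: backward pass, error signals (deltas) for output and hidden layers.
        delta_out = (y - t) * y * (1 - y)
        delta_hid = (W2.T @ delta_out) * h * (1 - h)
        # Step 3c: gradient descent update of weights and biases.
        W2 -= alpha * np.outer(delta_out, h); b2 -= alpha * delta_out
        W1 -= alpha * np.outer(delta_hid, x); b1 -= alpha * delta_hid
        E += 0.5 * np.sum((y - t) ** 2)
    # Steps 4-5: total error over all patterns; stop when it is small enough.
    if E < E_min:
        break

print(epoch, E)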

Slide 71

Practice. Calculating delta-rule expressions for various activation functions
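For reference in this exercise, the standard derivatives of the two most common activation functions:

f(S) = \frac{1}{1 + e^{-S}} \;\Rightarrow\; f'(S) = f(S)\,\big(1 - f(S)\big), \qquad f(S) = \tanh(S) \;\Rightarrow\; f'(S) = 1 - f(S)^2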

Slide 72

Some problems

The learning rate is important
Too small: convergence is extremely slow
Too large: it may not converge

The result may converge to a local minimum.

Possible solution:
Using an adaptive learning rate

Slide 73

Some problems

Overfitting

The number of hidden neurons is very important; it defines the complexity of the decision boundary:
Too few
Underfit the data – the network does not have enough free parameters to fit the training data well.
Too many
Overfit the data – the NN learns insignificant details.
Try different numbers and use a validation set to choose the best one.
Start small and increase the number until satisfactory results are obtained.

Slide 74

What constitutes a “good” training set?
Samples must represent the general population.
Samples must contain members of each class.
Samples in each class must contain a wide range of variations or noise effects.

Slide 75

Practice with the multilayer perceptron

Using an MLP for noisy digit recognition and using an MLP for time series forecasting.
- Training set preparation.
- MLP learning in the Deductor software.
- Estimating the error.

Slide 76

Recurrent neural networks

Capable of influencing themselves by means of recurrences, e.g. by including the network output in the following computation steps.
Hopfield neural network
Hamming neural network

Slide 77

Hopfield network

1. Invented by John Hopfield in 1982.
2. Content-addressable memory with binary threshold nodes (-1,1 or 0,1).
3. wij = wji, wii = 0

Slide 78

Hopfield network

Slide 79

Hopfield network as associative memory

Slide 80

Using the Hopfield network as associative memory

Slide 81

Hopfield network as associative memory

Take a noisy pattern y.
Perform iterations until a stable state (attractor) is reached.
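A minimal sketch of this procedure in Python, assuming bipolar patterns, Hebbian storage with a zeroed diagonal (w_ii = 0), and asynchronous updates, which are guaranteed to settle into a stable state; the stored patterns below are illustrative:

import numpy as np

def train_hopfield(patterns):
    """Hebbian storage: W = sum of x x^T over stored patterns, zero diagonal."""
    n = len(patterns[0])
    W = np.zeros((n, n))
    for x in patterns:
        x = np.asarray(x, dtype=float)
        W += np.outer(x, x)
    np.fill_diagonal(W, 0.0)            # w_ii = 0
    return W

def recall(W, y, max_sweeps=100):
    """Asynchronous updates y_i <- sign(sum_j w_ij y_j) until nothing changes."""
    y = np.array(y, dtype=float)
    for _ in range(max_sweeps):
        changed = False
        for i in range(len(y)):
            new = 1.0 if W[i] @ y >= 0 else -1.0
            if new != y[i]:
                y[i] = new
                changed = True
        if not changed:                 # stable state (attractor) reached
            break
    return y

stored = [
    [1, 1, 1, 1, -1, -1, -1, -1],
    [1, -1, 1, -1, 1, -1, 1, -1],
]
W = train_hopfield(stored)
noisy = [-1, 1, 1, 1, -1, -1, -1, -1]   # first pattern with one flipped bit
print(recall(W, noisy).astype(int))     # -> the first stored pattern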

Slide 83

Practice with the Hopfield network

Realization of the associative memory based on the Hopfield neural network.
Working with multiple patterns.
Recognition of the original and noisy patterns.
Investigation of the properties and constraints of the associative memory based on the Hopfield network.

Slide 84

Hamming network

R. Lippman (1987)
The Hamming network is a bipolar classifier built from two subnetworks. The first layer is a single-layer perceptron; it calculates the Hamming distance between the vectors. The second network is a Hopfield network.

Slide 85

Hamming network

Slide 86

Hamming network working algorithm

Define the weights wij and thresholds Tj.
Get the input pattern and initialize the Hopfield weights.
Make iterations in the Hopfield network until we get a stable output.
Take the output neuron with value 1.
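A hedged Python sketch of the classification idea only, assuming bipolar exemplar vectors. The first layer's weights w_j = e_j/2 with threshold n/2 give the number of matching components (n minus the Hamming distance); the recurrent competition of the second network is replaced here by simply taking the largest score, i.e. the state it would converge to:

def hamming_classify(exemplars, x):
    """Return the index of the stored exemplar closest to x in Hamming distance."""
    n = len(x)
    scores = []
    for e in exemplars:
        # First layer: score = (e . x)/2 + n/2 = number of matching components.
        s = sum(ej * xj for ej, xj in zip(e, x)) / 2 + n / 2
        scores.append(s)
    return scores.index(max(scores))    # stand-in for the recurrent competition

exemplars = [[1, 1, 1, -1, -1], [-1, -1, 1, 1, 1]]
print(hamming_classify(exemplars, [1, 1, -1, -1, -1]))   # -> 0 (closest to the first exemplar)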

Slide 87

Self-organizing maps

Slide 88

Self-organizing maps

Unsupervised Training
The training set only consists of input patterns.
The neural network adjusts its own weights so that similar inputs cause similar outputs. The network identifies the patterns and differences in the inputs without any external assistance.

Slide 89

Self-organizing maps (SOM)

A self-organizing map (SOM) is a type of artificial neural network that is trained using unsupervised learning to produce a low-dimensional (typically two-dimensional), discretized representation of the input space of the training samples, called a map.
Self-organizing maps are different from other artificial neural networks in the sense that they use a neighborhood function to preserve the topological properties of the input space.
The model was first described as an artificial neural network by the Finnish professor Teuvo Kohonen.

Slide 90

Self-organizing maps

We only ask which neuron is active at the moment.
We are not interested in the exact output of the neuron but in knowing which neuron provides the output.
These networks are widely used for clustering.
SOMs (like our brain) solve the task of mapping a high-dimensional input (N dimensions) onto areas in a low-dimensional grid of cells (G dimensions).

Slide 92

Scheme of training of a self-organizing map

Slide 93

Competitive learning

Competitive learning is a form of unsupervised learning in artificial neural networks, in which nodes compete for the right to respond to a subset of the input data.

Slide 94

Competitive learning

Slide 95

Vector quantization

It works by dividing a large set of points (vectors) into groups having approximately the same number of points closest to them. Each group is represented by its centroid point, as in k-means and some other clustering algorithms.

Slide 96

Vector quantization

Choose random weights from [0;1].
t = 1
Take all input patterns Xl, l = 1..L.
t = t + 1

Applications:
data compression
pattern recognition

Video codecs:
QuickTime
Cinepak
Indeo, etc.
Audio codecs:
Ogg Vorbis
TwinVQ
DTS, etc.

Slide 97

Kohonen Maps

Slide 98

Kohonen maps

Slide 99

Kohonen maps learning procedure

Choose random weights from [0;1].
t = 1
Take the input pattern Xl and calculate Dij = ||Xl - Wij||, where i, j = 1..m.
Detect the winner neuron: D(k1,k2) = min(Dij).
Calculate the neighbourhood value for every output neuron.
Modify the weights in the following way.
Repeat steps 3-6 for all input patterns.
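A compact Python sketch of this procedure with numpy, assuming a rectangular m x m grid, Euclidean distance for the winner search, a Gaussian neighbourhood function and a fixed learning rate; the slides' exact neighbourhood and learning-rate schedules are not reproduced here, so these choices are assumptions:

import numpy as np

rng = np.random.default_rng(0)

def train_som(X, m=5, epochs=20, alpha=0.3, sigma=1.0):
    """Kohonen learning: find the winner, then pull it and its grid
    neighbours toward the input pattern."""
    dim = X.shape[1]
    W = rng.random((m, m, dim))                       # step 1: random weights in [0, 1]
    grid = np.stack(np.meshgrid(np.arange(m), np.arange(m), indexing="ij"), axis=-1)
    for _ in range(epochs):
        for x in X:
            # Steps 3-4: distances to all units, winner = closest unit.
            d = np.linalg.norm(W - x, axis=2)
            k1, k2 = np.unravel_index(np.argmin(d), d.shape)
            # Step 5: Gaussian neighbourhood on the grid around the winner.
            g = np.exp(-np.sum((grid - np.array([k1, k2])) ** 2, axis=2) / (2 * sigma ** 2))
            # Step 6: move weights toward the input, scaled by the neighbourhood.
            W += alpha * g[..., None] * (x - W)
    return W

# Toy data: 2-D points in [0, 1]^2.
X = rng.random((100, 2))
W = train_som(X)
print(W.shape)    # (5, 5, 2): a 5x5 map of 2-D prototype vectors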

Slide 100

Training and Testing

Slide 101

Training

The goal is to achieve a balance between correct responses for the training patterns and correct responses for new patterns.

Slide 102

Training and Verification

The set of all known samples is broken into two independent sets:
Training set
A group of samples used to train the neural network.
Testing set
A group of samples used to test the performance of the neural network.
Used to estimate the error rate.

Slide 103

Verification

Provides an unbiased test of the quality of the network.
A common error is to “test” the neural network using the same samples that were used to train it.
The network was optimized on these samples, and will obviously perform well on them.
This doesn't give any indication as to how well the network will be able to classify inputs that weren't in the training set.

Slide 104

Summary (Discussion)

Artificial neural networks are inspired by the learning processes that take place in biological systems.
Artificial neurons and neural networks try to imitate the working mechanisms of their biological counterparts.
Learning can be perceived as an optimisation process.
Biological neural learning happens by the modification of the synaptic strength. Artificial neural networks learn in the same way.
The synapse strength modification rules for artificial neural networks can be derived by applying mathematical optimisation methods.

Slide 105

Summary

Learning tasks of artificial neural networks can be reformulated as function approximation tasks.
Neural networks can be considered as nonlinear function approximation tools (i.e., linear combinations of nonlinear basis functions), where the parameters of the networks should be found by applying optimisation methods.
The optimisation is done with respect to the approximation error measure.
In general it is enough to have a single hidden layer neural network (MLP or other) to learn the approximation of a nonlinear function.

Slide 106

Questions and Comments
