Artificial Neural Networks
Brian Talecki, CSC 8520, Villanova University
Artificial Intelligence

ANN - Artificial Neural Network
A set of algebraic equations and functions which determine the best output for a given set of inputs. An artificial neural network is modeled on a greatly simplified version of the neurons that make up the human nervous system. Although its individual neurons operate at roughly one millionth the speed of modern computer hardware, the brain performs many tasks faster than computers because of the parallel processing structure of the nervous system.

Human Nerve Cell
[Figure: human nerve cell. Picture from G5AIAI Introduction to AI by Graham Kendall, www.cs.nott.ac.uk/gxk/courses/g5aiai]
At the synapse the nerve cell releases chemical compounds called neurotransmitters, which excite or inhibit a chemical/electrical discharge in the neighboring nerve cells. The summation of the responses of the adjacent neurons elicits the appropriate response in the neuron.
Brief History of ANN
- McCulloch and Pitts (1943) designed the first neural network.
- Hebb (1949) developed the first learning rule: if two neurons are active at the same time, then the strength of the connection between them should be increased.
- Rosenblatt (1958) introduced the concept of a perceptron, which performed pattern recognition.
- Widrow and Hoff (1960) introduced the concept of the ADALINE (ADAptive LINear Element). Its training rule was based on the Least-Mean-Squares learning rule, which minimizes the error between the computed output and the desired output.
- Minsky and Papert (1969) showed that the perceptron was limited to recognizing classes that can be separated by a linear boundary. "Neural Net Winter."
- Kohonen and Anderson independently developed neural networks that acted like memories.
- Werbos (1974) developed the concept of back propagation of an error to train the weights of a neural network.
- McClelland and Rumelhart (1986) published the paper on the back propagation algorithm. "Rebirth of neural networks."
- Today they are used everywhere a decision can be made.
Source: G5AIAI - Introduction to Artificial Intelligence, Graham Kendall
Basic Neural Network
Inputs: normally a vector of measured parameters.
Bias: may or may not be added.
f(): the transfer or activation function.
Output = f(W^T p + b)
[Figure: single-neuron diagram - inputs p, weights W, bias b, summer producing W^T p + b, activation f(), output.]

Activation Functions
Source: Supervised Neural Network Introduction, CISC 873 Data Mining, Yabin Meng

Log Sigmoidal Function
[Figure: log sigmoid curve.]
Source: Artificial Neural Networks, Colin P. Fahey

Hard Limit Function
[Figure: hard limit function of x, y axis marked at 1.0 and -1.0.]

Log Sigmoid and Derivative
[Figure: log sigmoid function and its derivative.]
Source: The Scientist and Engineer's Guide to Digital Signal Processing, Steven W. Smith

Derivative of the Log Sigmoidal Function
s(x) = (1 + e^-x)^-1
s'(x) = -(1 + e^-x)^-2 * (-e^-x)
      = e^-x * (1 + e^-x)^-2
      = (e^-x / (1 + e^-x)) * (1 / (1 + e^-x))
      = ((1 + e^-x - 1) / (1 + e^-x)) * (1 / (1 + e^-x))
      = (1 - 1/(1 + e^-x)) * (1 / (1 + e^-x))
s'(x) = (1 - s(x)) * s(x)
The derivative is important for the back error propagation algorithm used to train multilayer neural networks.
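The identity s'(x) = (1 - s(x)) * s(x) is easy to confirm numerically. The following is a minimal Python sketch (the function names are my own, not from the slides) comparing the analytic derivative with a finite-difference approximation:

```python
import math

def logsig(x):
    """Log sigmoidal function s(x) = 1 / (1 + e^-x)."""
    return 1.0 / (1.0 + math.exp(-x))

def logsig_deriv(x):
    """Analytic derivative s'(x) = (1 - s(x)) * s(x)."""
    s = logsig(x)
    return (1.0 - s) * s

# Compare against a central finite difference at a few sample points.
h = 1e-6
for x in (-2.0, 0.0, 1.5, 5.6):
    numeric = (logsig(x + h) - logsig(x - h)) / (2 * h)
    print(f"x={x:5.2f}  analytic={logsig_deriv(x):.6f}  numeric={numeric:.6f}")
```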
Example: Single Neuron
Given: W = 1.3, p = 2.0, b = 3.0
W p + b = 1.3(2.0) + 3.0 = 5.6
Linear: f(5.6) = 5.6
Hard limit: f(5.6) = 1.0
Log sigmoidal: f(5.6) = 1/(1 + exp(-5.6)) = 1/(1 + 0.0037) = 0.9963
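As a rough illustration (my own code, not from the slides), the example above can be reproduced with a few lines of Python; the activation-function names are assumptions:

```python
import math

def linear(n):
    return n

def hardlim(n):
    # Hard limit: 1 if the net input is >= 0, else 0.
    return 1.0 if n >= 0 else 0.0

def logsig(n):
    return 1.0 / (1.0 + math.exp(-n))

W, p, b = 1.3, 2.0, 3.0
n = W * p + b                  # net input: 1.3*2.0 + 3.0 = 5.6
print(linear(n))               # 5.6
print(hardlim(n))              # 1.0
print(round(logsig(n), 4))     # 0.9963
```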
Simple Neural Network
One neuron with a linear activation function describes a straight line. Recall the equation of a straight line: y = mx + b, where m is the slope (weight) and b is the y-intercept (bias).

Decision Boundary
[Figure: plot in the (p1, p2) plane. The line m*p1 + b = p2 is the decision boundary; points with m*p1 + b > p2 fall on one side ("Good") and points with m*p1 + b < p2 on the other ("Bad").]

Perceptron Learning
Extend our simple perceptron to two inputs and a hard limit activation function.
[Figure: two-input perceptron - inputs p1 and p2, weights W1 and W2, bias, hard limit function f(), output.]
o = f(W^T p + b), where W is the weight matrix, p is the input vector, and o is our scalar output.

Rules of Matrix Math
Addition/Subtraction:
[1 2 3]     [9 8 7]   [10 10 10]
[4 5 6] +/- [6 5 4] = [10 10 10]
[7 8 9]     [3 2 1]   [10 10 10]
Multiplication by a scalar:
a [1 2] = [a  2a]
  [3 4]   [3a 4a]
Transpose:
[1]^T = [1 2]
[2]
Matrix multiplication:
[2 4] [5] = 18        [5] [2 4] = [10 20]
      [2]             [2]         [ 4  8]
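These matrix rules map directly onto NumPy operations. A small sketch (NumPy is my choice here, not something the slides specify):

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
B = np.array([[9, 8, 7], [6, 5, 4], [3, 2, 1]])
print(A + B)                             # every entry is 10

a = 2.0
print(a * np.array([[1, 2], [3, 4]]))    # multiplication by a scalar

v = np.array([[1], [2]])
print(v.T)                               # transpose: [[1 2]]

row = np.array([[2, 4]])
col = np.array([[5], [2]])
print(row @ col)                         # inner product: [[18]]
print(col @ row)                         # outer product: [[10 20], [4 8]]
```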
Data Points for the AND Function
q1 = [0, 0]^T, o1 = 0
q2 = [0, 1]^T, o2 = 0
q3 = [1, 0]^T, o3 = 0
q4 = [1, 1]^T, o4 = 1
Truth Table:
P1 P2 | O
 0  0 | 0
 0  1 | 0
 1  0 | 0
 1  1 | 1

Weight Vector and the Decision Boundary
W = [1.0, 1.0]^T
[Figure: the weight vector W drawn with its magnitude and direction, perpendicular to the decision boundary.]
The decision boundary is the line where W^T p = -b, i.e. W^T p + b = 0; on one side of it W^T p + b > 0 and on the other W^T p + b < 0.
As we adjust the weights and biases of the neural network, we change the magnitude and direction of the weight vector, or equivalently the slope and intercept of the decision boundary.
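To make the boundary concrete: solving W^T p + b = 0 for p2 gives p2 = -(W1/W2)*p1 - b/W2. With the weights that the AND example below converges to, W = [2, 1]^T and b = -3, the boundary is p2 = -2*p1 + 3, and of the four AND inputs only (1, 1) satisfies W^T p + b >= 0.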
Perceptron Learning Rule
Adjusting the weights of the perceptron.
Perceptron error: the difference between the desired and derived outputs, e = Desired - Derived.
When e = 1:  Wnew = Wold + p
When e = -1: Wnew = Wold - p
When e = 0:  Wnew = Wold
Simplifying: Wnew = Wold + α*e*p and bnew = bold + α*e, where α is the learning rate (α = 1 for the perceptron).

AND Function Example
Start with W1 = 1, W2 = 1, and b = -1. For each input p compute a = hardlim(W^T p + b) and the error e = t - a, then apply the learning rule.
Pass 1:
[1 1] [0 0]^T + (-1) = -1 -> a = 0, t = 0, e = 0, no change
[1 1] [0 1]^T + (-1) =  0 -> a = 1, t = 0, e = -1, W = [1 0], b = -2
[1 0] [1 0]^T + (-2) = -1 -> a = 0, t = 0, e = 0, no change
[1 0] [1 1]^T + (-2) = -1 -> a = 0, t = 1, e = 1, W = [2 1], b = -1
Pass 2:
[2 1] [0 0]^T + (-1) = -1 -> a = 0, t = 0, e = 0, no change
[2 1] [0 1]^T + (-1) =  0 -> a = 1, t = 0, e = -1, W = [2 0], b = -2
[2 0] [1 0]^T + (-2) =  0 -> a = 1, t = 0, e = -1, W = [1 0], b = -3
[1 0] [1 1]^T + (-3) = -2 -> a = 0, t = 1, e = 1, W = [2 1], b = -2
Pass 3:
[2 1] [0 0]^T + (-2) = -2 -> a = 0, t = 0, e = 0, no change
[2 1] [0 1]^T + (-2) = -1 -> a = 0, t = 0, e = 0, no change
[2 1] [1 0]^T + (-2) =  0 -> a = 1, t = 0, e = -1, W = [1 1], b = -3
[1 1] [1 1]^T + (-3) = -1 -> a = 0, t = 1, e = 1, W = [2 2], b = -2
Pass 4:
[2 2] [0 0]^T + (-2) = -2 -> a = 0, t = 0, e = 0, no change
[2 2] [0 1]^T + (-2) =  0 -> a = 1, t = 0, e = -1, W = [2 1], b = -3
[2 1] [1 0]^T + (-3) = -1 -> a = 0, t = 0, e = 0, no change
[2 1] [1 1]^T + (-3) =  0 -> a = 1, t = 1, e = 0, no change
Pass 5:
[2 1] [0 0]^T + (-3) = -3 -> a = 0, t = 0, e = 0, no change
[2 1] [0 1]^T + (-3) = -2 -> a = 0, t = 0, e = 0, no change
Done!
[Figure: trained perceptron - inputs p1 and p2 with weights 2 and 1, bias -3, hard limit activation f(); output = hardlim(2*p1 + 1*p2 - 3).]
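The worked example above follows the standard perceptron rule. A compact Python sketch of the same loop (my own illustration, not code from the slides); run with the same ordering of the training points it reaches the same weights W = [2, 1] and bias b = -3:

```python
def hardlim(n):
    return 1 if n >= 0 else 0

# AND training data: (p1, p2) -> target
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

W = [1, 1]      # initial weights
b = -1          # initial bias

while True:
    errors = 0
    for (p1, p2), t in data:
        a = hardlim(W[0] * p1 + W[1] * p2 + b)
        e = t - a                  # perceptron error
        if e != 0:
            W[0] += e * p1         # Wnew = Wold + e*p  (learning rate = 1)
            W[1] += e * p2
            b += e                 # bnew = bold + e
            errors += 1
    if errors == 0:
        break

print(W, b)   # [2, 1] -3
```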
XOR Function
Truth Table: Z = (X and not Y) or (not X and Y)
X Y | Z
0 0 | 0
0 1 | 1
1 0 | 1
1 1 | 0
[Figure: the four XOR points plotted in the (x, y) plane with their z values.]
No single decision boundary can separate the favorable and unfavorable outcomes. We will need a more complicated neural net to realize this function.

Circuit Diagram
[Figure: logic-gate circuit realizing Z = (X and not Y) or (not X and Y).]

XOR Function - Multilayer Perceptron
[Figure: two-layer network - inputs x and y feed two hidden neurons f1() through weights W1-W4 with biases b11 and b12; the hidden outputs feed the output neuron f() through W5 and W6 with bias b2.]
z = f(W5*f1(W1*x + W4*y + b11) + W6*f1(W2*x + W3*y + b12) + b2)
The weights of the neural net are independent of each other, so we can compute the partial derivatives of z with respect to each weight of the network, i.e. ∂z/∂W1, ∂z/∂W2, ∂z/∂W3, ∂z/∂W4, ∂z/∂W5, ∂z/∂W6.
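One concrete way to see that this architecture can realize XOR: with hard-limit activations and hand-picked weights (the numeric values below are my own illustration, not taken from the slides), the network computes (x AND NOT y) OR (NOT x AND y):

```python
def hardlim(n):
    return 1 if n >= 0 else 0

# Hand-picked weights for z = f(W5*f1(W1*x + W4*y + b11) + W6*f1(W2*x + W3*y + b12) + b2)
W1, W4, b11 = 1, -1, -0.5    # hidden neuron 1 fires for (x AND NOT y)
W2, W3, b12 = -1, 1, -0.5    # hidden neuron 2 fires for (NOT x AND y)
W5, W6, b2 = 1, 1, -0.5      # output neuron ORs the two hidden outputs

def xor_net(x, y):
    h1 = hardlim(W1 * x + W4 * y + b11)
    h2 = hardlim(W2 * x + W3 * y + b12)
    return hardlim(W5 * h1 + W6 * h2 + b2)

for x in (0, 1):
    for y in (0, 1):
        print(x, y, xor_net(x, y))   # matches the XOR truth table
```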
Back Propagation Diagram
[Figure: back propagation of the error through a multilayer network.]
Source: Neural Networks and Logistic Regression, Lucila Ohno-Machado, Decision Systems Group, Brigham and Women's Hospital, Department of Radiology

Back Propagation Algorithm
This algorithm for training Artificial Neural Networks (ANN) depends on two basic concepts:
a) Reduce the Sum Squared Error (SSE) to an acceptable value.
b) Reliable data to train your network under your supervision.
Simple case: a single-input neural net with no biases.
[Figure: chain x -> (W1) -> n1 -> f1 -> a1 -> (W2) -> n2 -> f2 -> z, with T the desired output.]

BP Equations
n1 = W1 * x
a1 = f1(n1) = f1(W1 * x)
n2 = W2 * a1 = W2 * f1(n1) = W2 * f1(W1 * x)
z = f2(n2) = f2(W2 * f1(W1 * x))
SSE = (z - T)^2
Let's now take the partial derivatives (the constant factor of 2 from the square is absorbed into the learning rate):
∂SSE/∂W2 = (z - T) * ∂(z - T)/∂W2 = (z - T) * ∂z/∂W2 = (z - T) * ∂f2(n2)/∂W2
Chain rule: ∂f2(n2)/∂W2 = (∂f2(n2)/∂n2) * (∂n2/∂W2) = (∂f2(n2)/∂n2) * a1
∂SSE/∂W2 = (z - T) * (∂f2(n2)/∂n2) * a1
Define α to be our learning rate (0 < α < 1, typical α = 0.2) and compute our new weight:
W2(k+1) = W2(k) - α * ∂SSE/∂W2 = W2(k) - α * (z - T) * (∂f2(n2)/∂n2) * a1
Sigmoid function: ∂f2(n2)/∂n2 = f2(n2) * (1 - f2(n2)) = z * (1 - z)
Therefore: W2(k+1) = W2(k) - α * (z - T) * z * (1 - z) * a1
Analysis for W1:
n1 = W1 * x
a1 = f1(W1 * x)
n2 = W2 * f1(n1) = W2 * f1(W1 * x)
∂SSE/∂W1 = (z - T) * ∂(z - T)/∂W1 = (z - T) * ∂z/∂W1 = (z - T) * ∂f2(n2)/∂W1
Chain rule: ∂f2(n2)/∂W1 = (∂f2(n2)/∂n2) * (∂n2/∂W1)
∂n2/∂W1 = W2 * ∂f1(n1)/∂W1 = W2 * (∂f1(n1)/∂n1) * (∂n1/∂W1) = W2 * (∂f1(n1)/∂n1) * x
∂SSE/∂W1 = (z - T) * (∂f2(n2)/∂n2) * W2 * (∂f1(n1)/∂n1) * x
W1(k+1) = W1(k) - α * (z - T) * (∂f2(n2)/∂n2) * W2 * (∂f1(n1)/∂n1) * x
with ∂f2(n2)/∂n2 = z * (1 - z) and ∂f1(n1)/∂n1 = a1 * (1 - a1)
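The two update equations drop straight into a training loop. Below is a minimal Python sketch of gradient descent on this single-input, no-bias, two-layer net; the training pair x = 1.0, T = 0.8 and the starting weights are assumptions for illustration:

```python
import math

def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))

alpha = 0.2            # learning rate
W1, W2 = 0.5, 0.5      # assumed initial weights
x, T = 1.0, 0.8        # assumed training input and desired output

for k in range(2000):
    # Forward pass
    n1 = W1 * x
    a1 = sigmoid(n1)
    n2 = W2 * a1
    z = sigmoid(n2)

    # Backward pass: gradients from the chain rule above
    dSSE_dW2 = (z - T) * z * (1 - z) * a1
    dSSE_dW1 = (z - T) * z * (1 - z) * W2 * a1 * (1 - a1) * x

    # Gradient descent updates
    W2 -= alpha * dSSE_dW2
    W1 -= alpha * dSSE_dW1

print(round(z, 3))     # moves toward the target T = 0.8
```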
Gradient Descent
[Figure: error versus training time, showing descent into a local minimum versus the global minimum.]
Source: Neural Networks and Logistic Regression, Lucila Ohno-Machado, Decision Systems Group, Brigham and Women's Hospital, Department of Radiology

2-D Diagram of Gradient Descent
[Figure: 2-D error surface with the path taken by gradient descent.]
Source: Back Propagation algorithm, Olena Lobunets, www.essex.ac.uk/ccfea/Courses/workshops03-04/Workshop4/Workshop%204.ppt

Learning by Example
Training algorithm: back propagation of errors using gradient descent training.
Colors in the following diagrams: red = current weights, orange = updated weights, black boxes = inputs and outputs of a neuron, blue = sensitivities at each layer.
Source: A Brief Overview of Neural Networks, Rohit Dua, Samuel A. Mulder, Steve E. Watkins, and Donald C. Wunsch, campus.umr.edu/smartengineering/EducationalResources/Neural_Net.ppt

First Pass
[Figure: network with all weights 0.5 - input 1, input-layer outputs 0.6225, hidden-layer outputs 0.6508, network output 0.6508.]
Error = 1 - 0.6508 = 0.3492
G3 = (1)(0.3492) = 0.3492
G2 = (0.6508)(1 - 0.6508)(0.3492)(0.5) = 0.0397
G1 = (0.6225)(1 - 0.6225)(0.0397)(0.5)(2) = 0.0093
Gradient of a hidden neuron: G = (slope of the transfer function)(weight of the neuron to the next neuron)(gradient of the next neuron), summed over the neurons it feeds.
Gradient of the output neuron = (slope of the transfer function)(error).

Weight Update 1
New Weight = Old Weight + (learning rate)(gradient)(prior output)
W3: 0.5 + (0.5)(0.3492)(0.6508) = 0.6136
W2: 0.5 + (0.5)(0.0397)(0.6225) = 0.5124
W1: 0.5 + (0.5)(0.0093)(1) = 0.5047
[Figure: updated network - hidden-to-output weights 0.6136, input-layer-to-hidden weights 0.5124, input-to-input-layer weights 0.5047.]
Source: A Brief Overview of Neural Networks, Rohit Dua, Samuel A. Mulder, Steve E. Watkins, and Donald C. Wunsch, campus.umr.edu/smartengineering/EducationalResources/Neural_Net.ppt
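A short Python sketch reproduces the first-pass numbers and Weight Update 1. The network structure is my own reconstruction from the numbers on the slides: one input with value 1, two sigmoidal input-layer neurons, two sigmoidal hidden-layer neurons, one linear output neuron, every weight starting at 0.5, target 1; by symmetry all weights in a layer stay equal, so one scalar per layer suffices:

```python
import math

def sigmoid(n):
    return 1.0 / (1.0 + math.exp(-n))

lr, target = 0.5, 1.0
w1 = w2 = w3 = 0.5                   # all weights start at 0.5

# Forward pass
x = 1.0
a1 = sigmoid(w1 * x)                 # input-layer output: 0.6225
a2 = sigmoid(2 * w2 * a1)            # hidden-layer output: 0.6508 (two identical inputs)
z = 2 * w3 * a2                      # linear output neuron: 0.6508

# Gradients (sensitivities)
error = target - z                   # 0.3492
G3 = 1.0 * error                     # slope of the linear output neuron = 1
G2 = a2 * (1 - a2) * w3 * G3         # 0.0397
G1 = a1 * (1 - a1) * 2 * w2 * G2     # 0.0093 (fed back from two hidden neurons)

# Weight Update 1: new = old + (learning rate)(gradient)(prior output)
print(round(w3 + lr * G3 * a2, 4))   # 0.6136
print(round(w2 + lr * G2 * a1, 4))   # 0.5124
print(round(w1 + lr * G1 * x, 4))    # 0.5047
```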
Second Pass
[Figure: second forward pass with the updated weights - input 1, input-layer outputs 0.6236, hidden-layer outputs 0.6545, network output 0.8033.]
Error = 1 - 0.8033 = 0.1967
G3 = (1)(0.1967) = 0.1967
G2 = (0.6545)(1 - 0.6545)(0.1967)(0.6136) = 0.0273
G1 = (0.6236)(1 - 0.6236)(0.5124)(0.0273)(2) = 0.0066
Source: A Brief Overview of Neural Networks, Rohit Dua, Samuel A. Mulder, Steve E. Watkins, and Donald C. Wunsch, campus.umr.edu/smartengineering/EducationalResources/Neural_Net.ppt

Weight Update 2
New Weight = Old Weight + (learning rate)(gradient)(prior output)
W3: 0.6136 + (0.5)(0.1967)(0.6545) = 0.6779
W2: 0.5124 + (0.5)(0.0273)(0.6236) = 0.5209
W1: 0.5047 + (0.5)(0.0066)(1) = 0.508
Source: A Brief Overview of Neural Networks, Rohit Dua, Samuel A. Mulder, Steve E. Watkins, and Donald C. Wunsch, campus.umr.edu/smartengineering/EducationalResources/Neural_Net.ppt

Third Pass
[Figure: third forward pass with the twice-updated weights (0.508, 0.5209, 0.6779) - input 1; values shown include 0.6243, 0.6504, and the network output 0.8909.]
Source: A Brief Overview of Neural Networks, Rohit Dua, Samuel A. Mulder, Steve E. Watkins, and Donald C. Wunsch, campus.umr.edu/smartengineering/EducationalResources/Neural_Net.ppt

Weight Update Summary
W1: weights from the input to the input layer
W2: weights from the input layer to the hidden layer
W3: weights from the hidden layer to the output layer
Source: A Brief Overview of Neural Networks, Rohit Dua, Samuel A. Mulder, Steve E. Watkins, and Donald C. Wunsch, campus.umr.edu/smartengineering/EducationalResources/Neural_Net.ppt

ECG Interpretation
[Figure: neural network applied to ECG interpretation.]
Source: Neural Networks and Logistic Regression, Lucila Ohno-Machado, Decision Systems Group, Brigham and Women's Hospital, Department of Radiology
Other Applications of ANN
- Lip Reading Using Artificial Neural Network. Ahmad Khoshnevis, Sridhar Lavu, Bahar Sadeghi, and Yolanda Tsang. ELEC 502 Course Project. www-dsp.rice.edu/lavu/research/doc/502lavu.ps
- AI Techniques in Power Electronics and Drives. Dr. Marcelo G. Simoes, Colorado School of Mines. egweb.mines.edu/msimoes/tutorial
- Car Classification with Neural Networks. Koichi Sato and Sangho Park. hercules.ece.utexas.edu/course/ee380l/1999sp/present/carclass.ppt
- Face Detection and Neural Networks. Todd Wittman. www.ima.umn.edu/whitman/faces/face_detection2.ppt
- A Neural Network for Detecting and Diagnosing Tornadic Circulations. V. Lakshmanan, Gregory Stumpf, Arthur Witt. www.cimms.ou.edu/lakshman/Papers/mdann_talk.ppt
Bibliography
A Brief Overview of Neural Networks. Rohit Dua, Samuel A. Mulder, Steve E. Watkins, and Donald C. Wunsch. campus.umr.edu/smartengineering/EducationalResources/Neural_Net.ppt
Neural Networks and Logistic Regression. Lucila Ohno-Machado. Decision Systems Group, Brigham and Women's Hospital, Department of Radiology. dsg.harvard.edu/courses/hst951/ppt/hst951_0320.ppt
G5AIAI - Introduction to AI. Graham Kendall. School of Computer Science and IT, University of Nottingham. www.cs.nott.ac.uk/gxk/courses/g5aiai
The Scientist and Engineer's Guide to Digital Signal Processing. Steven W. Smith, Ph.D. California Technical Publishing.
Neural Network Design. Martin Hagan, Howard B. Demuth, and Mark Beale. Campus Publishing Services, Boulder, Colorado 80309-0036.
ECE 8412 lecture notes. Dr. Anthony Zygmont, Department of Electrical Engineering, Villanova University, January 2003.
Supervised Neural Network Introduction. CISC 873 Data Mining. Yabin Meng. mengcs.queensu.ca