
Introduction: The Perceptron
Haim Sompolinsky, MIT
October 4, 2013

1 Perceptron Architecture

The simplest type of perceptron has a single layer of weights connecting the inputs and output. Formally, the perceptron is defined by

y = sign(\sum_{i=1}^{N} w_i x_i - \theta)   or   y = sign(w^T x - \theta)    (1)

where w is the weight vector and \theta is the threshold. Unless otherwise stated, we will ignore the threshold in the analysis of the perceptron (and other topics), because we can instead view the threshold as an additional synaptic weight that is given the constant input -1. This is so because w^T x - \theta = [w^T, \theta] (x, -1)^T.

Figure 1: Perceptron network (inputs x_1, x_2, ..., x_N with weights W_1, W_2, ..., W_N feeding the single output y).
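To make the definition concrete, here is a minimal NumPy sketch (not from the original notes; function and variable names are illustrative) of the perceptron output of Eq. (1), and of the same computation with the threshold absorbed as an extra weight acting on a constant -1 input.

```python
import numpy as np

def perceptron_output(w, x, theta=0.0):
    """Perceptron output y = sign(w^T x - theta), Eq. (1)."""
    return np.sign(w @ x - theta)

def perceptron_output_bias_absorbed(w_ext, x):
    """Same computation with the threshold folded into the weights:
    w_ext = [w, theta] acts on the extended input [x, -1]."""
    x_ext = np.append(x, -1.0)
    return np.sign(w_ext @ x_ext)

# Quick check that the two forms agree (illustrative values).
w, theta = np.array([0.5, -1.2, 0.3]), 0.4
x = np.array([1.0, 0.2, -0.7])
assert perceptron_output(w, x, theta) == perceptron_output_bias_absorbed(np.append(w, theta), x)
```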

Perceptron's decision surface. It is easy to visualize the action of the perceptron in geometric terms because w and x have the same dimensionality, N. Figure 2 shows the surface in the input space that divides the inputs into two classes, according to their label. This is an example of a decision surface of a machine that outputs dichotomies. In this case, it is just a hyperplane drawn in input space that passes through the origin. The hyperplane is oriented perpendicular to the vector w. We will sometimes identify the plane by its associated weight vector w.

Any set of labeled points that can be separated by a hyperplane (through the origin) is said to be a linearly separable dichotomy. However, there are many interesting simple dichotomies that cannot be realized by a perceptron; these are non-linearly separable problems. A famous example is the XOR.

2 Perceptron's Capacity: Cover Counting Theorem

Before we discuss learning in the context of a perceptron, it is interesting to try to quantify its complexity. This raises the general question of how we quantify the complexity of a given architecture, or its capacity to realize a set of input-output functions, in our case dichotomies. A straightforward answer would be to count the number of different functions that this architecture can implement. However, this measure is meaningless if the inputs come from a continuous space, since in that case arbitrarily small changes in the decision boundary correspond to a new dichotomy. One way to overcome this difficulty is to constrain the input to a fixed finite set of P input vectors, {x^1, ..., x^P}. We can now attempt to count the number of functions (dichotomies) that can be implemented by a given network architecture on this fixed set of inputs.
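As an aside (not part of the notes), linear separability through the origin can be checked numerically: the labels {y^mu} are realizable iff there is a w with y^mu w^T x^mu > 0 for all mu, which is a linear feasibility problem. The sketch below, using scipy.optimize.linprog, reports the XOR-like labeling of the four corners (±1, ±1) as not separable by a plane through the origin, while a sign(x_2) labeling is; helper names are illustrative.

```python
import numpy as np
from scipy.optimize import linprog

def linearly_separable_through_origin(X, y):
    """True if some w satisfies y_i * (w . x_i) >= 1 for all i
    (by rescaling w this is equivalent to strict separability)."""
    X, y = np.asarray(X, float), np.asarray(y, float)
    A_ub = -(y[:, None] * X)              # encode -y_i x_i . w <= -1
    b_ub = -np.ones(len(y))
    res = linprog(c=np.zeros(X.shape[1]), A_ub=A_ub, b_ub=b_ub,
                  bounds=[(None, None)] * X.shape[1])
    return res.success

corners = [(1, 1), (1, -1), (-1, 1), (-1, -1)]
xor_labels = [-1, 1, 1, -1]               # XOR-like labeling of the corners
sep_labels = [1, -1, 1, -1]               # a dichotomy realized by w = (0, 1)

print(linearly_separable_through_origin(corners, xor_labels))  # False
print(linearly_separable_through_origin(corners, sep_labels))  # True
```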

We will now apply this idea in the case of the perceptron. Before doing so, there is another potential complication: in general, the number of dichotomies that can be realized by a given architecture may also depend on the particular set of inputs. Fortunately, this is not the case for the perceptron. In fact, the number of linearly realizable dichotomies on a set of points depends only on a mild condition, known as general position.

General position is the extension of the concept of linear independence. We cannot demand that the input set be linearly independent, since P may be larger than N. General position is the condition that {x^1, ..., x^P} has no subset of size at most N that is linearly dependent. General position is a necessary condition for linear separability: it is clear that if, for instance, two oppositely labeled points lie on a line (emanating from the origin), there cannot be a plane intersecting the origin that separates them. On the other hand, this is a very mild condition, obeyed by any examples generated from a distribution P(x) that varies smoothly in R^N, and in high dimension it is obeyed with probability close to 1 even when the inputs are discrete valued. The number of linearly realizable dichotomies is given by the following theorem.

Cover's Function Counting Theorem (Cover 1966):

Theorem: Let x^1, ..., x^P be vectors in R^N that are in general position. Then the number of distinct dichotomies of these points that can be realized by a plane through the origin is

C(P, N) = 2 \sum_{k=0}^{N-1} \binom{P-1}{k}    (2)

Reminder: for m \le n, \binom{n}{m} = \frac{n!}{m!(n-m)!}. For m > n, we define \binom{n}{m} = 0.

Before we prove the theorem, let us discuss some of its consequences.

Some consequences of Cover's theorem:

1. For P \le N, the terms with k > P - 1 vanish, so the sum can be rewritten as

C(P, N) = 2 \sum_{k=0}^{P-1} \binom{P-1}{k} = 2 (1+1)^{P-1} = 2^P    (3)

(using the familiar binomial expansion). In other words, all possible dichotomies can be realized.

2. For P = 2N,

C(2N, N) = 2 \sum_{k=0}^{N-1} \binom{2N-1}{k} = \frac{1}{2} \cdot 2 \cdot 2^{2N-1} = \frac{1}{2} 2^P    (4)

This is so because the expression for C contains exactly one half of the full binomial expansion of (1+1)^{2N-1}, and we know that this expansion is symmetric (for example, the first few odd binomial expansions are (1, 1), (1, 3, 3, 1), and (1, 5, 10, 10, 5, 1)). We conclude that in this case exactly half of all possible dichotomies can be realized.

3. For P \gg N, it is easy to see that C(P, N) \le A P^N for some A > 0. When P is large compared to N, the number of dichotomies that can be implemented still grows with P, but its proportion of the total number of possible dichotomies shrinks, since the total number of possible dichotomies is 2^P. Figure 4 shows this phenomenon on a logarithmic scale: for small P the number of dichotomies is maximal, i.e. 2^P, and thus linear on the logarithmic scale; when P becomes large, the number of realizable dichotomies grows only polynomially, and hence logarithmically on the logarithmic scale.

Figure 3: Fraction of linearly separable dichotomies vs. P/N for N = 5 and 65.

4. Another way to look at the behavior of C(P, N) is to allow both P and N to approach infinity while keeping them proportional, i.e. \alpha = P/N fixed. In this case, as N \to \infty, C(P, N)/2^P vs. P/N becomes a step function, as shown in Figure 5. Note that the shape of the step function must of course still obey the properties that we found before, in particular C(2N, N)/2^{2N} = 0.5. This is in fact the value around which the step occurs: below the critical value P/N = 2 virtually all of the dichotomies are possible, and above that value virtually none are possible.
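A short numerical illustration (not from the notes; names are illustrative): evaluating the closed form (2) and printing C(P, N)/2^P against P/N reproduces the behavior described above, with the fraction equal to 1 for P \le N, exactly 1/2 at P = 2N, and rapidly approaching 0 for P \gg N.

```python
from math import comb

def C(P, N):
    """Cover's count of linearly realizable dichotomies, Eq. (2)."""
    return 2 * sum(comb(P - 1, k) for k in range(N))

N = 20
for P in [5, N, 2 * N, 4 * N, 8 * N]:
    frac = C(P, N) / 2 ** P            # fraction of all 2^P dichotomies
    print(f"P={P:3d}  P/N={P / N:4.1f}  C/2^P={frac:.4f}")
# P <= N  -> fraction 1;  P = 2N -> exactly 0.5;  P >> N -> close to 0.
```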

2.1 Proof of Cover's Theorem:

Start with P points in general position. We now assume that there are C(P, N) dichotomies possible on them, and ask how many dichotomies are possible if another point (in general position) is added, i.e. what is the value of C(P+1, N). In this way we will set up a recursive expression for C(P, N).

Let (b_1, ..., b_P) be a dichotomy realizable by a hyperplane over the set of P inputs; in other words, b_i \in \{-1, +1\} for every i = 1, ..., P, and there is a set of weights w so that

sign(w^T x^1), ..., sign(w^T x^P) = (b_1, ..., b_P).

Using one such w, we get a dichotomy over the set of P+1 inputs:

sign(w^T x^1), ..., sign(w^T x^{P+1}) = (b_1, ..., b_P, sign(w^T x^{P+1}))    (5)

Thus for every linearly realized dichotomy over P points there is at least one linearly realized dichotomy over P+1 points. Note that different dichotomies over P points define different dichotomies over P+1 points, since they differ somewhere in the first P coordinates.

Now, potentially, the additional dichotomy (b_1, ..., b_P, -sign(w^T x^{P+1})) (reversing the sign of the last coordinate) is also possible, realized by some other set of weights. In this way C(P+1, N) can be higher than C(P, N). Let us write

C(P+1, N) = C(P, N) + D    (6)

Our goal is to find D, the number of additional dichotomies.

Let us assume that one of the weight vectors w that generates (b_1, ..., b_P) passes directly through x^{P+1}, as shown in the figure. In this case, it is clear that by slight changes in the angle of the hyperplane we will be able to move it slightly to one side or the other of x^{P+1}, thus getting a value of either +1 or -1 on it. Thus in this case both (b_1, ..., b_P, +1) and (b_1, ..., b_P, -1) are possible, and there is one additional possible dichotomy beyond C(P, N) (that is counted in D). On the other hand, if no hyperplane that passes through x^{P+1} (and generates (b_1, ..., b_P) on the first P vectors) exists, then the point lies on one side of all the planes that generate the old dichotomy, hence we will not be able to achieve both dichotomies, unlike before. We have thus seen that D is the number of those dichotomies over P points that are realized by a hyperplane that passes through a certain fixed point x^{P+1} (which is in general position with the other points). [Show Figure]. Now, by forcing the hyperplane to pass through a certain fixed point, we are in fact moving the problem to one in N-1 dimensions, instead of N. This can be understood if the point is on the x axis, for example: then the hyperplane has N-1 axes left to "work with" (if the point is not on the x axis, rotate the axes of the space to bring the point onto the x axis; this of course has no effect on the geometry of the problem). In conclusion, D = C(P, N-1), and the recursion formula is

C(P+1, N) = C(P, N) + C(P, N-1)    (7)
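As a sanity check on the recursion (7) (not part of the notes; helper names are illustrative), one can build C(P, N) from the recursion with the base cases C(1, N) = 2 and C(P, 1) = 2 and compare it to the closed form (2):

```python
from functools import lru_cache
from math import comb

def C_closed(P, N):
    """Closed form, Eq. (2)."""
    return 2 * sum(comb(P - 1, k) for k in range(N))

@lru_cache(maxsize=None)
def C_rec(P, N):
    """Cover's recursion, Eq. (7): C(P+1, N) = C(P, N) + C(P, N-1),
    with base cases C(1, N) = 2 and C(P, 1) = 2."""
    if P == 1 or N == 1:
        return 2
    return C_rec(P - 1, N) + C_rec(P - 1, N - 1)

assert all(C_rec(P, N) == C_closed(P, N)
           for P in range(1, 20) for N in range(1, 12))
```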

We now prove the theorem by induction. Let us assume that

C(P, N) = 2 \sum_{k=0}^{N-1} \binom{P-1}{k}

holds up to P and N [note that it trivially holds for P = 1 and all N, since it gives C(1, N) = 2, as expected: a single point in N dimensions can be dichotomized with both labels by a hyperplane]. Then,

C(P+1, N) = 2 \sum_{k=0}^{N-1} \binom{P-1}{k} + 2 \sum_{k=0}^{N-2} \binom{P-1}{k}
          = 2 \sum_{k=0}^{N-1} \binom{P-1}{k} + 2 \sum_{k=0}^{N-1} \binom{P-1}{k-1}
          = 2 \sum_{k=0}^{N-1} \binom{P}{k}    (8)

where we have used \binom{n}{k} + \binom{n}{k-1} = \binom{n+1}{k} [and used the convention that \binom{n}{k} = 0 for k < 0].

3 The Perceptron Learning Algorithm

Given a set of input vectors {x^1, ..., x^P} and a set of desired labels {y_0^1, ..., y_0^P}, we want an algorithm that, starting from some initial weight vector, modifies it on the basis of the examples, ultimately yielding a set of weights w that classifies all the examples correctly,

sign(w^T x^\mu) = y_0^\mu,   \forall \mu    (9)

The famous Perceptron Learning Algorithm (PLA) described below achieves this goal. The PLA is incremental: examples are presented one by one, and at each time step a weight update rule is applied. Once all examples have been presented, the algorithm cycles again through all examples, until convergence.

The PLA begins with an arbitrary set of weights w^0. At each subsequent step, call it the nth step, we have the weights arrived at in the previous step, w^{n-1}, and a new example (x^n, y_0^n). The weights are then updated according to the following rules. If

y_0^n (w^{(n-1)T} x^n) > 0    (10)

implying that the current weight vector classifies the new example correctly, then

w^n = w^{n-1}    (11)

and the algorithm moves to the next step. If, on the other hand,

y_0^n (w^{(n-1)T} x^n) < 0    (12)

implying that an error has occurred, then

w^n = w^{n-1} + y_0^n x^n    (13)

These rules can be summarized as

w^n = w^{n-1} + y_0^n x^n \, \Theta(-y_0^n w^{(n-1)T} x^n)    (14)

where \Theta(x) is a step function, returning 0 for x < 0 and 1 for x > 0.

The algorithm halts when all the examples are classified correctly by w: if a full pass produces no error, the algorithm halts, and all the training examples are classified correctly. Whether the algorithm finds a w that realizes the task depends first of all on whether the data is linearly separable. Assuming that it is, the theorem discussed below ensures that the PLA will find a weight vector that correctly classifies the data (although the solution need not coincide with w^*, since the solution is not unique).
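The update rules (10)-(14) translate directly into code. Below is a minimal NumPy sketch (not from the notes; names are illustrative) that cycles through the examples, applies Eq. (13) on every error, and halts after a full error-free pass.

```python
import numpy as np

def perceptron_learning(X, y, max_epochs=10_000):
    """Perceptron Learning Algorithm for a hyperplane through the origin.

    X: (P, N) array of inputs, y: (P,) array of labels in {-1, +1}.
    Returns (w, n_updates); raises if there is no convergence within
    max_epochs (e.g. if the data is not linearly separable).
    """
    P, N = X.shape
    w = np.zeros(N)                      # initial weights w^0 (here zero)
    n_updates = 0
    for _ in range(max_epochs):
        errors = 0
        for x_mu, y_mu in zip(X, y):
            if y_mu * (w @ x_mu) <= 0:   # misclassified, cf. Eq. (12)
                w += y_mu * x_mu         # update rule, Eq. (13)
                n_updates += 1
                errors += 1
        if errors == 0:                  # full error-free pass: halt
            return w, n_updates
    raise RuntimeError("no convergence; data may not be linearly separable")

# Usage: random inputs labeled by a random 'teacher' vector (separable by construction).
rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10)) / np.sqrt(10)
w_star = rng.normal(size=10)
y = np.sign(X @ w_star)
w, n_up = perceptron_learning(X, y)
assert np.all(np.sign(X @ w) == y)
```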

Perceptron Convergence Theorem:
For any finite set of linearly separable labeled examples, the Perceptron Learning Algorithm will halt after a finite number of iterations. In other words, after a finite number of iterations, the algorithm yields a vector w that classifies all the examples perfectly.

Note: Although the number of iterations is finite, it is usually larger than the size of the training set, because each example needs to be processed more than once.

Proof:
We denote by w^* a 'teacher' weight vector that correctly separates the training examples, i.e. it obeys

y_0^\mu w^{*T} x^\mu > 0,   \forall \mu    (15)

Let w^n be the 'student's' weight vector at the n-th time step. It forms an angle \theta(n) with w^*, whose cosine we denote by c(n):

c(n) = \cos \theta(n) = \frac{w^{nT} w^*}{\|w^n\| \, \|w^*\|}    (16)

The basic idea behind the proof is that if the algorithm were to go on forever, the RHS of Eq. 16 would become greater than 1 for large values of n. Think about the dynamics of w as mimicking some sort of random walk, where some of the steps move it toward w^* and some away from it. At the same time, its norm is also changing. If it were a truly unbiased random walk, then the norm (the denominator) would typically grow as \sqrt{n}. The numerator would have a fluctuating sign but, by the central limit theorem, its typical values would be around \sqrt{n}. However, as we will show, w^n makes a biased random walk during learning, in the direction of w^*, hence the numerator grows linearly with n. On the other hand, because the change in w^n is biased away from the current weight vector w^{n-1}, the term \|w^n\| in the denominator grows at most only as \sqrt{n}. Since the RHS of Eq. 16 cannot be larger than 1, n cannot be large.

We now proceed to the formal proof.

Margins: An important quantity is

\kappa^\mu = \frac{y_0^\mu \, w^{*T} x^\mu}{\|w^*\|}    (17)

\kappa^\mu is the (signed) Euclidean distance of the point x^\mu from the plane w^*. In modern learning theory it is called the margin of this example relative to the separating plane w^*. Note that \kappa^\mu is strictly positive since all points are correctly classified, hence we can define

\kappa = \min_\mu \kappa^\mu > 0    (18)

which is the minimal margin relative to the separating hyperplane w^*. [Note that in the definition of the margin we could also divide by the norm of x^\mu, since only the angle between x^\mu and w^* matters.]

For simplicity we will count as learning steps only those in which the weights are updated. At the nth update step we have

w^n - w^{n-1} \equiv \Delta w^n = y_0^n x^n    (19)

If there is an actual update at this step then there was an error, i.e. y_0^n w^{(n-1)T} x^n < 0, and this has the consequence that

w^{(n-1)T} \Delta w^n < 0    (20)

In words, if a correction was made then the projection of w^{n-1} on the correction vector is negative. The two key inequalities are Eqs. 18 and 20.

We will now investigate the numerator of c(n):

w^{nT} w^* = w^{(n-1)T} w^* + \Delta w^{nT} w^*    (21)
           = w^{(n-1)T} w^* + y_0^n w^{*T} x^n     (22)
           = w^{(n-1)T} w^* + \kappa^n \|w^*\|     (23)
           \ge w^{(n-1)T} w^* + \kappa \|w^*\|     (24)

where we used the update rule (19) in (22), the definition of the margin (17) in (23), and the minimal margin (18) in (24). We can then apply this inequality n times, starting from w^0, to get

w^{nT} w^* \ge w^{0T} w^* + n \kappa \|w^*\|    (25)

Hence for large values of n the numerator grows at least as n \kappa \|w^*\|.

Examining the denominator (just the \|w^n\| component, because \|w^*\| does not change), we obtain

\|w^n\|^2 = \|w^{n-1} + \Delta w^n\|^2                              (26)
          = \|w^{n-1}\|^2 + \|x^n\|^2 + 2\, w^{(n-1)T} \Delta w^n    (27)
          \le \|w^{n-1}\|^2 + D^2                                   (28)

where D = \max_\mu \|x^\mu\|, and we used Eq. (20) to drop the cross term. Applying the inequality n times, we get

\|w^n\|^2 \le \|w^0\|^2 + n D^2    (29)

Thus for large n,

\|w^n\| \lesssim \sqrt{n}\, D    (30)

Putting it all together: for large enough n,

c(n) \ge \frac{n \kappa \|w^*\|}{\sqrt{n}\, D\, \|w^*\|} = \frac{\sqrt{n}\, \kappa}{D}    (31)

Thus if n \to \infty then c(n) would become larger than one, and we arrive at a contradiction, since c(n) = \cos \theta(n) \le 1. This proves that after a finite number of corrections the algorithm halts.
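To illustrate the two growth rates in the proof (not from the notes; a sketch assuming Gaussian inputs labeled by a random teacher w^*, with illustrative names), the snippet below runs the PLA while recording, after each update n, the overlap w^{nT} w^* (the numerator, which grows roughly linearly in n), the norm \|w^n\| (which grows at most like \sqrt{n} D), and c(n), which indeed never exceeds 1.

```python
import numpy as np

rng = np.random.default_rng(1)
P, N = 50, 10
X = rng.normal(size=(P, N)) / np.sqrt(N)        # inputs with norm ~ 1
w_star = rng.normal(size=N)                     # 'teacher' vector w*
y = np.sign(X @ w_star)                         # separable labels

w, n, c_n = np.zeros(N), 0, 0.0
for _ in range(20_000):                         # passes over the data
    errors = 0
    for x_mu, y_mu in zip(X, y):
        if y_mu * (w @ x_mu) <= 0:              # error: apply Eq. (13)
            w += y_mu * x_mu
            n += 1
            overlap = w @ w_star                # numerator, grows ~ n * kappa * ||w*||
            norm_w = np.linalg.norm(w)          # grows at most ~ sqrt(n) * D
            c_n = overlap / (norm_w * np.linalg.norm(w_star))
            errors += 1
    if errors == 0:                             # error-free pass: halt
        break
print(f"{n} updates, final c(n) = {c_n:.3f} (always <= 1)")
```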

3.1 Convergence time of the PLA

From the convergence proof we can also derive a bound on the convergence time of the PLA. Let us define

\kappa_{\max} = \max_w \kappa(w)    (32)

the largest attainable minimal margin over all separating weight vectors w. Then,

n \le \frac{D^2}{\kappa_{\max}^2}    (33)

which highlights the important role of the margin (or rather the maximum margin) in the performance of the perceptron.

Indeed, the convergence time of the PLA can in the worst case be extremely long, since there might be two examples with opposite labels but very close to each other, which makes \kappa_{\max} very small. However, the typical behavior can be evaluated from the following scenario. Imagine the inputs are generated from a Gaussian distribution where each component has mean \langle x_i^\mu \rangle = 0 and variance 1/N, so that the norm of the inputs is 1 on average. Then the probability that a random example has a small margin, \kappa^\mu \le \kappa, is roughly \kappa \sqrt{N}. Thus the probability that there is at least one example with margin at most \kappa is Prob(\kappa) \approx P \kappa \sqrt{N}. The minimal margin is therefore given by the condition Prob(\kappa) \approx 1, from which we obtain

\kappa \approx \frac{1}{P \sqrt{N}},   \qquad   n \approx P^2 N    (34)
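A final numerical check (not from the notes; it assumes the same Gaussian-input, random-teacher setup, with illustrative names): compute the margin \kappa of the generating teacher and D = max_\mu \|x^\mu\|, run the PLA, and compare the observed number of updates with D^2/\kappa^2. Since the teacher's margin is at most \kappa_{\max}, this gives a valid, though not the tightest, instance of the bound (33).

```python
import numpy as np

def pla_updates(X, y, max_epochs=20_000):
    """Minimal PLA through the origin; returns the number of updates."""
    w, n_updates = np.zeros(X.shape[1]), 0
    for _ in range(max_epochs):
        errors = 0
        for x_mu, y_mu in zip(X, y):
            if y_mu * (w @ x_mu) <= 0:
                w += y_mu * x_mu
                n_updates += 1
                errors += 1
        if errors == 0:
            return n_updates
    raise RuntimeError("no convergence within max_epochs")

rng = np.random.default_rng(2)
P, N = 50, 10
X = rng.normal(size=(P, N)) / np.sqrt(N)        # <x_i^mu> = 0, variance 1/N
w_star = rng.normal(size=N)
y = np.sign(X @ w_star)

kappa = np.min(y * (X @ w_star)) / np.linalg.norm(w_star)   # minimal margin, Eq. (18)
D = np.max(np.linalg.norm(X, axis=1))                       # D = max_mu ||x^mu||
print(f"observed updates: {pla_updates(X, y)}")
print(f"bound D^2/kappa^2 (teacher's margin): {D**2 / kappa**2:.0f}")
```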
