IEEE TRANSACTIONS ON INFORMATION THEORY, VOL. IT-26, NO. 1, JANUARY 1980


Axiomatic Derivation of the Principle of Maximum Entropy and the Principle of Minimum Cross-Entropy

JOHN E. SHORE, MEMBER, IEEE, AND RODNEY W. JOHNSON

Abstract: Jaynes's principle of maximum entropy and Kullback's principle of minimum cross-entropy (minimum directed divergence) are shown to be uniquely correct methods for inductive inference when new information is given in the form of expected values. Previous justifications use intuitive arguments and rely on the properties of entropy and cross-entropy as information measures. The approach here assumes that reasonable methods of inductive inference should lead to consistent results when there are different ways of taking the same information into account (for example, in different coordinate systems). This requirement is formalized as four consistency axioms. These are stated in terms of an abstract information operator and make no reference to information measures. It is then proved that the principle of maximum entropy is correct in the following sense: maximizing any function but entropy will lead to inconsistencies unless that function and entropy have identical maxima. Stated differently, given new constraint information, there is only one distribution satisfying the constraints that can be chosen by a procedure that satisfies the consistency axioms; this unique distribution can be obtained by maximizing entropy. This result is established both directly and as a special case of an analogous result for the principle of minimum cross-entropy. Results are obtained both for continuous probability densities and for discrete distributions.

I. INTRODUCTION

WE PROVE THAT Jaynes's principle of maximum entropy and Kullback's principle of minimum cross-entropy (minimum directed divergence) are correct methods of inference when given new information in terms of expected values. Our approach does not rely on intuitive arguments or on the properties of entropy and cross-entropy as information measures. Rather, we consider the consequences of requiring that methods of inference be self-consistent.

A. The Maximum Entropy Principle and the Minimum Cross-Entropy Principle

Suppose you know that a system has a set of possible states x_i with unknown probabilities q†(x_i), and you then learn constraints on the distribution q†: either values of certain expectations Σ_i q†(x_i) f_k(x_i) or bounds on these values. Suppose you need to choose a distribution q that is in some sense the best estimate of q† given what you know. Usually there remains an infinite set of distributions that are not ruled out by the constraints.
Which one should you choose?

The principle of maximum entropy states that, of all the distributions q that satisfy the constraints, you should choose the one with the largest entropy −Σ_i q(x_i) log(q(x_i)). Entropy maximization was first proposed as a general inference procedure by Jaynes, although it has historical roots in physics (e.g., Elsasser [67]). It has been applied successfully in a remarkable variety of fields, including statistical mechanics and thermodynamics [1]-[8], statistics [9]-[11, ch. 6], reliability estimation [11, ch. 10], [12], traffic networks [13], queuing theory and computer system modeling [14], [15], system simulation [16], production line decisionmaking [17], [18], computer memory reference patterns [19], system modularity [20], group behavior [21], stock market analysis [22], and general probabilistic problem solving [11], [17], [23]-[25]. There is much current interest in maximum entropy spectral analysis [26]-[29].

The principle of minimum cross-entropy is a generalization that applies in cases when a prior distribution p that estimates q† is known in addition to the constraints. The principle states that, of the distributions q that satisfy the constraints, you should choose the one with the least cross-entropy. Minimizing cross-entropy is equivalent to maximizing entropy when the prior is a uniform distribution. Unlike entropy maximization, cross-entropy minimization generalizes correctly for continuous probability densities. One then minimizes the functional

H(q,p) = ∫ dx q(x) log(q(x)/p(x)).  (1)

The name cross-entropy is due to Good [9]. Other names include expected weight of evidence [30, p. 72], directed divergence [31, p. 7], and relative entropy [32]. First proposed by Kullback [31, p. 37], the principle of minimum cross-entropy has been advocated in various forms by others [9], [33], [34], including Jaynes [3], [25], who obtained (1) with an "invariant measure" playing the role of the prior density. Cross-entropy minimization has been applied primarily to statistics [9], [31], [35], [36], but also to statistical mechanics [8], chemistry [37], pattern recognition [38], [39], computer storage of probability distributions [40], and spectral analysis [41]. For a general discussion and examples of minimizing cross-entropy subject to constraints, see [42, appendix B]. APL computer programs for finding minimum cross-entropy distributions given arbitrary priors and constraints are described in [43]. Both entropy maximization and cross-entropy minimization have roots in Shannon's work [44].

Manuscript received October 23, 1978; revised March 5, 1979. The authors are with the Naval Research Laboratory, Washington, DC 20375. U.S. Government work not protected by U.S. copyright.
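The paper points to APL programs [43] for computing such distributions; purely as an illustration (not from the paper), here is a minimal Python sketch for the discrete case with a single expectation constraint. The minimizer has the well-known exponential-family form q_i ∝ p_i exp(−λ f_i), and λ can be found by bisection because the posterior mean is monotone in λ. The function name and parameters are hypothetical.

import numpy as np

def min_cross_entropy(p, f, target, lam_lo=-50.0, lam_hi=50.0, tol=1e-12):
    # Minimum cross-entropy posterior subject to sum(q) = 1 and
    # sum(q * f) = target, for a strictly positive prior p.
    # The minimizer is q_i = p_i * exp(-lam * f_i) / Z(lam), and
    # E_q[f] decreases monotonically in lam, so bisection finds lam.
    p = np.asarray(p, dtype=float)
    f = np.asarray(f, dtype=float)

    def posterior(lam):
        w = p * np.exp(-lam * (f - f.mean()))  # shifted for numerical stability
        return w / w.sum()

    for _ in range(200):
        lam = 0.5 * (lam_lo + lam_hi)
        q = posterior(lam)
        ef = q @ f
        if abs(ef - target) < tol:
            break
        if ef > target:
            lam_lo = lam  # posterior mean too high: increase lam
        else:
            lam_hi = lam
    return q

# Example: uniform prior on a die, mean constrained to 4.5; with a uniform
# prior, minimizing cross-entropy is the same as maximizing entropy.
p = np.ones(6) / 6.0
f = np.arange(1, 7, dtype=float)
q = min_cross_entropy(p, f, target=4.5)
print(q, q @ f)   # weights skewed toward high faces; mean 4.5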

B. Justifying the Principles as General Methods of Inference

Despite its success, the maximum entropy principle remains controversial [32], [45]-[49]. The controversy appears to stem from weaknesses in the foundations of the principle, which is usually justified on the basis of entropy's unique properties as an uncertainty measure. That entropy has such properties is undisputed; one can prove, up to a constant factor, that entropy is the only function satisfying axioms that are accepted as requirements for an uncertainty measure [44, pp. 379-423], [50], [51]. Intuitively, the maximum entropy principle follows naturally from such axiomatic characterizations. Jaynes states that the maximum entropy distribution "is uniquely determined as the one which is maximally noncommittal with regard to missing information" [1, p. 623], and that it "agrees with what is known, but expresses 'maximum uncertainty' with respect to all other matters, and thus leaves a maximum possible freedom for our final decisions to be influenced by the subsequent sample data" [25, p. 231]. Somewhat whimsically, Beneš justified his use of entropy maximization as "a reasonable and systematic way of throwing up our hands" [13, p. 234]. Others argue similarly [5]-[9], [11]. Jaynes has further supported entropy maximization by showing that the maximum entropy distribution is equal to the frequency distribution that can be realized in the greatest number of ways [25], an approach that has been studied in more detail by North [52].

Similar justifications can be advanced for cross-entropy minimization. Cross-entropy has properties that are desirable for an information measure [33], [34], [53], and one can argue [54] that it measures the amount of information necessary to change a prior p into the posterior q. Cross-entropy can be characterized axiomatically, both in the discrete case [8], [54]-[56] and in the continuous case [34]. The principle of cross-entropy minimization then follows intuitively much like entropy maximization. In an interesting recent paper [58] Van Campenhout and Cover have shown that the minimum cross-entropy density is the limiting form of the conditional density given average values.

To some, entropy's unique properties make it obvious that entropy maximization is the correct way to account for constraint information. To others, such an informal and intuitive justification yields plausibility but not proof: why maximize entropy; why not some other function? Such questions are not answered unequivocally by previous justifications because they argue indirectly. Most are based on a formal description of what is required of an information measure; none are based on a formal description of what is required of a method for taking information into account. Since the maximum entropy principle is asserted as a general method of inductive inference, it is reasonable to require that different ways of using it to take the same information into account should lead to consistent results. We formalize this requirement in four consistency axioms. These are stated in terms of an abstract information operator; they make no reference to information measures.

We then prove that the maximum entropy principle is correct in the following sense: maximizing any function but entropy will lead to inconsistencies unless that function and entropy have identical maxima (any monotonic function of entropy will work, for example). Stated differently, we prove that, given new constraint information, there is only one distribution satisfying these constraints that can be chosen by a procedure that satisfies the consistency axioms; this unique distribution can be obtained by maximizing entropy. We establish this result as a special case of an analogous result for cross-entropy, which states that, given a continuous prior density and new constraints, there is only one posterior density satisfying these constraints that can be chosen by a procedure that satisfies the axioms; this unique posterior can be obtained by minimizing cross-entropy.

Informally, our axioms may be phrased as follows.

I. Uniqueness: The result should be unique.
II. Invariance: The choice of coordinate system should not matter.
III. System Independence: It should not matter whether one accounts for independent information about independent systems separately in terms of different densities or together in terms of a joint density.
IV. Subset Independence: It should not matter whether one treats an independent subset of system states in terms of a separate conditional density or in terms of the full system density.

These axioms are all based on one fundamental principle: if a problem can be solved in more than one way, the results should be consistent.

Our approach is analogous to work of Cox [59], [60], [11, ch. 1] and similar work of Janossy [61], [62]. From a requirement that probability theory provide a consistent model of inductive inference, they derive functional equations whose solutions include the standard equations of probability theory. Emphasizing invariance, Jeffreys [63] takes the same premise in studying the choice of priors.

C. Outline

The remainder of the paper is organized as follows. In Section II we introduce some definitions and notation. In Section III we motivate and formally state the axioms. Their consequences for continuous densities are explored in Section IV; a series of theorems culminates in our main result justifying the principle of minimum cross-entropy. The discrete case, including the principle of maximum entropy, is discussed in Section V. Section VI contrasts axioms of inference methods with axioms of information measures and contains concluding remarks. A more detailed exposition of our results is contained in [42].

II. DEFINITIONS AND NOTATION

To formalize inference about probability densities that satisfy arbitrary expectation constraints, we need a concise notation for such constraints. We also need a notation for the procedure of minimizing some functional to choose a posterior density. We therefore introduce an abstract information operator that yields a posterior density from a prior density and new constraint information. We can then state inference axioms in terms of this operator.

We use lowercase boldface roman letters for system states, which may be multidimensional, and uppercase boldface roman letters for sets of system states. We use lowercase roman letters for probability densities and uppercase script letters for sets of probability densities. Thus, let x be a state of some system that has a set D of possible states. Let 𝒟 be the set of all probability densities q on D such that q(x) ≥ 0 for x ∈ D and

∫_D dx q(x) = 1.  (2)

We use a superscript dagger to distinguish the system's unknown "true" state probability density q† ∈ 𝒟. When S ⊂ D is some set of states, we write q(x ∈ S) for the set of values q(x) with x ∈ S.

New information takes the form of linear equality constraints

∫_D dx q†(x) a_k(x) = 0  (3)

and inequality constraints

∫_D dx q†(x) c_k(x) ≥ 0  (4)

for known sets of bounded functions a_k and c_k. The probability densities that satisfy such constraints always comprise a closed convex subset of 𝒟. (A set 𝒮 ⊆ 𝒟 is convex if, given 0 ≤ λ ≤ 1 and q, r ∈ 𝒮, it contains the weighted average λq + (1−λ)r.) Furthermore, any closed convex subset of 𝒟 can be defined by equality and inequality constraints, perhaps infinite in number. We express constraints in these terms, using the notation I = (q† ∈ 𝒮), to mean that q† is a member of the closed convex set 𝒮 ⊆ 𝒟. We refer to I as a constraint and to 𝒮 as a constraint set. We use uppercase roman letters for constraints.

Let p ∈ 𝒟 be some prior density that is an estimate of q† obtained, by any means, prior to learning I. We require that priors be strictly positive:

p(x ∈ D) > 0.  (5)

(This restriction is discussed below.) Given a prior p and new information I, the posterior density q ∈ 𝒮 that results from taking I into account is chosen by minimizing a functional H(q,p) in the constraint set 𝒮:

H(q,p) ≤ H(q′,p)  for all q′ ∈ 𝒮.  (6)

We introduce an "information operator" ∘ that expresses (6) using the notation

q = p ∘ I.  (7)

The operator ∘ takes two arguments, a prior and new information, and yields a posterior. For some other functional F(q,p), suppose q satisfies (6) if and only if it satisfies

F(q,p) ≤ F(q′,p)  for all q′ ∈ 𝒮.

Then we say that F and H are equivalent. If F and H are equivalent, the operator ∘ can be realized using either functional.

If H has the form (1), then (7) expresses the principle of minimum cross-entropy. At this point, however, we assume only that H is some well-behaved functional. In Section III we give consistency axioms for ∘ that restrict the possible forms of H. We say that a functional H satisfies one of these axioms if the axiom is satisfied by the operator ∘ that is realized using H.

In making the restriction (5) we assume that D is the set of states that are possible according to prior information. We do not impose a similar restriction on the posterior q = p ∘ I, since I may rule out states currently thought to be possible. If this happens, then D must be redefined before q is used as a prior in a further application of ∘. The restriction (5) does not significantly restrict our results, but it does help in avoiding certain technical problems that would otherwise result from division by p(x). For similar reasons (avoidance of technically troublesome singular cases) we impose on the information I the restriction that there exists at least one density q ∈ 𝒮 with H(q,p) < ∞.

For some subset S ⊂ D of states and x ∈ S, let

q(x | x ∈ S) = q(x) / ∫_S dx′ q(x′)  (8)

be the conditional density, given x ∈ S, corresponding to any q ∈ 𝒟. We use

q*S  (9)

as a shorthand notation for (8).

When D is a discrete set of system states, densities are replaced by discrete distributions and integrals by sums in the usual way. We use lowercase boldface roman letters for discrete probability distributions, which we consider to be vectors; for example, q = q_1, ..., q_n. It will always be clear in context whether a given symbol refers to a system state or a discrete distribution, and whether q_i refers to a probability density or a component of a discrete distribution.
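As a concrete rendering of the operator ∘ in the discrete case (hypothetical, not from the paper, and assuming NumPy and SciPy are available), the sketch below realizes q = p ∘ I by numerically minimizing H(q,p) over a constraint set given in the homogeneous form of (3) and (4).

import numpy as np
from scipy.optimize import minimize

def info_operator(p, A=None, C=None):
    # Realize q = p o I for discrete distributions by minimizing
    # H(q, p) = sum_i q_i log(q_i / p_i) over the closed convex set
    #     { q : A q = 0, C q >= 0, sum_i q_i = 1, q_i >= 0 },
    # where the rows of A and C play the roles of the bounded
    # functions a_k and c_k in the constraints (3) and (4).
    p = np.asarray(p, dtype=float)
    cons = [{'type': 'eq', 'fun': lambda q: q.sum() - 1.0}]
    if A is not None:
        cons.append({'type': 'eq', 'fun': lambda q, A=A: A @ q})
    if C is not None:
        cons.append({'type': 'ineq', 'fun': lambda q, C=C: C @ q})

    def H(q):
        return np.sum(q * np.log(np.maximum(q, 1e-300) / p))

    res = minimize(H, p.copy(), method='SLSQP', constraints=cons,
                   bounds=[(0.0, 1.0)] * p.size,
                   options={'ftol': 1e-14, 'maxiter': 500})
    return res.x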

III. THE AXIOMS

We follow the formal statement of each axiom with a justification. We assume, throughout, a system with possible states D and probability density q† ∈ 𝒟.

Axiom I (Uniqueness): The posterior q = p ∘ I is unique for any prior p ∈ 𝒟 and new information I = (q† ∈ 𝒮), where 𝒮 ⊆ 𝒟.

Justification: If we solve the same problem twice in exactly the same way, we expect the same answer to result both times. Actually, Axiom I is implicit in our notation.

Axiom II (Invariance): Let Γ be a coordinate transformation from x ∈ D to y ∈ D′ with (Γq)(y) = J⁻¹ q(x), where J is the Jacobian J = ∂(y)/∂(x). Let Γ𝒟 be the set of densities Γq corresponding to densities q ∈ 𝒟. Let (Γ𝒮) ⊆ (Γ𝒟) correspond to 𝒮 ⊆ 𝒟. Then, for any prior p ∈ 𝒟 and new information I = (q† ∈ 𝒮),

(Γp) ∘ (ΓI) = Γ(p ∘ I)  (10)

holds, where ΓI = ((Γq†) ∈ (Γ𝒮)).

Justification: We expect the same answer when we solve the same problem in two different coordinate systems, in that the posteriors in the two systems should be related by the coordinate transformation.

Suppose there are two systems, with sets D₁, D₂ of states and probability densities of states q₁† ∈ 𝒟₁, q₂† ∈ 𝒟₂. Then we require the following axiom.

Axiom III (System Independence): Let p₁ ∈ 𝒟₁ and p₂ ∈ 𝒟₂ be prior densities. Let I₁ = (q₁† ∈ 𝒮₁) and I₂ = (q₂† ∈ 𝒮₂) be new information about the two systems, where 𝒮₁ ⊆ 𝒟₁ and 𝒮₂ ⊆ 𝒟₂. Then

(p₁p₂) ∘ (I₁ ∧ I₂) = (p₁ ∘ I₁)(p₂ ∘ I₂)  (11)

holds.

Justification: Instead of q₁† and q₂†, we could describe the systems using the joint density q† ∈ 𝒟₁₂. If the two systems were independent, then the joint density would satisfy

q†(x₁, x₂) = q₁†(x₁) q₂†(x₂).  (12)

Now the new information about each system can also be expressed completely in terms of the joint density q†. For example, I₁ can be expressed as I₁ = (q† ∈ 𝒮₁′), where 𝒮₁′ ⊆ 𝒟₁₂ is the set of joint densities q ∈ 𝒟₁₂ such that q₁ ∈ 𝒮₁, where

q₁(x₁) = ∫ dx₂ q(x₁, x₂);

I₂ can be expressed similarly. Now, since the two priors together define a joint prior p = p₁p₂, it follows that there are two ways to take the new information I₁ and I₂ into account: we can obtain separate posteriors q₁ = p₁ ∘ I₁ and q₂ = p₂ ∘ I₂, or we can obtain a joint posterior q = p ∘ (I₁ ∧ I₂). Because p₁ and p₂ are independent, and because I₁ and I₂ give no information about any interaction between the two systems, we expect these two ways to be related by q = q₁q₂, whether or not (12) holds.

Axiom IV (Subset Independence): Let S₁, ..., S_n be disjoint sets whose union is D, and let p ∈ 𝒟 be any known prior. For each subset S_i, let I_i = (q†*S_i ∈ 𝒮_i) be new information about the conditional density q†*S_i, where 𝒮_i is a closed convex subset of the set of probability densities on S_i. Let M = (q† ∈ 𝓜) be new information giving the probability of being in each of the n subsets, where 𝓜 is the set of densities q that satisfy

∫_{S_i} dx q(x) = m_i  (13)

for each subset S_i, where the m_i are known values. Then

(p ∘ (I ∧ M))*S_i = (p*S_i) ∘ I_i  (14)

holds, where I = I₁ ∧ I₂ ∧ ... ∧ I_n.

Justification: This axiom concerns situations in which the set of states D decomposes naturally into disjoint subsets S_i, and new information I_i is obtained about the conditional probability densities q†*S_i in each subset (see (8) and (9)). One way of accounting for this information is to obtain a conditional posterior q_i = (p*S_i) ∘ I_i from each conditional prior p*S_i. Another way is to obtain a posterior q = p ∘ I for the whole system, where I = I₁ ∧ ... ∧ I_n. The two results should be related by q*S_i = q_i, or

(p ∘ I)*S_i = (p*S_i) ∘ I_i.  (15)

Moreover, suppose that we also learn the probability of being in each of the n subsets. That is, we learn M = (q† ∈ 𝓜), where 𝓜 is the set of densities q that satisfy (13) for each subset S_i. The known numbers m_i are the probabilities that the system is in a state within S_i; they satisfy Σ_i m_i = 1. Taking M into account should not affect the conditional densities that result from taking I into account. We therefore expect a more general version of (15) to hold, namely (14).

IV. CONSEQUENCES OF THE AXIOMS

A. Summary

Since we require the axioms to hold for both equality and inequality constraints (3) and (4), they must hold for equality constraints alone. We first investigate the axioms' consequences assuming only equality constraints. Later, we show that the resulting restricted form for H also satisfies the axioms in the case of inequality constraints.

We establish our main result in four steps. The first step shows that the subset independence axiom and a special case of the invariance axiom together restrict H to functionals that are equivalent to the form

F(q,p) = ∫_D dx f(q(x), p(x))  (16)

for some function f. We call this the "sum form." In the axiomatic characterizations in [34], [55], and [56], the sum form was assumed rather than derived. Our next step shows that the general case of the invariance axiom restricts H to functionals that are equivalent to the form

F(q,p) = ∫_D dx q(x) h(q(x)/p(x))  (17)

for some function h. Our third step applies the system independence axiom and shows that if H is a functional that satisfies all four axioms, then H is equivalent to cross-entropy (1). Since it could still be imagined that no functional satisfies the axioms, our final step is to show that cross-entropy does. We do this in the general case of equality and inequality constraints.
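Before turning to the derivation, a small numerical check (illustrative only, not part of the paper) that cross-entropy minimization satisfies Axiom III. Reusing the hypothetical min_cross_entropy and info_operator sketches above, the joint posterior under independent constraints factors into the product of the separately computed posteriors.

import numpy as np

p1, f1 = np.array([0.2, 0.3, 0.5]), np.array([0.0, 1.0, 2.0])
p2, f2 = np.array([0.6, 0.4]), np.array([0.0, 1.0])

q1 = min_cross_entropy(p1, f1, target=1.2)   # q1 = p1 o I1
q2 = min_cross_entropy(p2, f2, target=0.5)   # q2 = p2 o I2

# Joint prior p = p1 x p2; I1 and I2 written in the homogeneous form (3):
# sum_x q(x) a_k(x) = 0 with a_1 = f1 - 1.2 and a_2 = f2 - 0.5.
p12 = np.outer(p1, p2).ravel()
a1 = np.repeat(f1 - 1.2, p2.size)   # f1 varies over rows of the joint space
a2 = np.tile(f2 - 0.5, p1.size)     # f2 varies over columns
q12 = info_operator(p12, A=np.vstack([a1, a2]))   # q = p o (I1 ^ I2)

print(np.allclose(q12, np.outer(q1, q2).ravel(), atol=1e-5))   # True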

B. Deriving the Sum Form

We derive the sum form in several steps. First, we show that when the assumptions of the subset independence axiom hold, the posterior values within any subspace are independent of the values in the other subspaces. Next, we move formally to the discrete case and show that invariance implies that H is equivalent to a symmetric function. We then apply the subset independence axiom and prove that H is equivalent to functions of the form F(q,p) = Σ_i f(q_i, p_i), where p and q are discrete prior and posterior distributions, respectively, and we return to the continuous case, yielding (16).

We begin with the following lemma concerning subset independence.

Lemma I: Let the assumptions of Axiom IV hold, and let q = p ∘ (I ∧ M) be the posterior for the whole system (q ∈ 𝒟). Then q(x ∈ S_i) is functionally independent of q(x ∉ S_i), of the prior p(x ∉ S_i), and of n.

Proof: Let

q_i = (p*S_i) ∘ I_i  (18)

be the conditional posterior density in the ith subspace. Since p*S_i depends on p only in terms of p(x ∈ S_i) (see (8) and (9)), so does q_i. Furthermore, since q_i is the solution (18) to a problem in which x ∈ S_i only, q_i cannot depend on q(x ∉ S_i). Now, (14) states that q(x) = m_i q_i(x) for x ∈ S_i, where we have used (8) and (13). Since the m_i are fixed, it follows that q(x ∈ S_i) is independent of q(x ∉ S_i) and p(x ∉ S_i), proving Lemma I.

Our next step is to transform to the discrete case.

Lemma II: Let S₁, S₂, ..., S_n be disjoint sets whose union is D. For a prior p and a posterior q = p ∘ I let

p_j = ∫_{S_j} dx p(x)  and  q_j = ∫_{S_j} dx q(x).

Suppose that p(x ∈ S_j) is constant for each subset S_j, and let the new information I be provided by constraints (3) and (4) in which the functions a_k and c_k are also constant in each subset. Then the posterior q = p ∘ I is also constant in each subset, and H is equivalent to a symmetric function of the n pairs of variables (q_j, p_j). (We refer to this situation as the discrete case.)

Proof: Since the a_k and c_k are constant in each subset, the constraints have the form

Σ_j q_j a_kj = 0  (19)

or

Σ_j q_j c_kj ≥ 0  (20)

where a_kj = a_k(x ∈ S_j) and c_kj = c_k(x ∈ S_j). Now, let Γ be a measure-preserving transformation that scrambles the x within each subset S_j. This leaves the prior and the constraints (19) and (20) unchanged. It follows from invariance (10) that Γ also leaves q unchanged, which will only be the case if q is constant in each S_j. In the discrete case, H becomes a function H(q,p) of 2n variables q₁, ..., q_n and p₁, ..., p_n. To show that H is equivalent to a symmetric function, let π be any permutation. By invariance, the minima of H and πH coincide, where

(πH)(q,p) = H(q_π(1), ..., q_π(n), p_π(1), ..., p_π(n)).

Therefore the minima of H and F coincide, where F is the mean of the πH over all permutations π, and H is equivalent to the symmetric function F. This completes the proof of Lemma II.

We now prove that H is equivalent to functions with the discrete sum form.

Theorem I: In the discrete case let H(q,p) satisfy uniqueness, invariance, and subset independence. Then H is equivalent to a function of the form

F(q,p) = Σ_j f(q_j, p_j)  (21)

for some function f.

Theorem I is proved in the Appendix. The proof rests primarily on the subset independence property (Lemma I). We return to the continuous case by taking the limit of a large number of small subspaces S_i. The discrete sum form (21) then becomes (16).
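Lemma II can be seen numerically with the hypothetical info_operator sketch above: with a prior and a constraint function constant on each subset, the cross-entropy posterior comes out constant on each subset as well.

import numpy as np

# Subsets S1 = {0, 1} and S2 = {2, 3, 4}; the prior and the constraint
# function a are constant within each subset.
p = np.array([0.10, 0.10, 0.80 / 3, 0.80 / 3, 0.80 / 3])
a = np.array([1.0, 1.0, -0.5, -0.5, -0.5])

q = info_operator(p, A=a.reshape(1, -1))   # enforce sum_x q(x) a(x) = 0
print(q)   # approximately [1/6, 1/6, 2/9, 2/9, 2/9]: constant on S1 and on S2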

C. Consequence of General Invariance in the Continuous Case

Although invariance was invoked for the special case of discrete permutations in deriving (21), the continuous sum form (16) does not satisfy the invariance axiom for arbitrary continuous transformations and arbitrary functions f. The invariance axiom restricts the possible forms of f as follows.

Theorem II: Let the functional H(q,p) satisfy uniqueness, invariance, and subset independence. Then H is equivalent to a functional of the form

F(q,p) = ∫_D dx q(x) h(q(x)/p(x))  (22)

for some function h.

Proof: From previous results we may assume H to have the form (16). Consider new information I consisting of a single equality constraint

∫_D dx q†(x) a(x) = 0.  (23)

Then, by standard techniques from the calculus of variations, it follows that the posterior q = p ∘ I satisfies

λ + α a(x) + g(q(x), p(x)) = 0,  (24)

where λ and α are Lagrangian multipliers corresponding to the constraints (2) and (23), and where the function g is defined as

g(b, c) = ∂f(b, c)/∂b.  (25)

Now let Γ be a coordinate transformation from x to y in the notation of Axiom II. Then the transformed prior is p′(y) = J⁻¹ p(x) and the transformed constraint function is a′(y) = Γa = a(x). The posterior q′ = p′ ∘ (ΓI) satisfies

λ′ + α′ a′(y) + g(q′(y), p′(y)) = 0,  (26)

where λ′ and α′ are Lagrangian multipliers. Invariance (10) requires that q′(y) = J⁻¹ q(x) hold, so (26) becomes

λ′ + α′ a(x) + g(J⁻¹ q(x), J⁻¹ p(x)) = 0.  (27)

Combining (24) and (27) yields

g(J⁻¹ q(x), J⁻¹ p(x)) − g(q(x), p(x)) = (α − α′) a(x) + λ − λ′.  (28)

Now let S₁, ..., S_n be disjoint subsets whose union is D, and let the prior p be constant within each S_j. It follows from Lemma II that q is also constant within each S_j, which in turn results in the right side of (28) being constant within each S_j. (The primed Lagrangian multipliers may depend on the transformation Γ, but they are constants.) On the left side, however, the Jacobian J(x) may take on arbitrary values, since Γ is an arbitrary transformation. It follows that g can only depend on the ratio of its arguments, i.e., g(b, c) = g(b/c). Equation (25), therefore, has the general solution f(a, b) = a h(a/b) + v(b) for some functions h and v. Substitution of this solution into (16) yields

F(q,p) = ∫_D dx q(x) h(q(x)/p(x)) + ∫_D dx v(p(x)).

Since the second term is a function only of the fixed prior, it cannot affect the minimization of F and may be dropped. This completes the proof of Theorem II.

D. Consequence of System Independence

Our results so far have not depended on Axiom III. We now show that system independence restricts the function h in (22) to a single equivalent form.

Theorem III: Let the functional H(q,p) satisfy uniqueness, invariance, subset independence, and system independence. Then H is equivalent to cross-entropy (1).

Proof: With i = 1, 2, consider two systems with states x_i ∈ D_i, unknown densities q_i† ∈ 𝒟_i, prior densities p_i ∈ 𝒟_i, and new information I_i in the form of single equality constraints

∫_{D_i} dx_i q_i†(x_i) a_i(x_i) = 0.  (29)

From Theorem II, we may assume that H has the form (22). It follows that the posteriors q_i = p_i ∘ I_i satisfy

λ_i + α_i a_i(x_i) + u(r_i(x_i)) = 0,  (30)

where λ_i and α_i are Lagrangian multipliers corresponding to the constraints (2) and (29), where r_i(x_i) = q_i(x_i)/p_i(x_i), and where

u(r) = h(r) + r h′(r).  (31)

The two systems can also be described in terms of a joint probability density q† ∈ 𝒟₁₂, a joint prior p = p₁p₂, and new information I in the form of the three constraints

∫_{D₁} ∫_{D₂} dx₁ dx₂ q†(x₁, x₂) = 1  (32)

and

∫∫ dx₁ dx₂ q†(x₁, x₂) a_i(x_i) = 0  (i = 1, 2).  (33)

The posterior q = p ∘ I satisfies

λ′ + α₁′ a₁(x₁) + α₂′ a₂(x₂) + u(r(x₁, x₂)) = 0,  (34)

where the multipliers λ′, α₁′, and α₂′ correspond to (32) and (33), and r = q/p.

Now, system independence (11) requires q = q₁q₂, from which r = r₁r₂ follows. Combining (30) and (34) therefore yields

u(r₁r₂) − u(r₁) − u(r₂) = (α₁ − α₁′) a₁ + (α₂ − α₂′) a₂ + λ₁ + λ₂ − λ′.  (35)

Consider the case when D₁ and D₂ are both the real line. Then, differentiating this equation with respect to x₁ and differentiating the result with respect to x₂ yields

u″(r₁r₂) r₁ r₂ + u′(r₁r₂) = 0.  (36)

By suitable choices for the priors and the constraints, r₁r₂ can be made to take on any arbitrary positive value s. It follows from (36) that the function u satisfies the differential equation u′(s) + s u″(s) = 0, which has the general solution u(s) = A log(s) + B for arbitrary constants A and B. Combining this solution with (31) yields

h(r) + r h′(r) = A log(r) + B,

which in turn has the general solution

h(r) = A log(r) + C/r + B − A.  (37)

Substitution of (37) into (22) yields

F(q,p) = A ∫_D dx q(x) log(q(x)/p(x)) + (C + B − A),  (38)

since p integrates to one. Since the constants A, B, and C cannot affect the minimization of (38), provided A > 0, this completes the proof of Theorem III.
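The two ordinary differential equations at the heart of Theorem III are easy to verify symbolically. The sketch below (illustrative, assuming SymPy is available) checks that u(s) = A log(s) + B solves the equation derived from (36) and that (37) solves (31) with that u.

import sympy as sp

r, s, A, B, C = sp.symbols('r s A B C', positive=True)

# (36) reduces to u'(s) + s u''(s) = 0, solved by u(s) = A log(s) + B.
u = A * sp.log(s) + B
assert sp.simplify(sp.diff(u, s) + s * sp.diff(u, s, 2)) == 0

# (31) with that u reads h(r) + r h'(r) = A log(r) + B; (37) solves it.
h = A * sp.log(r) + C / r + B - A
assert sp.simplify(h + r * sp.diff(h, r) - (A * sp.log(r) + B)) == 0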

E. Cross-Entropy Satisfies the Axioms

So far we have shown that if H(q,p) satisfies the axioms, then H is equivalent to cross-entropy (1). This still leaves open the possibility that no functional H satisfies the axioms for arbitrary constraints. By showing that cross-entropy satisfies the axioms for arbitrary constraints, we complete the proof of our main result.

Theorem IV: Cross-entropy (1) satisfies uniqueness, invariance, system independence, and subset independence. Every other functional that satisfies the axioms is equivalent to cross-entropy.

Proof: We need only show that cross-entropy satisfies the axioms.

Uniqueness: Let 𝒮 be any closed convex set 𝒮 ⊆ 𝒟, and let densities q, r ∈ 𝒮 have the same cross-entropy H(q,p) = H(r,p) for some prior p ∈ 𝒟. We define g(u) = u log(u), with g(0) = 0, so that H can be written as

H(q,p) = ∫ dx p(x) g(q(x)/p(x)).  (39)

Now since g″(u) = 1/u > 0, g is strictly convex. It follows that

a g(u) + (1 − a) g(v) > g(au + (1 − a)v)  (40)

for 0 < a < 1 and u ≠ v. We set q(x)/p(x) for u and r(x)/p(x) for v, multiply both sides by p(x), and integrate, obtaining, at a = ½,

½ H(q,p) + ½ H(r,p) > H(½(q + r), p)

whenever q and r differ. Since 𝒮 is convex, the average ½(q + r) is also in 𝒮 and would have strictly smaller cross-entropy than the common value H(q,p) = H(r,p). Hence two distinct densities cannot both minimize H in 𝒮, and the minimum is unique.

Invariance: Under a coordinate transformation Γ in the notation of Axiom II, the Jacobian factors cancel in the ratio (Γq)(y)/(Γp)(y) = q(x)/p(x), and dy (Γq)(y) = dx q(x), so H(Γq, Γp) = H(q,p); the minimization is therefore unaffected, and (10) holds.

System Independence: In the notation of Axiom III, for product densities q = q₁q₂ and p = p₁p₂ the cross-entropy decomposes as

H(q₁q₂, p₁p₂) = H(q₁, p₁) + H(q₂, p₂),

and so assumes its minimum when the two terms on the right assume their individual minima, the first subject to I₁ and the second subject to I₂. Thus we have q = (p₁p₂) ∘ (I₁ ∧ I₂) = q₁q₂ = (p₁ ∘ I₁)(p₂ ∘ I₂), and we have proved that cross-entropy satisfies Axiom III.

Subset Independence: We use the notation in Axiom IV. We also define q = p ∘ (I ∧ M), q_i = q*S_i, and p_i = p*S_i. (Equation (14) then becomes q_i = p_i ∘ I_i.) The cross-entropy of q with respect to p may be written

H(q,p) = Σ_i m_i ∫_{S_i} dx q_i(x) log( m_i q_i(x) / (s_i p_i(x)) ) = Σ_i m_i H(q_i, p_i) + Σ_i m_i log(m_i/s_i),  (41)

where the s_i are the prior probabilities of being in each subset,

s_i = ∫_{S_i} dx p(x).

The second sum on the right of (41) is fixed by the constraint M, so H(q,p) assumes its minimum when each H(q_i, p_i) assumes its own minimum subject to I_i, that is, when q_i = p_i ∘ I_i, as (14) requires.
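As a final illustration (again hypothetical, reusing the sketches above), subset independence can be checked numerically: conditioning the whole-system posterior p ∘ (I ∧ M) on a subset matches applying the conditional prior to I_i alone, as (14) requires.

import numpy as np

p = np.array([0.1, 0.2, 0.3, 0.4])
S1 = np.array([0, 1])                 # subset S1; S2 = {2, 3}
m1 = 0.35                             # M: learned probability of S1

# I1: conditional mean over S1 of f = (0, 1) equals 0.7.
a_M = np.array([1.0, 1.0, 0.0, 0.0]) - m1            # q(S1) = m1, given sum q = 1
a_I1 = np.array([0.0 - 0.7, 1.0 - 0.7, 0.0, 0.0])    # sum_{S1} q (f - 0.7) = 0

q = info_operator(p, A=np.vstack([a_M, a_I1]))       # q = p o (I1 ^ M)

p1 = p[S1] / p[S1].sum()                             # conditional prior p * S1
q1 = min_cross_entropy(p1, np.array([0.0, 1.0]), target=0.7)   # (p * S1) o I1

print(np.allclose(q[S1] / q[S1].sum(), q1, atol=1e-5))   # True, matching (14)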
