INFORMATION THEORY - Inria


INFORMATION THEORY
Master 1 - Informatique - Univ. Rennes 1 / ENS Rennes
Aline Roumy, January 2020

Outline
1. Non-mathematical introduction
2. Mathematical introduction: definitions
3. Typical vectors and the Asymptotic Equipartition Property (AEP)
4. Lossless Source Coding
5. Variable-length Source Coding - Zero-error Compression

About me
Aline Roumy, Researcher at Inria, Rennes
Expertise: compression for video streaming, image/signal processing, information theory, machine learning
Web: http://people.rennes.inria.fr/Aline.Roumy/
Email: aline.roumy@inria.fr

Course schedule (tentative)
Information theory (IT): a self-sufficient course with many connections to probability.
- Lecture 1: introduction, reminder on probability
- Lecture 2-3: data compression (theoretical limits)
- Lecture 4: construction of codes that can compress data
- Lecture 5: beyond classical information theory (universality)
Course organization:
- slides (file available online)
- summary (file available online, hardcopy)
- proofs (see blackboard): take notes!
All documents are on my web page (teaching.html).

Course grading and documents
- Homework: exercises (in class and at home); presenting a correction in front of the class gives bonus points.
- Midterm exam: written exam, in groups, at home.
- Final exam: written exam, individual, 2h; course questions ("questions de cours") and exercises, in French.
All documents are on my web page (teaching.html).

Course material
C.E. Shannon, "A mathematical theory of communication", Bell Sys. Tech. Journal, 27: 379-423, 623-656, 1948. (the seminal paper)

Course material
T.M. Cover and J.A. Thomas, Elements of Information Theory, Wiley Series in Telecommunications, Wiley, New York, 2006. (THE reference)

Course material
R. Yeung, Information Theory and Network Coding, Springer, 2008. (network coding)

Course material
A. El Gamal and Y.-H. Kim, Network Information Theory, Cambridge University Press, 2011. (network information theory)
Slides: A. El Gamal and Y.-H. Kim, Lecture Notes on Network Information Theory, arXiv:1001.3404v5, 2011. (available on the web)

Outline
1. Non-mathematical introduction
2. Mathematical introduction: definitions
3. Typical vectors and the Asymptotic Equipartition Property (AEP)
4. Lossless Source Coding
5. Variable-length Source Coding - Zero-error Compression

Lecture 1
Non-mathematical introduction
What does "communicating" mean?

What is it about? A bit of history
Information theory (IT): "The fundamental problem of communication is that of reproducing at one point, either exactly or approximately, a message selected at another point."
IT was established by Claude E. Shannon (1916-2001) in 1948.
Seminal paper: "A Mathematical Theory of Communication", in the Bell System Technical Journal, 1948. A revolutionary and groundbreaking paper.

Teaser 1: compression
Hangman game. Objective: play, and explain your strategy.
- - - - -

Teaser 1: compression
Hangman game. Objective: play, and explain your strategy.
Two winning ideas:
- letter frequency (probability)
- correlation between successive letters (dependence)

Teaser 1: compression - Analogy between the hangman game and compression
- Hangman: a word. Compression: data (an image).
- Hangman: an answer to a question (yes/no) removes uncertainty in the word. Compression: 1 bit of the bitstream that represents the data removes uncertainty in the data.
- Hangman: the goal is to propose a minimum number of letters. Compression: the goal is to propose a minimum number of bits.

Teaser 2: communication over a noisy channel
Context: storing/communicating data on a channel with errors, e.g.
- scratches on a DVD
- lost data packets: a webpage sent over the internet
- lost or modified received signals: wireless links

Teaser 2: communication over a noisy channel
Game (see the sketch below):
1. choose a binary vector (x1, x2, x3, x4)
2. compute x5, x6, x7 such that the XOR in each circle is 0
3. add 1 or 2 errors
4. correct the errors such that rule 2 is satisfied
Quiz 1: Assume you know how many errors have been introduced.
- Can one correct 1 error?
- Can one correct 2 errors?
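The four steps above describe a Hamming(7,4)-style parity code. The following Python sketch is not from the course; the circle layout (CIRCLES) is an assumed assignment of data bits to circles, chosen so that all parity-check columns are distinct, which is what makes a single error correctable.

```python
# Hypothetical sketch of the three-circle parity game (a Hamming(7,4)-style code).
# CIRCLES is an assumed layout: parity bit x5 closes circle 1, x6 circle 2, x7 circle 3.
CIRCLES = [(0, 1, 2), (1, 2, 3), (0, 2, 3)]    # data-bit indices (x1..x4) in each circle

def encode(data4):
    """Append one parity bit per circle so that each circle XORs to 0."""
    parity = [data4[a] ^ data4[b] ^ data4[c] for (a, b, c) in CIRCLES]
    return data4 + parity                      # codeword (x1, ..., x7)

def syndrome(word7):
    """XOR of each circle (its data bits plus its parity bit); all-zero means consistent."""
    return tuple(word7[a] ^ word7[b] ^ word7[c] ^ word7[4 + k]
                 for k, (a, b, c) in enumerate(CIRCLES))

def correct_one_error(word7):
    """A single error is correctable: exactly one bit flip restores the all-zero syndrome."""
    if syndrome(word7) == (0, 0, 0):
        return word7
    for i in range(7):
        trial = word7[:]
        trial[i] ^= 1
        if syndrome(trial) == (0, 0, 0):
            return trial
    return word7                               # two or more errors: ambiguous

codeword = encode([1, 0, 1, 1])
received = codeword[:]
received[2] ^= 1                               # introduce a single error
print(correct_one_error(received) == codeword) # True: 1 error is corrected
```

Introducing two errors instead can drive this decoder to a wrong codeword, which hints at the answer to the second part of Quiz 1.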

Teaser 2: communication over a noisy channel
Take Home Message (THM):
- To get zero error at the receiver, one can send a FINITE number of additional bits.
- For a finite number of additional bits, there is a limit on the number of errors that can be corrected.


Summary
One can compress data by using two ideas:
- use the non-uniformity of the probabilities: this is the source coding theorem (first part of the course), very surprising;
- use the dependence between the data: in the midterm exam.
One can send data over a noisy channel and recover the data without any error, provided the data is encoded (send additional data): this is the channel coding theorem (second part of the course).

Communicate what?
Definition. Source of information: something that produces messages.
Definition. Message: a stream of symbols taking their values in an alphabet.
Example: Source: camera. Message: picture. Symbol: pixel value, 3 coefficients (RGB). Alphabet: {0, ..., 255}^3.
Example: Source: writer. Message: a text. Symbol: letter. Alphabet: {a, ..., z, !, ., ?, ...}.

How to model the communication?
Model for the source:
  communication              ->  mathematical model
  a source of information    ->  a random process
  a message of the source    ->  a realization of a random vector
  a symbol of the source     ->  a realization of a random variable
  alphabet of the source     ->  alphabet of the random variable
Model for the communication chain:
  Source --message--> Encoder --> Channel --> Decoder --estimate--> Destination


Point-to-point Information Theory
Shannon proposed and proved three fundamental theorems for point-to-point communication (1 sender / 1 receiver):
1. Lossless source coding theorem: for a given source, what is the minimum rate at which the source can be compressed losslessly? (rate = number of bits per source symbol)
2. Lossy source coding theorem: for a given source and a given distortion D, what is the minimum rate at which the source can be compressed within distortion D? (rate = number of bits per source symbol)
3. Channel coding theorem: what is the maximum rate at which data can be transmitted reliably? (rate = number of bits per symbol sent over the channel)

Application of Information Theory
Information theory is everywhere.
1. Lossless source coding theorem
2. Lossy source coding theorem
3. Channel coding theorem
Quiz 2: On which theorem (1/2/3) do these applications rely?
(1) zip compression
(2) jpeg and mpeg compression
(3) sending a jpeg file over the internet
(4) the 15-digit social security number
(5) a movie stored on a DVD

Reminder (1)
Definition (Convergence in probability). Let (Xn)n≥1 be a sequence of r.v. and X a r.v., both defined over R. (Xn)n≥1 converges in probability to the r.v. X if
  ∀ε > 0,  lim_{n→∞} P(|Xn - X| > ε) = 0.
Notation: Xn →p X
Quiz 3: Which of the following statements are true?
(1) Xn and X are random
(2) Xn is random and X is deterministic (constant)

Reminder (2)
Theorem (Weak Law of Large Numbers (WLLN)). Let (Xn)n≥1 be a sequence of r.v. over R. If (Xn)n≥1 is i.i.d. and L2 (i.e. E[Xn^2] < ∞), then
  (X1 + ... + Xn) / n →p E[X1]
Quiz 4: Which of the following statements are true?
(1) for any nonzero margin, with a sufficiently large sample there will be a very high probability that the average of the observations will be close to the expected value, that is, within the margin
(2) LHS and RHS are random
(3) averaging kills randomness
(4) the statistical mean (a.k.a. true mean) converges to the empirical mean (a.k.a. sample mean)
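As a quick illustration (my own example, not from the slides), the WLLN can be observed numerically: the sample mean of i.i.d. Bernoulli(0.2) draws concentrates around the true mean 0.2 as n grows.

```python
# WLLN illustration: the sample mean of i.i.d. Bernoulli(p) draws
# approaches E[X1] = p as n grows (convergence in probability).
import random

random.seed(0)
p = 0.2
for n in (10, 100, 10_000, 100_000):
    sample_mean = sum(random.random() < p for _ in range(n)) / n
    print(n, sample_mean)   # gets closer and closer to 0.2
```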

Outline
1. Non-mathematical introduction
2. Mathematical introduction: definitions
3. Typical vectors and the Asymptotic Equipartition Property (AEP)
4. Lossless Source Coding
5. Variable-length Source Coding - Zero-error Compression

Lecture 2
Mathematical introduction
Definitions: Entropy and Mutual Information

Some Notation
Notation specific to information theory is denoted in red.
- Upper case letters X, Y, ... refer to random processes or random variables
- Calligraphic letters X, Y, ... refer to alphabets
- |A| is the cardinality of the set A
- X^n = (X1, X2, ..., Xn) is an n-sequence of random variables or a random vector
- Xi^j = (Xi, Xi+1, ..., Xj)
- Lower case x, y, ... and x^n, y^n, ... denote realizations of scalars/vectors
- X ~ p(x) means that the r.v. X has probability mass function (pmf) P(X = x) = p(x)
- X^n ~ p(x^n) means that the discrete random vector X^n has joint pmf p(x^n)
- p(y^n | x^n) is the conditional pmf of Y^n given X^n = x^n

Entropy (1)
Definition (Entropy). The entropy of a discrete random variable X ~ p(x) is
  H(X) = - ∑_{x ∈ X} p(x) log p(x)
H(X) in bits/source sample is the average length of the shortest description of the r.v. X. (Shown later.)
Notation: log = log2. Convention: 0 log 0 = 0.
Properties
E1 H(X) only depends on the pmf p(x), not on the values x.
E2 H(X) = E_X[-log p(X)]
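As a small illustration (my own sketch, not part of the slides), the definition translates directly into code; the example values also preview properties E3 and E4 of the next slide.

```python
# Entropy in bits of a discrete r.v. given by its pmf, H(X) = -sum_x p(x) log2 p(x),
# with the convention 0 log 0 = 0.
from math import log2

def entropy(pmf):
    return -sum(p * log2(p) for p in pmf if p > 0)

print(entropy([0.5, 0.5]))    # 1.0 bit: fair binary r.v.
print(entropy([1.0, 0.0]))    # 0.0: a constant r.v. carries no uncertainty (E3)
print(entropy([0.25] * 4))    # 2.0 = log2(4): the uniform pmf maximizes entropy (E4)
```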

Entropy (2)
E3 H(X) ≥ 0, with equality iff X is constant.
E4 H(X) ≤ log|X|. The uniform distribution maximizes entropy.
Example. Binary entropy function: let 0 ≤ p ≤ 1,
  hb(p) = -p log p - (1 - p) log(1 - p)
[Figure: plot of hb(p), the entropy H(X) of a binary r.v.]
H(X) measures the amount of uncertainty on the r.v. X.
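A self-contained sketch of the binary entropy function shown in the figure; the sample values of p are my own.

```python
# Binary entropy function hb(p) = -p log2(p) - (1-p) log2(1-p), in bits.
from math import log2

def hb(p):
    if p in (0.0, 1.0):
        return 0.0            # convention 0 log 0 = 0
    return -p * log2(p) - (1 - p) * log2(1 - p)

print(hb(0.5))                # 1.0: maximal uncertainty at p = 1/2
print(round(hb(0.11), 3))     # ~0.5: hb is far from linear in p
```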

Entropy (3)
E4 (cont'd) Alternative proof using the positivity of the Kullback-Leibler (KL) divergence.
Definition (Kullback-Leibler (KL) divergence). Let p(x) and q(x) be two pmfs defined on the same set X. The KL divergence between p and q is
  D(p||q) = ∑_{x ∈ X} p(x) log [ p(x) / q(x) ]
Convention: c log(c/0) = ∞ for c > 0.
Quiz 5: Which of the following statements are true?
(1) D(p||q) = D(q||p)
(2) If Support(q) ⊊ Support(p) then D(p||q) = ∞
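A minimal sketch of D(p||q) with the two conventions above; the example pmfs are my own and give a numerical answer to the symmetry question of Quiz 5.

```python
# KL divergence D(p||q) between two pmfs on the same alphabet (in bits),
# with the conventions 0 log(0/q) = 0 and c log(c/0) = +infinity for c > 0.
from math import log2, inf

def kl(p, q):
    total = 0.0
    for pi, qi in zip(p, q):
        if pi == 0:
            continue
        if qi == 0:
            return inf        # Support(p) not contained in Support(q)
        total += pi * log2(pi / qi)
    return total

p, q = [0.5, 0.5], [0.9, 0.1]
print(kl(p, q), kl(q, p))     # the two values differ: D is not symmetric
```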

Entropy (4)
KL1 Positivity of KL [Cover Th. 2.6.3]: D(p||q) ≥ 0, with equality iff ∀x, p(x) = q(x).
This is a consequence of Jensen's inequality [Cover Th. 2.6.2]: if f is a convex function and Y is a random variable with numerical values, then
  E[f(Y)] ≥ f(E[Y])
with equality when f is not strictly convex, or when f is strictly convex and Y follows a degenerate distribution (i.e. is a constant).
KL2 Let X ~ p(x) and q(x) = 1/|X|; then D(p||q) = log|X| - H(X).

Reminder (independence)
Definition (independence). The random variables X and Y are independent, denoted by X ⊥ Y, if
  ∀(x, y) ∈ X × Y,  p(x, y) = p(x) p(y).
Definition (Mutual independence - mutuellement indépendant). For n ≥ 3, the random variables X1, X2, ..., Xn are mutually independent if
  ∀(x1, ..., xn) ∈ X1 × ... × Xn,  p(x1, ..., xn) = p(x1) p(x2) ... p(xn).
Definition (Pairwise independence - indépendance 2 à 2). For n ≥ 3, the random variables X1, ..., Xn are pairwise independent if
  ∀(i, j) s.t. 1 ≤ i < j ≤ n, Xi and Xj are independent.

Quiz 6
Quiz 6: Which of the following statements is/are true?
(1) mutual independence implies pairwise independence
(2) pairwise independence implies mutual independence

Reminder (conditional independence)
Definition (conditional independence). Let X, Y, Z be r.v. X is independent of Z given Y, denoted by X ⊥ Z | Y, if
  ∀(x, y, z),  p(x, z | y) = p(x | y) p(z | y) if p(y) > 0, and p(x, y, z) = 0 otherwise
or equivalently
  ∀(x, y, z),  p(x, y, z) = p(x, y) p(y, z) / p(y) = p(x, y) p(z | y) if p(y) > 0, and 0 otherwise
or equivalently
  ∀(x, y, z) ∈ X × Y × Z,  p(x, y, z) p(y) = p(x, y) p(y, z).

Definition (Markov chain). Let X1, X2, ..., Xn, n ≥ 3, be r.v. X1 - X2 - ... - Xn forms a Markov chain if
  ∀(x1, ..., xn),  p(x1, x2, ..., xn) = p(x1, x2) p(x3 | x2) ... p(xn | xn-1) if p(x2), ..., p(xn-1) > 0, and 0 otherwise
or equivalently
  ∀(x1, ..., xn),  p(x1, x2, ..., xn) p(x2) p(x3) ... p(xn-1) = p(x1, x2) p(x2, x3) ... p(xn-1, xn).
Quiz 7: Which of the following statements are true?
(1) X ⊥ Z | Y is equivalent to X - Z - Y
(2) X ⊥ Z | Y is equivalent to X - Y - Z
(3) X1 - X2 - ... - Xn implies Xn - ... - X2 - X1

Joint and conditional entropy
Definition (Conditional entropy). For discrete random variables (X, Y) ~ p(x, y):
the conditional entropy for a given y is
  H(X | Y = y) = - ∑_{x ∈ X} p(x | y) log p(x | y)
the conditional entropy is
  H(X | Y) = ∑_{y ∈ Y} p(y) H(X | Y = y) = - ∑_{y ∈ Y} p(y) ∑_{x ∈ X} p(x | y) log p(x | y)
           = - ∑_{x ∈ X} ∑_{y ∈ Y} p(x, y) log p(x | y) = E_{XY}[-log p(X | Y)]
H(X | Y) in bits/source sample is the average length of the shortest description of the r.v. X when Y is known.
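The definition can be evaluated directly from a joint pmf; the dictionary below is an assumed toy example, not one taken from the course.

```python
# H(X|Y) = -sum_{x,y} p(x,y) log2 p(x|y), computed from a joint pmf given as
# a dictionary {(x, y): probability}.
from math import log2

def conditional_entropy(pxy):
    py = {}
    for (x, y), p in pxy.items():
        py[y] = py.get(y, 0.0) + p                    # marginal p(y)
    return -sum(p * log2(p / py[y]) for (x, y), p in pxy.items() if p > 0)

pxy = {(0, 0): 0.5, (0, 1): 0.25, (1, 1): 0.25}       # toy joint pmf
print(conditional_entropy(pxy))                       # H(X|Y) = 0.5 bit here
```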

Joint entropy
Definition (Joint entropy). For discrete random variables (X, Y) ~ p(x, y), the joint entropy is
  H(X, Y) = E_{XY}[-log p(X, Y)] = - ∑_{x ∈ X} ∑_{y ∈ Y} p(x, y) log p(x, y)
H(X, Y) in bits/source sample is the average length of the shortest description of ?

Properties
JCE1 (trick) H(X, Y) = H(X) + H(Y | X) = H(Y) + H(X | Y)
JCE2 H(X, Y) ≤ H(X) + H(Y), with equality iff X and Y are independent (denoted X ⊥ Y).
JCE3 Conditioning reduces entropy: H(X | Y) ≤ H(X), with equality iff X ⊥ Y.
JCE4 Chain rule for entropy (formule des conditionnements successifs): let X^n be a discrete random vector, then
  H(X^n) = H(X1) + H(X2 | X1) + ... + H(Xn | Xn-1, ..., X1)
         = ∑_{i=1}^{n} H(Xi | Xi-1, ..., X1) = ∑_{i=1}^{n} H(Xi | X^{i-1}) ≤ ∑_{i=1}^{n} H(Xi)
with the notation H(X1 | X^0) = H(X1).

JCE5 H(X | Y) ≥ 0, with equality iff X = f(Y) a.s.
JCE6 H(X | X) = 0 and H(X, X) = H(X)
JCE7 Data processing inequality: let X be a discrete random variable and g(X) a function of X; then H(g(X)) ≤ H(X), with equality iff g is injective on the support of p(x).
JCE8 Fano's inequality (link between entropy and error probability): let (X, Y) ~ p(x, y) and Pe = P{X ≠ Y}; then
  H(X | Y) ≤ hb(Pe) + Pe log(|X| - 1) ≤ 1 + Pe log(|X| - 1)
JCE9 H(X | Z) ≥ H(X | Y, Z), with equality iff X and Y are independent given Z (denoted X ⊥ Y | Z).
JCE10 H(X, Y | Z) ≤ H(X | Z) + H(Y | Z), with equality iff X ⊥ Y | Z.

Venn diagram
X (a r.v.)  is represented by  a set (its set of realizations)
H(X)        is represented by  the area of the set
H(X, Y)     is represented by  the area of the union of the sets
Exercise
1. Draw a Venn diagram for 2 r.v. X and Y. Show H(X), H(Y), H(X, Y) and H(Y | X).
2. Show the case X ⊥ Y.
3. Draw a Venn diagram for 3 r.v. X, Y and Z and show the decomposition H(X, Y, Z) = H(X) + H(Y | X) + H(Z | X, Y).
4. Show the case X ⊥ Y ⊥ Z.

Mutual Information
Definition (Mutual Information). For discrete random variables (X, Y) ~ p(x, y), the mutual information is
  I(X; Y) = ∑_{x ∈ X} ∑_{y ∈ Y} p(x, y) log [ p(x, y) / (p(x) p(y)) ]
          = H(X) - H(X | Y) = H(Y) - H(Y | X) = H(X) + H(Y) - H(X, Y)
Exercise: Show I(X; Y) on the Venn diagram representing X and Y.
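A sketch (again with an assumed toy joint pmf) that evaluates the definition and checks the identity I(X; Y) = H(X) + H(Y) - H(X, Y) numerically.

```python
# Mutual information from a joint pmf {(x, y): prob}, via
# I(X;Y) = sum_{x,y} p(x,y) log2[ p(x,y) / (p(x) p(y)) ] = H(X) + H(Y) - H(X,Y).
from math import log2

def marginals(pxy):
    px, py = {}, {}
    for (x, y), p in pxy.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return px, py

def mutual_information(pxy):
    px, py = marginals(pxy)
    return sum(p * log2(p / (px[x] * py[y])) for (x, y), p in pxy.items() if p > 0)

def entropy(pmf):
    return -sum(p * log2(p) for p in pmf if p > 0)

pxy = {(0, 0): 0.5, (0, 1): 0.25, (1, 1): 0.25}       # toy joint pmf
px, py = marginals(pxy)
lhs = mutual_information(pxy)
rhs = entropy(px.values()) + entropy(py.values()) - entropy(pxy.values())
print(lhs, rhs)               # equal, and nonnegative (see MI5 on the next slide)
```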

Mutual Information: properties
MI1 I(X; Y) is a function of p(x, y)
MI2 I(X; Y) is symmetric: I(X; Y) = I(Y; X)
MI3 I(X; X) = H(X)
MI4 I(X; Y) = D(p(x, y) || p(x) p(y))
MI5 I(X; Y) ≥ 0, with equality iff X ⊥ Y
MI6 I(X; Y) ≤ min(H(X), H(Y)), with equality iff X = f(Y) a.s. or Y = f(X) a.s.

Conditional Mutual Information
Definition (Conditional Mutual Information). For discrete random variables (X, Y, Z) ~ p(x, y, z), the conditional mutual information is
  I(X; Y | Z) = ∑_{x ∈ X} ∑_{y ∈ Y} ∑_{z ∈ Z} p(x, y, z) log [ p(x, y | z) / (p(x | z) p(y | z)) ]
              = H(X | Z) - H(X | Y, Z) = H(Y | Z) - H(Y | X, Z)
Exercise: Show I(X; Y | Z) and I(X; Z) on the Venn diagram representing X, Y, Z.
CMI1 I(X; Y | Z) ≥ 0, with equality iff X ⊥ Y | Z.
Exercise: Compare I(X; Y, Z) with I(X; Y | Z) + I(X; Z) on the Venn diagram representing X, Y, Z.

CMI2 Chain rule: I(X^n; Y) = ∑_{i=1}^{n} I(Xi; Y | X^{i-1})
CMI3 If X - Y - Z form a Markov chain, then I(X; Z | Y) = 0.
CMI4 Corollary: if X - Y - Z, then I(X; Y) ≥ I(X; Y | Z).
CMI5 Corollary (data processing inequality): if X - Y - Z form a Markov chain, then I(X; Y) ≥ I(X; Z).
Exercise: Draw the Venn diagram of the Markov chain X - Y - Z.
CMI6 There is no order relation between I(X; Y) and I(X; Y | Z).
Beware (faux amis): recall that H(X | Z) ≤ H(X).
Hint: show an example s.t. I(X; Y) < I(X; Y | Z) and an example s.t. I(X; Y) > I(X; Y | Z).
Exercise: Show the area that represents I(X; Y) - I(X; Y | Z) on the Venn diagram.

Outline
1. Non-mathematical introduction
2. Mathematical introduction: definitions
3. Typical vectors and the Asymptotic Equipartition Property (AEP)
4. Lossless Source Coding
5. Variable-length Source Coding - Zero-error Compression

Lecture 3
Typical vectors and the Asymptotic Equipartition Property (AEP)

Re-reminder
Definition (Convergence in probability). Let (Xn)n≥1 be a sequence of r.v. and X a r.v., both defined over R^d. (Xn)n≥1 converges in probability to the r.v. X if
  ∀ε > 0,  lim_{n→∞} P(|Xn - X| > ε) = 0.
Notation: Xn →p X
Theorem (Weak Law of Large Numbers (WLLN)). Let (Xn)n≥1 be a sequence of r.v. over R. If (Xn)n≥1 is i.i.d. and L2 (i.e. E[Xn^2] < ∞), then
  (X1 + ... + Xn) / n →p E[X1]

Theorem (Asymptotic Equipartition Property (AEP)). Let X1, X2, ... be an i.i.d. ~ p(x) finite-alphabet random process (source), and let p(x^n) = ∏_{i=1}^{n} p(xi). Then
  - (1/n) log p(X^n) → H(X)   in probability
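A numerical illustration of the AEP for an assumed Bernoulli(0.2) source: the normalized log-likelihood -(1/n) log p(X^n) of a long i.i.d. sample settles near H(X).

```python
# AEP illustration: -(1/n) log2 p(X^n) approaches H(X) for an i.i.d. source.
import random
from math import log2

random.seed(1)
p = 0.2                                       # Bernoulli(0.2) source (assumed example)
H = -p * log2(p) - (1 - p) * log2(1 - p)      # ~0.722 bits
for n in (10, 100, 10_000, 100_000):
    xn = [random.random() < p for _ in range(n)]
    log_p = sum(log2(p) if x else log2(1 - p) for x in xn)
    print(n, round(-log_p / n, 4), "vs H(X) =", round(H, 4))
```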

Definition (Typical set). Let ε > 0, n > 0 and X ~ p(x). The set A_ε^(n)(X) of ε-typical vectors x^n, where p(x^n) = ∏_{i=1}^{n} p(xi), is defined as
  A_ε^(n)(X) = { x^n : | -(1/n) log p(x^n) - H(X) | ≤ ε }
Properties
AEP1 ∀(ε, n), all these statements are equivalent:
  - x^n ∈ A_ε^(n)
  - 2^{-n(H(X)+ε)} ≤ p(x^n) ≤ 2^{-n(H(X)-ε)}
  - p(x^n) ≐ 2^{-n(H(X)±ε)}   ("uniform distribution on the typical set")
Notation: a_n ≐ 2^{n(b±ε)} means | (1/n) log a_n - b | ≤ ε for n sufficiently large.

Interpretation of typicality
[Figure: the typical set A_ε^(n)(X) inside the set X^n of all sequences x^n]
Example of typical vectors: let P[X = x] = p(x), x^n = (x1, ..., xi, ..., xn), and let nx = |{i : xi = x}| be the number of occurrences of symbol x. If x^n satisfies nx ≈ n p(x), then
  p(x^n) = ∏_i p(xi) = ∏_{x ∈ X} p(x)^{nx} ≈ 2^{n ∑_{x ∈ X} p(x) log p(x)} = 2^{-n H(X)}
Such an x^n represents the distribution well, so x^n is ε-typical for every ε.
Quiz: Let X ~ B(0.2), ε = 0.1 and n = 10. Which of the following vectors x^n are ε-typical?
(a) (0100000100)  (b) (1100000000)  (c) (1111111111)
Let X ~ B(0.5), ε = 0.1 and n = 10. Which vectors x^n are ε-typical?
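The quiz can be checked mechanically; the sketch below uses the slide's parameters (X ~ B(0.2), ε = 0.1, n = 10) and the typicality definition from the previous slide.

```python
# epsilon-typicality check for X ~ Bernoulli(p):
# x^n is typical iff | -(1/n) log2 p(x^n) - H(X) | <= eps.
from math import log2

def is_typical(xn, p, eps):
    H = -p * log2(p) - (1 - p) * log2(1 - p)
    log_p = sum(log2(p) if x else log2(1 - p) for x in xn)
    return abs(-log_p / len(xn) - H) <= eps

p, eps = 0.2, 0.1
for s in ("0100000100", "1100000000", "1111111111"):
    print(s, is_typical([int(c) for c in s], p, eps))
# The two vectors with two ones are typical; the all-ones vector is not.
```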

Properties
AEP2 ∀ε > 0, lim_{n→∞} P(X^n ∈ A_ε^(n)(X)) = 1   ("for a given ε, asymptotically a.s. typical")
Theorem (CT Th. 3.1.2). Given ε > 0, assume that ∀n, X^n ~ ∏_{i=1}^{n} p(xi). Then, for n sufficiently large, we have
  1. P(X^n ∈ A_ε^(n)(X)) ≥ 1 - ε
  2. |A_ε^(n)(X)| ≤ 2^{n(H(X)+ε)}
  3. |A_ε^(n)(X)| ≥ (1 - ε) 2^{n(H(X)-ε)}
Properties 2 and 3 can be summarized as |A_ε^(n)| ≐ 2^{n(H(X)±2ε)}.

Outline
1. Non-mathematical introduction
2. Mathematical introduction: definitions
3. Typical vectors and the Asymptotic Equipartition Property (AEP)
4. Lossless Source Coding
5. Variable-length Source Coding - Zero-error Compression

Lecture 4
Lossless Source Coding (lossless data compression)

Compression system model:
  Source --U^n--> Encoder --index I (nR bits)--> Decoder --Û^n-->
We assume a finite-alphabet i.i.d. source U1, U2, ... ~ p(u).
Definition (Fixed-length Source Code (FLC)). Let R ∈ R+ and n ∈ N*. A (2^{nR}, n) fixed-length source code consists of:
1. an encoding function that assigns to each u^n ∈ U^n an index i ∈ {1, 2, ..., 2^{nR}}, i.e., a codeword of length nR bits:
     U^n → I = {1, 2, ..., 2^{nR}},  u^n ↦ i(u^n)
2. a decoding function that assigns an estimate û^n(i) to each received index i:
     I → U^n,  i ↦ û^n(i)

Definition (Probability of decoding error). Let n ∈ N*. The probability of decoding error is
  Pe^(n) = P{Û^n ≠ U^n}
R is called the compression rate: the number of bits per source sample.
Definition (Achievable rate). Let R ∈ R+. A rate R is achievable if there exists a sequence of (2^{nR}, n) codes with Pe^(n) → 0 as n → ∞.

Source Coding Theorem
The source coding problem is to find the infimum of all achievable rates.
Theorem (Source coding theorem (Shannon'48)). Let U ~ p(u) be a finite-alphabet i.i.d. source and let R ∈ R+.
[Achievability] If R > H(U), then there exists a sequence of (2^{nR}, n) codes s.t. Pe^(n) → 0.
[Converse] For any sequence of (2^{nR}, n) codes s.t. Pe^(n) → 0, R ≥ H(U).
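A brute-force sketch of the achievability idea (the small n and the Bernoulli(0.2) source are my own assumptions): index only the ε-typical sequences, which takes roughly n(H(U)+ε) bits per block instead of n, and accept an error on the remaining, asymptotically improbable, sequences.

```python
# Typical-set encoding sketch: index the eps-typical sequences of a Bernoulli(p)
# source; non-typical sequences would be mapped to a fixed dummy index (an error).
from itertools import product
from math import log2, ceil

p, eps, n = 0.2, 0.1, 12
H = -p * log2(p) - (1 - p) * log2(1 - p)

def is_typical(un):
    log_p = sum(log2(p) if u else log2(1 - p) for u in un)
    return abs(-log_p / n - H) <= eps

typical = [un for un in product((0, 1), repeat=n) if is_typical(un)]
index = {un: i for i, un in enumerate(typical)}       # encoder table: typical u^n -> index
print(len(typical), "typical sequences ->", ceil(log2(len(typical))),
      "bits per block instead of", n)                 # roughly n (H(U) + eps) bits
```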
