76

I need to determine the KL-divergence between two Gaussians. I am comparing my results to these, but I can't reproduce their result. My result is obviously wrong, because the KL is not 0 for KL(p, p).

I wonder where I am making a mistake. Can anyone spot it?

Let $p(x) = N(\mu_1, \sigma_1)$ and $q(x) = N(\mu_2, \sigma_2)$. From Bishop's PRML I know that

$$KL(p, q) = -\int p(x) \log q(x)\,dx + \int p(x) \log p(x)\,dx$$

where the integration is over the whole real line, and that

$$\int p(x) \log p(x)\,dx = -\frac{1}{2}\left(1 + \log 2\pi\sigma_1^2\right),$$
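As a quick sanity check on this entropy term, here is a minimal Python sketch that estimates the integral with a midpoint rule and compares it to the closed form. The function names, the example parameters, and the reading of $\sigma_1$ as a standard deviation (so that the variance is $\sigma_1^2$) are assumptions made for illustration.

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a normal with mean mu and standard deviation sigma (assumed convention)."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

def integral_p_log_p(mu, sigma, half_width=12.0, steps=20000):
    """Midpoint-rule estimate of the integral of p(x) log p(x) over mu +/- half_width * sigma."""
    lo = mu - half_width * sigma
    dx = 2.0 * half_width * sigma / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        p = gaussian_pdf(x, mu, sigma)
        total += p * math.log(p) * dx
    return total

def closed_form_p_log_p(sigma):
    """Closed form -(1/2) * (1 + log(2 * pi * sigma^2)); it does not depend on the mean."""
    return -0.5 * (1.0 + math.log(2.0 * math.pi * sigma ** 2))

# Hypothetical example values, chosen only for the comparison.
print(integral_p_log_p(0.7, 1.5), closed_form_p_log_p(1.5))
```

The two printed numbers should agree to several decimal places, which supports the sign of the closed form.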

so I restrict myself to $-\int p(x) \log q(x)\,dx$, which I can write out as

$$-\int p(x) \log \frac{1}{(2\pi\sigma_2^2)^{1/2}} e^{-\frac{(x-\mu_2)^2}{2\sigma_2^2}}\,dx,$$

which can be separated into

$$\frac{1}{2}\log(2\pi\sigma_2^2) - \int p(x) \log e^{-\frac{(x-\mu_2)^2}{2\sigma_2^2}}\,dx.$$

Taking the log I get

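$$\frac{1}{2}\log(2\pi\sigma_2^2) - \int p(x) \left(-\frac{(x-\mu_2)^2}{2\sigma_2^2}\right) dx,$$

where I separate the sums and get $\sigma_2^2$ out of the integral.

$$\frac{1}{2}\log(2\pi\sigma_2^2) + \frac{\int p(x) x^2\,dx - \int p(x) 2x\mu_2\,dx + \int p(x) \mu_2^2\,dx}{2\sigma_2^2}$$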

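Letting $\langle \cdot \rangle$ denote the expectation operator under $p$, I can rewrite this as

$$\frac{1}{2}\log(2\pi\sigma_2^2) + \frac{\langle x^2 \rangle - 2\langle x \rangle \mu_2 + \mu_2^2}{2\sigma_2^2}.$$

We know that $\operatorname{var}(x) = \langle x^2 \rangle - \langle x \rangle^2$. Thus

$$\langle x^2 \rangle = \sigma_1^2 + \mu_1^2$$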
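The second-moment identity can be checked empirically. The following is only a rough Monte Carlo sketch: the helper name, the sample size, the seed, and the example parameters are all assumptions, and $\sigma_1$ is treated as a standard deviation.

```python
import random

def second_moment_estimate(mu, sigma, n=200_000, seed=0):
    """Average of x^2 over n draws from a normal with mean mu and standard deviation sigma."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    return sum(rng.gauss(mu, sigma) ** 2 for _ in range(n)) / n

# Hypothetical example values.
mu1, sigma1 = 1.5, 0.8
estimate = second_moment_estimate(mu1, sigma1)
exact = sigma1 ** 2 + mu1 ** 2  # variance plus squared mean
print(estimate, exact)
```

The estimate fluctuates with the seed and sample size, so it should be close to the exact value but will not match it digit for digit.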

and therefore

$$\frac{1}{2}\log(2\pi\sigma_2^2) + \frac{\sigma_1^2 + \mu_1^2 - 2\mu_1\mu_2 + \mu_2^2}{2\sigma_2^2},$$

which I can put as

$$\frac{1}{2}\log(2\pi\sigma_2^2) + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2}.$$

Putting everything together, I get to

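$$KL(p, q) = -\int p(x) \log q(x)\,dx + \int p(x) \log p(x)\,dx = \frac{1}{2}\log(2\pi\sigma_2^2) + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2} - \frac{1}{2}\left(1 + \log 2\pi\sigma_1^2\right) = \log\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2}.$$
This is wrong, since it equals $\frac{1}{2}$ rather than 0 for two identical Gaussians.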

Can anyone spot my error?

Update

Thanks to mpiktas for clearing things up. The correct answer is:

$$KL(p, q) = \log\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2} - \frac{1}{2}$$
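For readers who want to evaluate this expression, here is a minimal Python sketch of the closed form. The function name `kl_gauss` and the example values are hypothetical, and the code assumes $\sigma_1$ and $\sigma_2$ are standard deviations, with $p$ carrying the parameters $(\mu_1, \sigma_1)$.

```python
import math

def kl_gauss(mu1, sigma1, mu2, sigma2):
    """KL(p, q) for p = N(mu1, sigma1^2) and q = N(mu2, sigma2^2); sigmas are standard deviations."""
    if sigma1 <= 0 or sigma2 <= 0:
        raise ValueError("standard deviations must be positive")
    return (math.log(sigma2 / sigma1)
            + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2.0 * sigma2 ** 2)
            - 0.5)

print(kl_gauss(0.0, 1.0, 0.0, 1.0))  # identical Gaussians: 0.0
print(kl_gauss(0.0, 1.0, 1.0, 2.0))  # KL(p, q)
print(kl_gauss(1.0, 2.0, 0.0, 1.0))  # KL(q, p) for the same pair, a different value
```

The last two calls swap the roles of the two distributions, which shows that the divergence is not symmetric, so it matters which density is taken as $p$.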

  • Sorry for posting the incorrect answer in the first place. I just looked at $x - \mu_1$ and immediately thought that the integral is zero. The point that it was squared completely slipped my mind :) – mpiktas Feb 21 '11 at 12:02
  • What about the multivariate case? – user7001 Oct 23 '11 at 0:49
  • I have just seen in a research paper that the KLD should be $KL(p, q) = \frac{1}{2}\left((\mu_1 - \mu_2)^2 + \sigma_1^2 + \sigma_2^2\right)\left(\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}\right) - 2$ – skyde Aug 1 '13 at 14:26
  • I think there is a typo in your question, since I cannot validate it and it also seems that you used the correct version later in your question:
    $$\int p(x) \log p(x)\,dx = \frac{1}{2}\left(1 + \log 2\pi\sigma_1^2\right)$$
    I think it should be (note the minus):
    $$\int p(x) \log p(x)\,dx = -\frac{1}{2}\left(1 + \log 2\pi\sigma_1^2\right)$$
    I tried to edit your question and got banned for it, so maybe do it yourself.
    – y-spreen Jan 25 '18 at 13:49
  • The answer is also in my 1996 paper on Intrinsic losses. – Xi'an Mar 29 '18 at 20:27
57

OK, my bad. The error is in the last equation:

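$$KL(p, q) = -\int p(x) \log q(x)\,dx + \int p(x) \log p(x)\,dx = \frac{1}{2}\log(2\pi\sigma_2^2) + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2} - \frac{1}{2}\left(1 + \log 2\pi\sigma_1^2\right) = \log\frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2} - \frac{1}{2}$$

Note the missing $-\frac{1}{2}$. The last line becomes zero when $\mu_1 = \mu_2$ and $\sigma_1 = \sigma_2$.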
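To make the source of that constant explicit, here is the intermediate step written out. It only combines the two logarithmic terms of the middle expression and introduces nothing new:

$$\frac{1}{2}\log(2\pi\sigma_2^2) - \frac{1}{2}\left(1 + \log 2\pi\sigma_1^2\right) = \frac{1}{2}\log\frac{2\pi\sigma_2^2}{2\pi\sigma_1^2} - \frac{1}{2} = \log\frac{\sigma_2}{\sigma_1} - \frac{1}{2}.$$

The $1$ inside the parentheses is what survives as $-\frac{1}{2}$ once the factors of $2\pi$ cancel.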

  • @mpiktas I meant the question really - bayerj is a well published researcher and I'm an undergrad. Nice to see that even the smart guys fall back to asking on the internet sometimes :) – N. McA. Apr 5 '16 at 10:19
  • Is $p$ the one with $\mu_1, \sigma_1$ or with $\mu_2, \sigma_2$? – Kong Jan 20 '18 at 23:41
29

I did not have a look at your calculation but here is mine with a lot of details. Suppose $p$ is the density of a normal random variable with mean $\mu_1$ and variance $\sigma_1^2$, and that $q$ is the density of a normal random variable with mean $\mu_2$ and variance $\sigma_2^2$. The Kullback-Leibler distance from $q$ to $p$ is:

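$$\int \left[\log(p(x)) - \log(q(x))\right] p(x)\,dx$$

$$= \int \left[-\frac{1}{2}\log(2\pi) - \log(\sigma_1) - \frac{1}{2}\left(\frac{x-\mu_1}{\sigma_1}\right)^2 + \frac{1}{2}\log(2\pi) + \log(\sigma_2) + \frac{1}{2}\left(\frac{x-\mu_2}{\sigma_2}\right)^2\right] \times \frac{1}{\sqrt{2\pi}\sigma_1} \exp\left[-\frac{1}{2}\left(\frac{x-\mu_1}{\sigma_1}\right)^2\right] dx$$

$$= \int \left\{\log\left(\frac{\sigma_2}{\sigma_1}\right) + \frac{1}{2}\left[\left(\frac{x-\mu_2}{\sigma_2}\right)^2 - \left(\frac{x-\mu_1}{\sigma_1}\right)^2\right]\right\} \times \frac{1}{\sqrt{2\pi}\sigma_1} \exp\left[-\frac{1}{2}\left(\frac{x-\mu_1}{\sigma_1}\right)^2\right] dx$$

$$= E_1\left\{\log\left(\frac{\sigma_2}{\sigma_1}\right) + \frac{1}{2}\left[\left(\frac{x-\mu_2}{\sigma_2}\right)^2 - \left(\frac{x-\mu_1}{\sigma_1}\right)^2\right]\right\}$$

$$= \log\left(\frac{\sigma_2}{\sigma_1}\right) + \frac{1}{2\sigma_2^2} E_1\left\{(X-\mu_2)^2\right\} - \frac{1}{2\sigma_1^2} E_1\left\{(X-\mu_1)^2\right\}$$

$$= \log\left(\frac{\sigma_2}{\sigma_1}\right) + \frac{1}{2\sigma_2^2} E_1\left\{(X-\mu_2)^2\right\} - \frac{1}{2}$$

(Now note that $(X-\mu_2)^2 = (X-\mu_1+\mu_1-\mu_2)^2 = (X-\mu_1)^2 + 2(X-\mu_1)(\mu_1-\mu_2) + (\mu_1-\mu_2)^2$)

$$= \log\left(\frac{\sigma_2}{\sigma_1}\right) + \frac{1}{2\sigma_2^2}\left[E_1\left\{(X-\mu_1)^2\right\} + 2(\mu_1-\mu_2) E_1\left\{X-\mu_1\right\} + (\mu_1-\mu_2)^2\right] - \frac{1}{2}$$

$$= \log\left(\frac{\sigma_2}{\sigma_1}\right) + \frac{\sigma_1^2 + (\mu_1-\mu_2)^2}{2\sigma_2^2} - \frac{1}{2}$$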
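The derivation can be cross-checked numerically by integrating the first line directly and comparing it with the last line. This is only a sketch under stated assumptions: the function names and example parameters are hypothetical, the integral is truncated to twelve standard deviations around $\mu_1$, and a plain midpoint rule is used instead of a library integrator.

```python
import math

def log_normal_pdf(x, mu, sigma):
    """Log density of a normal with mean mu and standard deviation sigma."""
    return (-0.5 * math.log(2.0 * math.pi) - math.log(sigma)
            - 0.5 * ((x - mu) / sigma) ** 2)

def kl_numeric(mu1, sigma1, mu2, sigma2, half_width=12.0, steps=20000):
    """Midpoint-rule estimate of the integral of [log p(x) - log q(x)] p(x) dx."""
    lo = mu1 - half_width * sigma1  # integrate where p carries its mass
    dx = 2.0 * half_width * sigma1 / steps
    total = 0.0
    for i in range(steps):
        x = lo + (i + 0.5) * dx
        log_p = log_normal_pdf(x, mu1, sigma1)
        log_q = log_normal_pdf(x, mu2, sigma2)
        total += (log_p - log_q) * math.exp(log_p) * dx
    return total

def kl_closed_form(mu1, sigma1, mu2, sigma2):
    """Last line of the derivation."""
    return (math.log(sigma2 / sigma1)
            + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2.0 * sigma2 ** 2)
            - 0.5)

# Hypothetical example values.
args = (0.0, 1.0, 1.0, 2.0)
print(kl_numeric(*args), kl_closed_form(*args))
```

The two printed values should agree closely. The grid is centred on $\mu_1$ because the integrand is weighted by $p$, so the tails of $q$ do not need to be covered.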

