
I need to determine the KL-divergence between two Gaussians. I am comparing my results to these, but I can't reproduce their result. My result is obviously wrong, because the KL is not 0 for KL(p, p).

I wonder where I am making a mistake, and I would appreciate it if anyone can spot it.

Let $p(x) = N(\mu_1, \sigma_1)$ and $q(x) = N(\mu_2, \sigma_2)$. From Bishop's PRML I know that

$$KL(p, q) = -\int p(x) \log q(x) \, dx + \int p(x) \log p(x) \, dx,$$

where the integration is done over the whole real line, and that

$$\int p(x) \log p(x) \, dx = -\frac{1}{2}\left(1 + \log 2\pi\sigma_1^2\right),$$

so I restrict myself to $-\int p(x) \log q(x) \, dx$, which I can write out as

$$-\int p(x) \log \left[\frac{1}{(2\pi\sigma_2^2)^{1/2}} \, e^{-\frac{(x - \mu_2)^2}{2\sigma_2^2}}\right] dx,$$

which can be separated into

$$\frac{1}{2}\log(2\pi\sigma_2^2) - \int p(x) \log e^{-\frac{(x - \mu_2)^2}{2\sigma_2^2}} \, dx.$$

Taking the log I get

$$\frac{1}{2}\log(2\pi\sigma_2^2) - \int p(x) \left(-\frac{(x - \mu_2)^2}{2\sigma_2^2}\right) dx,$$

where I separate the sums and pull $\sigma_2^2$ out of the integral:

$$\frac{1}{2}\log(2\pi\sigma_2^2) + \frac{\int p(x) x^2 \, dx - \int p(x) \, 2x\mu_2 \, dx + \int p(x) \mu_2^2 \, dx}{2\sigma_2^2}.$$

Letting $\langle \cdot \rangle$ denote the expectation operator under $p$, I can rewrite this as

$$\frac{1}{2}\log(2\pi\sigma_2^2) + \frac{\langle x^2 \rangle - 2\langle x \rangle \mu_2 + \mu_2^2}{2\sigma_2^2}.$$

We know that $\mathrm{var}(x) = \langle x^2 \rangle - \langle x \rangle^2$. Thus

$$\langle x^2 \rangle = \sigma_1^2 + \mu_1^2,$$

and therefore

$$\frac{1}{2}\log(2\pi\sigma_2^2) + \frac{\sigma_1^2 + \mu_1^2 - 2\mu_1\mu_2 + \mu_2^2}{2\sigma_2^2},$$

which I can put as

$$\frac{1}{2}\log(2\pi\sigma_2^2) + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2}.$$

Putting everything together, I get

$$KL(p, q) = -\int p(x) \log q(x) \, dx + \int p(x) \log p(x) \, dx$$
$$= \frac{1}{2}\log(2\pi\sigma_2^2) + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2} - \frac{1}{2}\left(1 + \log 2\pi\sigma_1^2\right)$$
$$= \log \frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2}.$$
This is wrong, since it equals $\frac{1}{2}$ for two identical Gaussians.

Can anyone spot my error?

Update

Thanks to mpiktas for clearing things up. The correct answer is:

$$KL(p, q) = \log \frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2} - \frac{1}{2}$$
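
As a quick numerical sanity check (my addition, not part of the original post), here is a minimal Python sketch of this closed form; the function name `kl_gauss` and the cross-check via numerical integration are illustrative choices:

```python
import numpy as np
from scipy.integrate import quad
from scipy.stats import norm

def kl_gauss(mu1, sigma1, mu2, sigma2):
    """Closed-form KL(p, q) for p = N(mu1, sigma1^2), q = N(mu2, sigma2^2)."""
    return (np.log(sigma2 / sigma1)
            + (sigma1**2 + (mu1 - mu2)**2) / (2 * sigma2**2)
            - 0.5)

# KL(p, p) should be exactly zero.
print(kl_gauss(0.0, 1.0, 0.0, 1.0))  # 0.0

# Cross-check against direct numerical integration of p(x) log(p(x)/q(x)).
mu1, s1, mu2, s2 = 1.0, 2.0, -0.5, 1.5
integrand = lambda x: norm.pdf(x, mu1, s1) * (norm.logpdf(x, mu1, s1)
                                              - norm.logpdf(x, mu2, s2))
print(kl_gauss(mu1, s1, mu2, s2))           # closed form
print(quad(integrand, -np.inf, np.inf)[0])  # should agree closely
```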

  • sorry for posting the incorrect answer in the first place. I just looked at $x - \mu_1$ and immediately thought that the integral is zero. The point that it was squared completely missed my mind :) – mpiktas Feb 21 '11 at 12:02
  • what about the multivariate case? – user7001 Oct 23 '11 at 0:49
  • I have just seen in a research paper that the KLD should be $KL(p, q) = \frac{1}{2}\left((\mu_1 - \mu_2)^2 + \sigma_1^2 + \sigma_2^2\right)\left(\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}\right) - 2$ – skyde Aug 1 '13 at 14:26
  • I think there is a typo in your question, since I cannot validate it, and it also seems that you used the correct version later in your question: $\int p(x) \log p(x) \, dx = \frac{1}{2}(1 + \log 2\pi\sigma_1^2)$. I think it should be (note the minus): $\int p(x) \log p(x) \, dx = -\frac{1}{2}(1 + \log 2\pi\sigma_1^2)$. I tried to edit your question and got banned for it, so maybe do it yourself. – y-spreen Jan 25 '18 at 13:49
  • The answer is also in my 1996 paper on Intrinsic losses. – Xi'an Mar 29 '18 at 20:27

OK, my bad. The error is in the last equation:

$$KL(p, q) = -\int p(x) \log q(x) \, dx + \int p(x) \log p(x) \, dx$$
$$= \frac{1}{2}\log(2\pi\sigma_2^2) + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2} - \frac{1}{2}\left(1 + \log 2\pi\sigma_1^2\right)$$
$$= \log \frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2} - \frac{1}{2}$$

Note the missing $\frac{1}{2}$. The last line becomes zero when $\mu_1 = \mu_2$ and $\sigma_1 = \sigma_2$.
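
To see the effect of the missing term numerically, here is a tiny Python check (my illustration, not part of the original answer); the parameter values are arbitrary:

```python
import numpy as np

mu, sigma = 0.7, 1.3  # any identical pair p = q

# Original (wrong) result: the constant -1/2 is missing.
without_half = np.log(sigma / sigma) + (sigma**2 + 0.0) / (2 * sigma**2)
# Corrected result.
with_half = without_half - 0.5

print(without_half)  # 0.5 -- spurious value for KL(p, p)
print(with_half)     # 0.0 -- as required
```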

  • @mpiktas I meant the question really – bayerj is a well-published researcher and I'm an undergrad. Nice to see that even the smart guys fall back to asking on the internet sometimes :) – N. McA. Apr 5 '16 at 10:19
  • is $p$ $N(\mu_1, \sigma_1)$ or $N(\mu_2, \sigma_2)$? – Kong Jan 20 '18 at 23:41

I did not have a look at your calculation, but here is mine with a lot of details. Suppose $p$ is the density of a normal random variable with mean $\mu_1$ and variance $\sigma_1^2$, and that $q$ is the density of a normal random variable with mean $\mu_2$ and variance $\sigma_2^2$. The Kullback-Leibler distance from $q$ to $p$ is:

$$\int \left[\log(p(x)) - \log(q(x))\right] p(x) \, dx$$

$$= \int \left[-\frac{1}{2}\log(2\pi) - \log(\sigma_1) - \frac{1}{2}\left(\frac{x - \mu_1}{\sigma_1}\right)^2 + \frac{1}{2}\log(2\pi) + \log(\sigma_2) + \frac{1}{2}\left(\frac{x - \mu_2}{\sigma_2}\right)^2\right] \times \frac{1}{\sqrt{2\pi}\,\sigma_1} \exp\left[-\frac{1}{2}\left(\frac{x - \mu_1}{\sigma_1}\right)^2\right] dx$$

$$= \int \left\{\log\left(\frac{\sigma_2}{\sigma_1}\right) + \frac{1}{2}\left[\left(\frac{x - \mu_2}{\sigma_2}\right)^2 - \left(\frac{x - \mu_1}{\sigma_1}\right)^2\right]\right\} \times \frac{1}{\sqrt{2\pi}\,\sigma_1} \exp\left[-\frac{1}{2}\left(\frac{x - \mu_1}{\sigma_1}\right)^2\right] dx$$

$$= E_1\left\{\log\left(\frac{\sigma_2}{\sigma_1}\right) + \frac{1}{2}\left[\left(\frac{x - \mu_2}{\sigma_2}\right)^2 - \left(\frac{x - \mu_1}{\sigma_1}\right)^2\right]\right\}$$

$$= \log\left(\frac{\sigma_2}{\sigma_1}\right) + \frac{1}{2\sigma_2^2} E_1\left\{(X - \mu_2)^2\right\} - \frac{1}{2\sigma_1^2} E_1\left\{(X - \mu_1)^2\right\}$$

$$= \log\left(\frac{\sigma_2}{\sigma_1}\right) + \frac{1}{2\sigma_2^2} E_1\left\{(X - \mu_2)^2\right\} - \frac{1}{2}$$

(using $E_1\{(X - \mu_1)^2\} = \sigma_1^2$)

(Now note that $(X - \mu_2)^2 = (X - \mu_1 + \mu_1 - \mu_2)^2 = (X - \mu_1)^2 + 2(X - \mu_1)(\mu_1 - \mu_2) + (\mu_1 - \mu_2)^2$.)

$$= \log\left(\frac{\sigma_2}{\sigma_1}\right) + \frac{1}{2\sigma_2^2}\left[E_1\left\{(X - \mu_1)^2\right\} + 2(\mu_1 - \mu_2) E_1\left\{X - \mu_1\right\} + (\mu_1 - \mu_2)^2\right] - \frac{1}{2}$$

$$= \log\left(\frac{\sigma_2}{\sigma_1}\right) + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2\sigma_2^2} - \frac{1}{2}$$
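
To make the expectation $E_1$ concrete, here is a small Monte Carlo check in Python (my own sketch; the sample size and parameter values are arbitrary): drawing $X \sim p$ and averaging $\log p(X) - \log q(X)$ should approach the closed form above.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu1, s1, mu2, s2 = 1.0, 2.0, -0.5, 1.5

# Monte Carlo estimate of E_1[log p(X) - log q(X)] with X ~ p.
x = rng.normal(mu1, s1, size=1_000_000)
mc = np.mean(norm.logpdf(x, mu1, s1) - norm.logpdf(x, mu2, s2))

# Closed form from the last line of the derivation.
closed = np.log(s2 / s1) + (s1**2 + (mu1 - mu2)**2) / (2 * s2**2) - 0.5
print(mc, closed)  # should agree to a couple of decimal places
```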
