KL divergence between two univariate Gaussians

I need to determine the KL-divergence between two Gaussians. I am comparing my results to these, but I can't reproduce their result. My result is obviously wrong, because the KL is not 0 for KL(p, p).

I wonder where I am making a mistake and ask if anyone can spot it.

Let $p(x) = N(\mu_1, \sigma_1)$ and $q(x) = N(\mu_2, \sigma_2)$. From Bishop's PRML I know that



$KL\left(p,q\right)=-\int p\left(x\right)\mathrm{log}q\left(x\right)dx+\int p\left(x\right)\mathrm{log}p\left(x\right)dx$

where integration is done over the whole real line, and that



$\int p\left(x\right)\mathrm{log}p\left(x\right)dx=-\frac{1}{2}\left(1+\mathrm{log}2\pi {\sigma }_{1}^{2}\right),$

so I restrict myself to $-\int p(x) \log q(x)\, dx$, which I can write out as



$-\int p\left(x\right)\mathrm{log}\frac{1}{\left(2\pi {\sigma }_{2}^{2}{\right)}^{\left(1/2\right)}}{e}^{-\frac{\left(x-{\mu }_{2}{\right)}^{2}}{2{\sigma }_{2}^{2}}}dx,$

which can be separated into



$\frac{1}{2}\mathrm{log}\left(2\pi {\sigma }_{2}^{2}\right)-\int p\left(x\right)\mathrm{log}{e}^{-\frac{\left(x-{\mu }_{2}{\right)}^{2}}{2{\sigma }_{2}^{2}}}dx.$

Taking the log I get



$\frac{1}{2}\mathrm{log}\left(2\pi {\sigma }_{2}^{2}\right)-\int p\left(x\right)\left(-\frac{\left(x-{\mu }_{2}{\right)}^{2}}{2{\sigma }_{2}^{2}}\right)dx,$

where I separate the sums and get $\sigma_2^2$ out of the integral.



$\frac{1}{2}\mathrm{log}\left(2\pi {\sigma }_{2}^{2}\right)+\frac{\int p\left(x\right){x}^{2}dx-\int p\left(x\right)2x{\mu }_{2}dx+\int p\left(x\right){\mu }_{2}^{2}dx}{2{\sigma }_{2}^{2}}$

Letting $\langle \rangle$ denote the expectation operator under $p$, I can rewrite this as



$\frac{1}{2}\mathrm{log}\left(2\pi {\sigma }_{2}^{2}\right)+\frac{⟨{x}^{2}⟩-2⟨x⟩{\mu }_{2}+{\mu }_{2}^{2}}{2{\sigma }_{2}^{2}}.$

We know that $\mathrm{var}(x) = \langle x^2 \rangle - \langle x \rangle^2$. Thus



$⟨{x}^{2}⟩={\sigma }_{1}^{2}+{\mu }_{1}^{2}$

and therefore



$\frac{1}{2}\mathrm{log}\left(2\pi {\sigma }_{2}^{2}\right)+\frac{{\sigma }_{1}^{2}+{\mu }_{1}^{2}-2{\mu }_{1}{\mu }_{2}+{\mu }_{2}^{2}}{2{\sigma }_{2}^{2}},$

which I can put as



$\frac{1}{2}\mathrm{log}\left(2\pi {\sigma }_{2}^{2}\right)+\frac{{\sigma }_{1}^{2}+\left({\mu }_{1}-{\mu }_{2}{\right)}^{2}}{2{\sigma }_{2}^{2}}.$
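This intermediate result can be checked on its own before the two terms are combined. The following is a minimal sketch, assuming $\sigma_1$ and $\sigma_2$ are standard deviations; the function names, the midpoint-rule quadrature, and the test parameters are illustrative choices, not part of the original post.

```python
import math

def log_normal_pdf(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x; sigma is assumed to be a standard deviation."""
    return -0.5 * math.log(2 * math.pi * sigma**2) - (x - mu) ** 2 / (2 * sigma**2)

def cross_entropy_numeric(mu1, sigma1, mu2, sigma2, n=20_000, width=12.0):
    """Midpoint-rule estimate of -int p(x) log q(x) dx over mu1 +/- width * sigma1."""
    lo = mu1 - width * sigma1
    h = 2 * width * sigma1 / n
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        # p(x) * log q(x), with log q evaluated directly to avoid underflow in the tails
        total += math.exp(log_normal_pdf(x, mu1, sigma1)) * log_normal_pdf(x, mu2, sigma2)
    return -total * h

def cross_entropy_closed(mu1, sigma1, mu2, sigma2):
    """The closed-form expression derived above for -int p(x) log q(x) dx."""
    return 0.5 * math.log(2 * math.pi * sigma2**2) + (sigma1**2 + (mu1 - mu2) ** 2) / (2 * sigma2**2)

# Arbitrary illustrative parameters
numeric = cross_entropy_numeric(0.3, 1.2, -0.5, 0.8)
closed = cross_entropy_closed(0.3, 1.2, -0.5, 0.8)
print(numeric, closed)
```

The two printed numbers should agree closely, which suggests that this part of the derivation is sound.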

Putting everything together, I get to



$\begin{array}{rl}KL\left(p,q\right)& =-\int p\left(x\right)\mathrm{log}q\left(x\right)dx+\int p\left(x\right)\mathrm{log}p\left(x\right)dx\\ \\ & =\frac{1}{2}\mathrm{log}\left(2\pi {\sigma }_{2}^{2}\right)+\frac{{\sigma }_{1}^{2}+\left({\mu }_{1}-{\mu }_{2}{\right)}^{2}}{2{\sigma }_{2}^{2}}-\frac{1}{2}\left(1+\mathrm{log}2\pi {\sigma }_{1}^{2}\right)\\ \\ & =\mathrm{log}\frac{{\sigma }_{2}}{{\sigma }_{1}}+\frac{{\sigma }_{1}^{2}+\left({\mu }_{1}-{\mu }_{2}{\right)}^{2}}{2{\sigma }_{2}^{2}}.\end{array}$
Which is wrong since it equals $\frac{1}{2}$ for two identical Gaussians.

Can anyone spot my error?
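The symptom is easy to reproduce numerically. Here is a small sketch that transcribes the last line of the derivation above as written and evaluates it for two identical Gaussians; the function name and the parameter values are hypothetical, and $\sigma$ is assumed to be a standard deviation.

```python
import math

def kl_as_derived_in_question(mu1, sigma1, mu2, sigma2):
    # Last line of the question's derivation, transcribed as written.
    return math.log(sigma2 / sigma1) + (sigma1**2 + (mu1 - mu2) ** 2) / (2 * sigma2**2)

# Identical Gaussians: a correct KL divergence would be exactly 0 here.
value = kl_as_derived_in_question(0.0, 1.0, 0.0, 1.0)
print(value)
```

The printed value is strictly positive rather than 0, so a constant term must have gone missing somewhere.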

Update

Thanks to mpiktas for clearing things up. The correct answer is:

$$KL(p, q) = \log \frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2 \sigma_2^2} - \frac{1}{2}$$
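For reference, the corrected formula translates directly into code. This is a minimal sketch, assuming `sigma1` and `sigma2` are standard deviations (the $N(\mu, \sigma)$ notation above leaves this open); the name `kl_gauss` is made up for illustration.

```python
import math

def kl_gauss(mu1, sigma1, mu2, sigma2):
    """KL(p, q) for p = N(mu1, sigma1^2) and q = N(mu2, sigma2^2).

    sigma1 and sigma2 are assumed to be standard deviations.
    """
    if sigma1 <= 0 or sigma2 <= 0:
        raise ValueError("standard deviations must be positive")
    return (
        math.log(sigma2 / sigma1)
        + (sigma1**2 + (mu1 - mu2) ** 2) / (2 * sigma2**2)
        - 0.5
    )

same = kl_gauss(0.0, 1.0, 0.0, 1.0)       # identical Gaussians
forward = kl_gauss(0.0, 1.0, 1.0, 2.0)    # KL(p, q)
backward = kl_gauss(1.0, 2.0, 0.0, 1.0)   # KL(q, p)
print(same, forward, backward)
```

The first value is 0, as required for KL(p, p). The other two differ from each other, a reminder that the KL divergence is not symmetric in its arguments.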

• sorry for posting the incorrect answer in the first place. I just looked at $x-\mu_1$ and immediately thought that the integral is zero. The point that it was squared completely missed my mind :) – mpiktas Feb 21 '11 at 12:02
• what about the multi variate case? – user7001 Oct 23 '11 at 0:49
• I have just seen in a research paper that kld should be KL(p, q) = ½ * ((μ₁-μ₂)² + σ₁²+σ₂²) * ( (1/σ₁²) + (1/σ₂²) ) - 2 – skyde Aug 1 '13 at 14:26
• I think there is a typo in your question, since I cannot validate it and it also seems that you used the correct version later in your question:
$\int p\left(x\right)\mathrm{log}p\left(x\right)dx=\frac{1}{2}\left(1+\mathrm{log}2\pi {\sigma }_{1}^{2}\right)$
I think it should be (note the minus):
$\int p\left(x\right)\mathrm{log}p\left(x\right)dx=-\frac{1}{2}\left(1+\mathrm{log}2\pi {\sigma }_{1}^{2}\right)$
I tried to edit your question and got banned for it, so maybe do it yourself.
– y-spreen Jan 25 '18 at 13:49
• The answer is also in my 1996 paper on Intrinsic losses. – Xi'an Mar 29 '18 at 20:27

2 Answers

OK, my bad. The error is in the last equation:

$\begin{array}{rl}KL\left(p,q\right)& =-\int p\left(x\right)\mathrm{log}q\left(x\right)dx+\int p\left(x\right)\mathrm{log}p\left(x\right)dx\\ \\ & =\frac{1}{2}\mathrm{log}\left(2\pi {\sigma }_{2}^{2}\right)+\frac{{\sigma }_{1}^{2}+\left({\mu }_{1}-{\mu }_{2}{\right)}^{2}}{2{\sigma }_{2}^{2}}-\frac{1}{2}\left(1+\mathrm{log}2\pi {\sigma }_{1}^{2}\right)\\ \\ & =\mathrm{log}\frac{{\sigma }_{2}}{{\sigma }_{1}}+\frac{{\sigma }_{1}^{2}+\left({\mu }_{1}-{\mu }_{2}{\right)}^{2}}{2{\sigma }_{2}^{2}}-\frac{1}{2}\end{array}$

Note the missing $-\frac{1}{2}$. The last line becomes zero when $\mu_1=\mu_2$ and $\sigma_1=\sigma_2$.
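Spelled out, writing $\mu$ and $\sigma$ for the common values (symbols introduced here only for this check), the substitution reads:

$$KL(p, p) = \log \frac{\sigma}{\sigma} + \frac{\sigma^2 + (\mu - \mu)^2}{2\sigma^2} - \frac{1}{2} = 0 + \frac{1}{2} - \frac{1}{2} = 0.$$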

• @mpiktas I meant the question really - bayerj Is a well published researcher and I'm an undergrad. Nice to see that even the smart guys fall back to asking on the internet sometimes :) – N. McA. Apr 5 '16 at 10:19
• is p $\mu_1, \sigma_1$ or $\mu_2, \sigma_2$ – Kong Jan 20 '18 at 23:41

I did not have a look at your calculation but here is mine with a lot of details. Suppose $p$ is the density of a normal random variable with mean $\mu_1$ and variance $\sigma^2_1$, and that $q$ is the density of a normal random variable with mean $\mu_2$ and variance $\sigma^2_2$. The Kullback-Leibler distance from $q$ to $p$ is:

$$\int \left[\log( p(x)) - \log( q(x)) \right] p(x)\, dx$$

$$=\int \left[ -\frac{1}{2} \log(2\pi) - \log(\sigma_1) - \frac{1}{2} \left(\frac{x-\mu_1}{\sigma_1}\right)^2 + \frac{1}{2}\log(2\pi) + \log(\sigma_2) + \frac{1}{2} \left(\frac{x-\mu_2}{\sigma_2}\right)^2 \right] \times \frac{1}{\sqrt{2\pi}\sigma_1} \exp\left[-\frac{1}{2}\left(\frac{x-\mu_1}{\sigma_1}\right)^2\right] dx$$

$$=\int \left\{\log\left(\frac{\sigma_2}{\sigma_1}\right) + \frac{1}{2} \left[ \left(\frac{x-\mu_2}{\sigma_2}\right)^2 - \left(\frac{x-\mu_1}{\sigma_1}\right)^2 \right] \right\} \times \frac{1}{\sqrt{2\pi}\sigma_1} \exp\left[-\frac{1}{2}\left(\frac{x-\mu_1}{\sigma_1}\right)^2\right] dx$$

$$=E_{1} \left\{\log\left(\frac{\sigma_2}{\sigma_1}\right) + \frac{1}{2} \left[ \left(\frac{x-\mu_2}{\sigma_2}\right)^2 - \left(\frac{x-\mu_1}{\sigma_1}\right)^2 \right]\right\}$$

$$=\log\left(\frac{\sigma_2}{\sigma_1}\right) + \frac{1}{2\sigma_2^2} E_1 \left\{(X-\mu_2)^2\right\} - \frac{1}{2\sigma_1^2} E_1 \left\{(X-\mu_1)^2\right\}$$

$$=\log\left(\frac{\sigma_2}{\sigma_1}\right) + \frac{1}{2\sigma_2^2} E_1 \left\{(X-\mu_2)^2\right\} - \frac{1}{2}$$

(Now note that $(X - \mu_2)^2 = (X-\mu_1+\mu_1-\mu_2)^2 = (X-\mu_1)^2 + 2(X-\mu_1)(\mu_1-\mu_2) + (\mu_1-\mu_2)^2$)

$$=\log\left(\frac{\sigma_2}{\sigma_1}\right) + \frac{1}{2\sigma_2^2} \left[E_1\left\{(X-\mu_1)^2\right\} + 2(\mu_1-\mu_2)E_1\left\{X-\mu_1\right\} + (\mu_1-\mu_2)^2\right] - \frac{1}{2}$$

$$=\log\left(\frac{\sigma_2}{\sigma_1}\right) + \frac{\sigma_1^2 + (\mu_1-\mu_2)^2}{2\sigma_2^2} - \frac{1}{2}$$
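Since the derivation expresses the divergence as an expectation under $p$, it can be sanity-checked by sampling from $p$ and averaging $\log p(X) - \log q(X)$. Below is a rough Monte Carlo sketch using only the Python standard library; the function names, sample size, seed, and parameter values are arbitrary illustrative choices, and `sigma1`, `sigma2` are standard deviations as in the derivation.

```python
import math
import random

def log_normal_pdf(x, mu, sigma):
    """Log density of a normal with mean mu and standard deviation sigma."""
    return -0.5 * math.log(2 * math.pi) - math.log(sigma) - 0.5 * ((x - mu) / sigma) ** 2

def kl_monte_carlo(mu1, sigma1, mu2, sigma2, n=100_000, seed=0):
    """Sample average of log p(X) - log q(X) with X drawn from p."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(mu1, sigma1)
        total += log_normal_pdf(x, mu1, sigma1) - log_normal_pdf(x, mu2, sigma2)
    return total / n

def kl_closed_form(mu1, sigma1, mu2, sigma2):
    """Final line of the derivation above."""
    return math.log(sigma2 / sigma1) + (sigma1**2 + (mu1 - mu2) ** 2) / (2 * sigma2**2) - 0.5

estimate = kl_monte_carlo(0.3, 1.2, -0.5, 0.8)
exact = kl_closed_form(0.3, 1.2, -0.5, 0.8)
print(estimate, exact)
```

The estimate carries sampling noise, so it should match the closed form only to about two decimal places at this sample size.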
