# KL divergence between two univariate Gaussians

I need to determine the KL-divergence between two Gaussians. I am comparing my results to these, but I can't reproduce their result. My result is obviously wrong, because the KL is not 0 for KL(p, p).

I wonder where I am making a mistake. Can anyone spot it?

Let $p(x) = N(\mu_1, \sigma_1)$ and $q(x) = N(\mu_2, \sigma_2)$. From Bishop's PRML I know that



$KL\left(p,q\right)=-\int p\left(x\right)\mathrm{log}q\left(x\right)dx+\int p\left(x\right)\mathrm{log}p\left(x\right)dx$

where integration is done over the whole real line, and that



$\int p\left(x\right)\mathrm{log}p\left(x\right)dx=-\frac{1}{2}\left(1+\mathrm{log}2\pi {\sigma }_{1}^{2}\right),$

so I restrict myself to $-\int p(x) \log q(x) dx$, which I can write out as



$-\int p\left(x\right)\mathrm{log}\frac{1}{\left(2\pi {\sigma }_{2}^{2}{\right)}^{\left(1/2\right)}}{e}^{-\frac{\left(x-{\mu }_{2}{\right)}^{2}}{2{\sigma }_{2}^{2}}}dx,$

which can be separated into



$\frac{1}{2}\mathrm{log}\left(2\pi {\sigma }_{2}^{2}\right)-\int p\left(x\right)\mathrm{log}{e}^{-\frac{\left(x-{\mu }_{2}{\right)}^{2}}{2{\sigma }_{2}^{2}}}dx.$

Taking the log I get



$\frac{1}{2}\mathrm{log}\left(2\pi {\sigma }_{2}^{2}\right)-\int p\left(x\right)\left(-\frac{\left(x-{\mu }_{2}{\right)}^{2}}{2{\sigma }_{2}^{2}}\right)dx,$

where I separate the sums and get $\sigma_2^2$ out of the integral.



$\frac{1}{2}\mathrm{log}\left(2\pi {\sigma }_{2}^{2}\right)+\frac{\int p\left(x\right){x}^{2}dx-\int p\left(x\right)2x{\mu }_{2}dx+\int p\left(x\right){\mu }_{2}^{2}dx}{2{\sigma }_{2}^{2}}$

Letting $\langle \rangle$ denote the expectation operator under $p$, I can rewrite this as



$\frac{1}{2}\mathrm{log}\left(2\pi {\sigma }_{2}^{2}\right)+\frac{⟨{x}^{2}⟩-2⟨x⟩{\mu }_{2}+{\mu }_{2}^{2}}{2{\sigma }_{2}^{2}}.$

We know that $var(x) = \langle x^2 \rangle - \langle x \rangle^2$. Thus



$⟨{x}^{2}⟩={\sigma }_{1}^{2}+{\mu }_{1}^{2}$

and therefore



$\frac{1}{2}\mathrm{log}\left(2\pi {\sigma }_{2}^{2}\right)+\frac{{\sigma }_{1}^{2}+{\mu }_{1}^{2}-2{\mu }_{1}{\mu }_{2}+{\mu }_{2}^{2}}{2{\sigma }_{2}^{2}},$

which I can put as



$\frac{1}{2}\mathrm{log}\left(2\pi {\sigma }_{2}^{2}\right)+\frac{{\sigma }_{1}^{2}+\left({\mu }_{1}-{\mu }_{2}{\right)}^{2}}{2{\sigma }_{2}^{2}}.$
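The two moment identities used in this step, $\langle x^2 \rangle = \sigma_1^2 + \mu_1^2$ and $\langle (x-\mu_2)^2 \rangle = \sigma_1^2 + (\mu_1-\mu_2)^2$, can be sanity-checked by sampling. The following is only a rough sketch: the parameter values, the seed, and the sample size are arbitrary choices for illustration, and $\sigma_1$ is treated as a standard deviation.

```python
import random

# Illustrative parameters (assumptions, not taken from the question).
mu1, sigma1 = 0.5, 1.5   # p = N(mu1, sigma1^2), sigma1 is a standard deviation
mu2 = -1.0               # mean of q; its variance does not enter these moments

rng = random.Random(0)   # fixed seed so the run is repeatable
n = 200_000
samples = [rng.gauss(mu1, sigma1) for _ in range(n)]

# Sample estimates of <x^2> and <(x - mu2)^2> under p.
second_moment = sum(x * x for x in samples) / n
shifted_moment = sum((x - mu2) ** 2 for x in samples) / n

# Closed-form values from the derivation.
second_moment_exact = sigma1 ** 2 + mu1 ** 2
shifted_moment_exact = sigma1 ** 2 + (mu1 - mu2) ** 2

print(second_moment, second_moment_exact)
print(shifted_moment, shifted_moment_exact)
```

The estimates should agree with the closed-form values up to Monte Carlo noise, which suggests the expansion up to this point is fine and the error lies further down.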

Putting everything together, I get to



$\begin{array}{rl}KL\left(p,q\right)& =-\int p\left(x\right)\mathrm{log}q\left(x\right)dx+\int p\left(x\right)\mathrm{log}p\left(x\right)dx\\ \\ & =\frac{1}{2}\mathrm{log}\left(2\pi {\sigma }_{2}^{2}\right)+\frac{{\sigma }_{1}^{2}+\left({\mu }_{1}-{\mu }_{2}{\right)}^{2}}{2{\sigma }_{2}^{2}}-\frac{1}{2}\left(1+\mathrm{log}2\pi {\sigma }_{1}^{2}\right)\\ \\ & =\mathrm{log}\frac{{\sigma }_{2}}{{\sigma }_{1}}+\frac{{\sigma }_{1}^{2}+\left({\mu }_{1}-{\mu }_{2}{\right)}^{2}}{2{\sigma }_{2}^{2}}.\end{array}$
Which is wrong since it equals $\frac{1}{2}$, not $0$, for two identical Gaussians.

Can anyone spot my error?

Update

Thanks to mpiktas for clearing things up. The correct answer is:

$$KL(p, q) = \log \frac{\sigma_2}{\sigma_1} + \frac{\sigma_1^2 + (\mu_1 - \mu_2)^2}{2 \sigma_2^2} - \frac{1}{2}$$
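For anyone who wants to use this numerically, here is a minimal sketch of the formula as a Python function. The function name `kl_gaussians` and the test values are made up for illustration, and the code assumes $\sigma_1, \sigma_2$ are standard deviations (so $\sigma^2$ is the variance, as in the derivation).

```python
import math

def kl_gaussians(mu1, sigma1, mu2, sigma2):
    """KL(p, q) for p = N(mu1, sigma1^2) and q = N(mu2, sigma2^2).

    sigma1 and sigma2 are standard deviations and must be positive.
    """
    if sigma1 <= 0 or sigma2 <= 0:
        raise ValueError("standard deviations must be positive")
    return (math.log(sigma2 / sigma1)
            + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2.0 * sigma2 ** 2)
            - 0.5)

# Identical Gaussians: the divergence should vanish.
same = kl_gaussians(0.3, 2.0, 0.3, 2.0)

# KL is not symmetric, so swapping p and q gives a different value.
forward = kl_gaussians(0.0, 1.0, 1.0, 2.0)
reverse = kl_gaussians(1.0, 2.0, 0.0, 1.0)

print(same, forward, reverse)
```

The first value is the $KL(p, p)$ case that motivated the question; the other two illustrate that the argument order matters.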

• sorry for posting the incorrect answer in the first place. I just looked at $x-\mu_1$ and immediately thought that the integral is zero. The point that it was squared completely slipped my mind :) – mpiktas Feb 21 '11 at 12:02
• what about the multivariate case? – user7001 Oct 23 '11 at 0:49
• I have just seen in a research paper that the KLD should be $KL(p, q) = \frac{1}{2} \left((\mu_1-\mu_2)^2 + \sigma_1^2 + \sigma_2^2\right) \left(\frac{1}{\sigma_1^2} + \frac{1}{\sigma_2^2}\right) - 2$ – skyde Aug 1 '13 at 14:26
• I think there is a typo in your question, since I cannot validate it and it also seems that you used the correct version later in your question:
$\int p\left(x\right)\mathrm{log}p\left(x\right)dx=\frac{1}{2}\left(1+\mathrm{log}2\pi {\sigma }_{1}^{2}\right)$
I think it should be (note the minus):
$\int p\left(x\right)\mathrm{log}p\left(x\right)dx=-\frac{1}{2}\left(1+\mathrm{log}2\pi {\sigma }_{1}^{2}\right)$
I tried to edit your question and got banned for it, so maybe do it yourself.
– y-spreen Jan 25 '18 at 13:49
• The answer is also in my 1996 paper on Intrinsic losses. – Xi'an Mar 29 '18 at 20:27

OK, my bad. The error is in the last equation:

$\begin{array}{rl}KL\left(p,q\right)& =-\int p\left(x\right)\mathrm{log}q\left(x\right)dx+\int p\left(x\right)\mathrm{log}p\left(x\right)dx\\ \\ & =\frac{1}{2}\mathrm{log}\left(2\pi {\sigma }_{2}^{2}\right)+\frac{{\sigma }_{1}^{2}+\left({\mu }_{1}-{\mu }_{2}{\right)}^{2}}{2{\sigma }_{2}^{2}}-\frac{1}{2}\left(1+\mathrm{log}2\pi {\sigma }_{1}^{2}\right)\\ \\ & =\mathrm{log}\frac{{\sigma }_{2}}{{\sigma }_{1}}+\frac{{\sigma }_{1}^{2}+\left({\mu }_{1}-{\mu }_{2}{\right)}^{2}}{2{\sigma }_{2}^{2}}-\frac{1}{2}\end{array}$

Note the missing $-\frac{1}{2}$. The last line becomes zero when $\mu_1=\mu_2$ and $\sigma_1=\sigma_2$.
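Written out as a quick check, substituting $\mu_1=\mu_2=\mu$ and $\sigma_1=\sigma_2=\sigma$ into the last line gives

$$\log\frac{\sigma}{\sigma} + \frac{\sigma^2 + (\mu-\mu)^2}{2\sigma^2} - \frac{1}{2} = 0 + \frac{1}{2} - \frac{1}{2} = 0,$$

whereas without the $-\frac{1}{2}$ term the same substitution leaves $\frac{1}{2}$.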

• @mpiktas I meant the question really - bayerj is a well-published researcher and I'm an undergrad. Nice to see that even the smart guys fall back to asking on the internet sometimes :) – N. McA. Apr 5 '16 at 10:19
• is p $\mu_1, \sigma_1$ or $\mu_2, \sigma_2$? – Kong Jan 20 '18 at 23:41

I did not have a look at your calculation but here is mine with a lot of details. Suppose $p$ is the density of a normal random variable with mean $\mu_1$ and variance $\sigma_1^2$, and that $q$ is the density of a normal random variable with mean $\mu_2$ and variance $\sigma_2^2$. The Kullback-Leibler distance from $q$ to $p$ is:

$$\int \left[\log( p(x)) - \log( q(x)) \right] p(x) dx$$

$$=\int \left[ -\frac{1}{2} \log(2\pi) - \log(\sigma_1) - \frac{1}{2} \left(\frac{x-\mu_1}{\sigma_1}\right)^2 + \frac{1}{2}\log(2\pi) + \log(\sigma_2) + \frac{1}{2} \left(\frac{x-\mu_2}{\sigma_2}\right)^2 \right] \times \frac{1}{\sqrt{2\pi}\sigma_1} \exp\left[-\frac{1}{2}\left(\frac{x-\mu_1}{\sigma_1}\right)^2\right] dx$$

$$=\int \left\{\log\left(\frac{\sigma_2}{\sigma_1}\right) + \frac{1}{2} \left[ \left(\frac{x-\mu_2}{\sigma_2}\right)^2 - \left(\frac{x-\mu_1}{\sigma_1}\right)^2 \right] \right\} \times \frac{1}{\sqrt{2\pi}\sigma_1} \exp\left[-\frac{1}{2}\left(\frac{x-\mu_1}{\sigma_1}\right)^2\right] dx$$

$$=E_{1} \left\{\log\left(\frac{\sigma_2}{\sigma_1}\right) + \frac{1}{2} \left[ \left(\frac{X-\mu_2}{\sigma_2}\right)^2 - \left(\frac{X-\mu_1}{\sigma_1}\right)^2 \right]\right\}$$

$$=\log\left(\frac{\sigma_2}{\sigma_1}\right) + \frac{1}{2\sigma_2^2} E_1 \left\{(X-\mu_2)^2\right\} - \frac{1}{2\sigma_1^2} E_1 \left\{(X-\mu_1)^2\right\}$$

$$=\log\left(\frac{\sigma_2}{\sigma_1}\right) + \frac{1}{2\sigma_2^2} E_1 \left\{(X-\mu_2)^2\right\} - \frac{1}{2}$$

(Now note that $(X - \mu_2)^2 = (X-\mu_1+\mu_1-\mu_2)^2 = (X-\mu_1)^2 + 2(X-\mu_1)(\mu_1-\mu_2) + (\mu_1-\mu_2)^2$)

$$=\log\left(\frac{\sigma_2}{\sigma_1}\right) + \frac{1}{2\sigma_2^2} \left[E_1\left\{(X-\mu_1)^2\right\} + 2(\mu_1-\mu_2)E_1\left\{X-\mu_1\right\} + (\mu_1-\mu_2)^2\right] - \frac{1}{2}$$

$$=\log\left(\frac{\sigma_2}{\sigma_1}\right) + \frac{\sigma_1^2 + (\mu_1-\mu_2)^2}{2\sigma_2^2} - \frac{1}{2}$$
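If you would rather not trust the algebra, the defining integral can be evaluated numerically and compared with the final expression. The sketch below uses a plain midpoint rule in pure Python; the helper names, the integration window of 12 standard deviations, the grid size, and the parameter values are all arbitrary choices for illustration.

```python
import math

def log_pdf(x, mu, sigma):
    """Log-density of N(mu, sigma^2) at x (sigma is a standard deviation)."""
    z = (x - mu) / sigma
    return -0.5 * math.log(2.0 * math.pi) - math.log(sigma) - 0.5 * z * z

def kl_numeric(mu1, sigma1, mu2, sigma2, half_width=12.0, n=20_000):
    """Midpoint-rule estimate of the integral of p(x) [log p(x) - log q(x)]."""
    lo = mu1 - half_width * sigma1          # window centred on p's mean
    h = 2.0 * half_width * sigma1 / n       # width of each sub-interval
    total = 0.0
    for i in range(n):
        x = lo + (i + 0.5) * h
        lp = log_pdf(x, mu1, sigma1)
        lq = log_pdf(x, mu2, sigma2)
        total += math.exp(lp) * (lp - lq)
    return total * h

def kl_closed(mu1, sigma1, mu2, sigma2):
    """Final line of the derivation."""
    return (math.log(sigma2 / sigma1)
            + (sigma1 ** 2 + (mu1 - mu2) ** 2) / (2.0 * sigma2 ** 2)
            - 0.5)

params = (0.0, 1.0, 1.0, 2.0)   # mu1, sigma1, mu2, sigma2 (illustrative)
numeric = kl_numeric(*params)
closed = kl_closed(*params)
print(numeric, closed)
```

The two numbers should agree to many decimal places; working with log-densities avoids taking the log of a density that underflows to zero in the tails.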
