
• Thanks a lot for your impressive introduction!

I am a little confused by the derivation between "Let’s try changing to sup inf" and "We see that the infimum is concave, as required. Because all functions ....".

Could you explain more about why $\inf_\gamma \mathbb{E}_{x,y\sim\gamma}[\|x-y\| - (f(x)-f(y))]$ is negative infinity when $f$ is not 1-Lipschitz? Thank you!

• Those visualizations are beautiful! Congrats on this extremely nice intro!
The good news is that, despite what the bibliography of the paper seems to suggest, several ML papers have been proposed that fit between Villani and the Wasserstein GAN paper.

I am happy to plug mine, notably at NIPS 2016:
https://papers.nips.cc/pape...
https://papers.nips.cc/pape...
and NIPS 2013:
https://papers.nips.cc/pape...

but there are also many other interesting projects worth mentioning, and these are only the pure ML/stats papers:
http://cbcl.mit.edu/wassers...
https://arxiv.org/abs/1507....
https://arxiv.org/pdf/1701....
https://papers.nips.cc/pape...

• I think there was a small typo in the Strong Duality section, in the second display-mode equation.

In particular, the $[\mathbf{b}, -z^* + \epsilon]$ should be transposed:

$$
\begin{bmatrix} \mathbf{A} \\ -\mathbf{c}^T \end{bmatrix}^T
\begin{bmatrix} \mathbf{y} \\ \alpha \end{bmatrix} \leq \mathbf{0}, \qquad
\begin{bmatrix} \mathbf{b} \\ -z^* + \epsilon \end{bmatrix}^T
\begin{bmatrix} \mathbf{y} \\ \alpha \end{bmatrix} > 0
$$

Also, thanks for the great article.

• Thanks for this very nice intro! Probably the best Wasserstein explanation I've seen :)
May I ask you what tools you have used other than jupyter (+ matplotlib) to create your diagrams?

• Thank you, Daniel!
I used Adobe Illustrator to edit the plots (Inkscape is a great free alternative) and GeoGebra for the Farkas diagrams.

• Hello Vincent Herrmann! It seems that your MathJax CDN is down, so all the formulas are a mess for me. Could you please find some time to fix it?

• Hello,

I was wondering where you got the functions f and g for the dual problem?

• Thanks for writing this up. I have a question about the last part: how do you show that the additional term is 0 if $\gamma \in \Pi$ and $+\infty$ otherwise? The part where you make $\gamma$ independent of $\Pi$ (the set of joint distributions).

• If $\gamma \in \Pi$, then $\mathbb{E}_{x,y\sim\gamma}[f(x) - f(y)]$ and $\mathbb{E}_{s\sim p_r}[f(s)] - \mathbb{E}_{r\sim p_\theta}[f(r)]$ are the same, so their difference is always zero, no matter what $f$ we choose. If $\gamma \notin \Pi$, then $s$ and $x$, or $r$ and $y$, are completely independent, and we can choose an $f$ that makes the difference $+\infty$. Does this help?
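The first half of this is easy to check numerically: for a discrete coupling $\gamma$ whose marginals are $p_r$ and $p_\theta$, the expectation of $f(x) - f(y)$ under $\gamma$ equals the difference of the marginal expectations for any $f$. A small sketch (the specific support points and test function are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Any joint distribution gamma; its marginals define p_r (rows) and p_theta (cols)
gamma = rng.random((4, 4))
gamma /= gamma.sum()
p_r = gamma.sum(axis=1)        # marginal over x
p_theta = gamma.sum(axis=0)    # marginal over y

xs = np.linspace(-1.0, 1.0, 4)  # support points
f = np.tanh(3 * xs)             # an arbitrary test function f

lhs = np.sum(gamma * (f[:, None] - f[None, :]))  # E_gamma[f(x) - f(y)]
rhs = p_r @ f - p_theta @ f                      # E_{p_r}[f] - E_{p_theta}[f]
print(np.isclose(lhs, rhs))  # True: the penalty term vanishes when gamma is in Pi
```

Since the marginals here are by construction those of $\gamma$, the difference is zero no matter which $f$ is plugged in.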

• https://github.com/maestroj...

Hi, I implemented an approximation of the EMD with a neural network, plus GAN and WGAN implementations on a simple, well-known example.
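For readers who want a ground-truth value to compare such a neural approximation against: on discrete distributions the exact EMD is a small linear program (minimize $\langle \gamma, C\rangle$ subject to the marginal constraints). The helper `emd_1d` below is an illustrative sketch, not code from the linked repo:

```python
import numpy as np
from scipy.optimize import linprog

def emd_1d(x, p, y, q):
    """Exact Wasserstein-1 distance between two discrete distributions
    via the primal LP: min <gamma, C> s.t. row sums = p, column sums = q."""
    C = np.abs(x[:, None] - y[None, :])   # ground cost |x - y|
    n, m = C.shape
    A_eq = np.zeros((n + m, n * m))
    for i in range(n):
        A_eq[i, i * m:(i + 1) * m] = 1.0  # sum_j gamma_ij = p_i
    for j in range(m):
        A_eq[n + j, j::m] = 1.0           # sum_i gamma_ij = q_j
    b_eq = np.concatenate([p, q])
    res = linprog(C.ravel(), A_eq=A_eq, b_eq=b_eq, bounds=(0, None))
    return res.fun

x = np.array([0.0, 1.0, 2.0])
p = np.array([0.5, 0.5, 0.0])
q = np.array([0.0, 0.5, 0.5])
print(emd_1d(x, p, x, q))  # 1.0: shift 0.5 mass by one step, twice
```

In 1-D this agrees with the integral of $|F_p - F_q|$ between the CDFs, which is a handy cross-check.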

• Hello,
I was really impressed by your article, and it really helped me understand Kantorovich duality.
You go through the proof by introducing a 'sup' term into the original equation and then
swap the 'inf' and 'sup' parts using the minimax principle.
What I could not follow is why convexity holds for $\sup_b g(a,b)$ and why $\inf_a g(a,b)$ is concave.
Could you explain this in more detail, if possible?

• Hi,
I wasn't careful enough when I blended two different proofs here. The terms convex and concave are wrong in this case. What I actually mean is that every local minimum in $a$ of $\sup_b g(a, b)$ is also a global minimum, and every local maximum in $b$ of $\inf_a g(a, b)$ is also a global maximum. There is another mistake: if we use expectations over the random variable $\gamma$ instead of integration over the ordinary variable $\gamma$, then we won't get to $-\infty$ if $\|f\|_L > 1$, but only some value less than 0. The proof still works the same. Does this help?
I will change this part as soon as I can. Thank you for calling my attention to this issue!

I think the term 'convex and concave' is right, since every local minimum of a convex function
is a global minimum.
Thanks, this really helps me a lot.
But I still cannot understand the 'inf' over $\gamma$ part of the last equation.
Why is it 0 when $f$ is a 1-Lipschitz function?

• I'm no expert in convexity, but since there could be many separate global minima, I don't know if we can really call it convex. But it's definitely something very similar.

The 1-Lipschitz condition is basically $|f(x) - f(y)| \leq |x-y|$: the difference between two values of $f$ cannot be greater than the distance between their arguments. Then $|x-y| - (f(x) - f(y)) \geq 0$ for all $x$ and $y$. We attain the infimum over $\gamma$ if, for example, we put probability 1 on a single point with $x = y$. The expectation is then 0.
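Both halves of this argument can be checked numerically. Taking $f = \sin$ as an example of a 1-Lipschitz function (its derivative is bounded by 1), the integrand is pointwise non-negative and vanishes on the diagonal $x = y$:

```python
import numpy as np

f = np.sin  # |cos| <= 1, so sin is 1-Lipschitz
x = np.linspace(-3, 3, 201)
y = np.linspace(-3, 3, 201)
X, Y = np.meshgrid(x, y)

# |x - y| - (f(x) - f(y)) >= 0 everywhere for 1-Lipschitz f
integrand = np.abs(X - Y) - (f(X) - f(Y))
print(integrand.min() >= -1e-12)  # True

# A coupling concentrated on the diagonal (x = y with probability 1)
# attains the infimum: the integrand is exactly 0 there
print(np.allclose(np.abs(x - x) - (f(x) - f(x)), 0.0))  # True
```

If $f$ were not 1-Lipschitz, some region would have a negative integrand, and piling more and more mass there drives the expectation down without bound.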

• Thanks now i understands
My understanding is: at first you want to show that the 'inf sup' term can be changed to 'sup inf'.
That's why you added the sup over $f$, for which every local minimum is a global minimum, so that swapping 'inf sup' to 'sup inf' is allowed.
Next, the inf over $\gamma$ term achieves 0, in the sense that only a 1-Lipschitz $f$ gives a feasible solution.
Am I understanding right?

• Yeah, I think that's right.
The idea is to first replace the constraint on $\gamma$ ($\gamma \in \Pi$) with an optimization over a function $f$, then swap the order of $\inf_\gamma$ and $\sup_f$, and lastly replace the optimization over $\gamma$ with a constraint on $f$ ($f$ is 1-Lipschitz continuous).

So we want to change from $\inf_\gamma \sup_f$ to $\sup_f \inf_\gamma$. This is only possible if $\sup_f$ is "convex" (or whatever we call it) in $\gamma$, and $\inf_\gamma$ is "concave" in $f$. We show that this condition is sufficient in an abstract way in the $g(a,b)$ part. In our case the whole expression (all the expectations, but without the inf and sup) is $g(\gamma, f)$, and it is relatively easy to see that the conditions are fulfilled.
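The three steps discussed in this thread can be written out as one chain, in the notation used above (sketched here for reference):

```latex
W(p_r, p_\theta)
  = \inf_{\gamma \in \Pi} \mathbb{E}_{x,y\sim\gamma}\bigl[\|x-y\|\bigr]
  = \inf_{\gamma} \sup_{f} \; \mathbb{E}_{x,y\sim\gamma}\bigl[\|x-y\|\bigr]
    + \mathbb{E}_{s\sim p_r}[f(s)] - \mathbb{E}_{r\sim p_\theta}[f(r)]
    - \mathbb{E}_{x,y\sim\gamma}[f(x)-f(y)]
  = \sup_{f} \; \mathbb{E}_{s\sim p_r}[f(s)] - \mathbb{E}_{r\sim p_\theta}[f(r)]
    + \inf_{\gamma} \mathbb{E}_{x,y\sim\gamma}\bigl[\|x-y\| - (f(x)-f(y))\bigr]
  = \sup_{\|f\|_L \leq 1} \mathbb{E}_{s\sim p_r}[f(s)] - \mathbb{E}_{r\sim p_\theta}[f(r)]
```

The second equality adds the penalty term (0 on $\Pi$, $+\infty$ off it), the third is the minimax swap, and the last uses that the inf over $\gamma$ is 0 for 1-Lipschitz $f$ and unbounded below otherwise.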

• http://nbviewer.jupyter.org...
This is how I worked on. It is really hard to understand in continuous case. Almost I gave up and recommended your blogs in the end haha.

• I appreciate that you stuck with it! Yeah, the bilevel optimization and convex functions are tricky, but I think it's great how quickly you can prove it this way.

• Uhmm, it is hard to follow the proof of strong duality, but I managed. If you have more time, you might add some more explanation to it.

• Thanks for all the corrections! Sorry if the errors confused you...
I agree that some of it is quite difficult to grasp. There is always this temptation to make things look easy by making the explanation short. But of course, most of the time, this just means that the reader has to do everything himself. I will try to think of something to clear it up a bit.

• In the Wasserstein distance, it should be $dx\,dy$, not $dx, y$.

• I found one more error: $\sum_i f_i + g_i$ instead of $\sum_i f_i - g_i$.

• $\alpha z^* = \alpha \mathbf{c}^T \mathbf{x}^* \geq \mathbf{y}^T \mathbf{A} \mathbf{x}^* = \mathbf{y}^T \mathbf{b} > \alpha(z^* - \epsilon)$
You might rather use this chain to show $\alpha > 0$.

• Actually, $\widetilde{z} > z^* - \epsilon$, not equal.

• What I meant is that $\tilde{z} := z^* - \epsilon$ is possible. All we want is one value $\tilde{z}$ that gets close to $z^*$; weak duality takes care of the rest.
I changed the equals sign to a 'defined as' sign, maybe this is clearer.

• Hi. May I translate this into Chinese?

• MathJax is not rendering inline math. You might want to change the CDN; see this link: https://www.mathjax.org/cdn...

• Hey may I translate these into Korean?