I wondered about the connection to actor-critic too; GANs have taken inspiration from RL, but so far they haven't given anything back, and offhand I don't know of anything like clipping in actor-critic. My thought, though, was that it is the *critic* which should be clipped, not the actor. The critic seems exactly analogous to the discriminator in a GAN, as it tries to judge the quality of what the other network produces (the action taken by the actor, or the image emitted by the generator). So perhaps the key experiment here would be to add clipping to the critic weights and see if it reduces the variance and the system as a whole learns faster?
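To make the experiment concrete, here is a minimal PyTorch-style sketch of what I have in mind: an ordinary critic regression update followed by the WGAN clipping step applied to the critic's weights. The network shape, optimizer, and targets here are just placeholders; the only part taken from WGAN is the final clamp, with the clip constant c=0.01 as in the paper.

    import torch
    import torch.nn as nn

    # hypothetical critic network for an actor-critic agent
    critic = nn.Sequential(nn.Linear(4, 64), nn.ReLU(), nn.Linear(64, 1))
    critic_opt = torch.optim.RMSprop(critic.parameters(), lr=5e-5)
    c = 0.01  # clip constant, borrowed from the WGAN recipe

    def critic_update(states, targets):
        # ordinary TD-style regression step for the critic
        values = critic(states).squeeze(-1)
        loss = (targets - values).pow(2).mean()
        critic_opt.zero_grad()
        loss.backward()
        critic_opt.step()
        # the WGAN trick: clamp every critic weight into [-c, c] after the step
        for p in critic.parameters():
            p.data.clamp_(-c, c)

The comparison would then just be the same agent with and without the clamp loop, looking at variance of the critic's value estimates and overall learning speed.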
I also wonder about the scale; with WGAN, the Wasserstein distance and losses can change dramatically depending on the exact model structure, and you seem to need to adjust the learning rate drastically (is that the implication of your mention of the constant being buried in alpha? I've noted elsewhere that WGAN seems to need aggressive tweaking of the learning rate, but so far no one else has mentioned it). One of the key ingredients is letting the loss vary over a wider range rather than taking the log of it; what might the equivalent be for actor-critic?
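For concreteness, this is roughly the loss-range difference I mean; a small sketch, with d_real/d_fake standing in for discriminator or critic scores on real and generated samples (the scores themselves are just random placeholders here):

    import torch
    import torch.nn.functional as F

    d_real = torch.randn(64) * 3   # hypothetical scores on real samples
    d_fake = torch.randn(64) * 3   # hypothetical scores on generated samples

    # Standard GAN discriminator loss: scores squashed through a sigmoid and logged,
    # so the loss is bounded and saturates once the discriminator is confident.
    gan_loss = -(F.logsigmoid(d_real) + F.logsigmoid(-d_fake)).mean()

    # WGAN critic loss: raw scores used directly, so it can take any real value,
    # and its scale depends on the critic architecture and the clipping constant.
    wgan_loss = -(d_real.mean() - d_fake.mean())

The unsquashed version is what lets the loss (and hence the effective gradient scale) wander so widely, which is why the learning rate seems so sensitive; the actor-critic analogue would presumably be whatever plays the role of that squashing in the critic's objective.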