[R] Unraveling the Mysteries: Why is AdamW Often Superior to Adam+L2 in Practice?

Hello, ML enthusiasts! 🚀🤖 We analyzed rotational equilibria in our latest work, "Rotational Equilibrium: How Weight Decay Balances Learning Across Neural Networks"...
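If the distinction in the title is new to you, here is a minimal sketch of the textbook contrast. It shows decoupled weight decay in the style of Loshchilov & Hutter, not the rotational-equilibrium analysis from the paper. It assumes a single scalar weight, one step from zero moment estimates, and PyTorch-like defaults (lr=1e-3, betas 0.9/0.999, eps=1e-8), with the decoupled decay scaled by lr. The helper names `adam_step` and `decay_shrink` are hypothetical.

```python
import math


def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8,
              wd=0.0, decoupled=False):
    """One scalar Adam update (hypothetical helper for illustration).

    decoupled=False: Adam+L2. The penalty gradient wd*w is added to g,
    so it passes through the adaptive moment estimates m and v.
    decoupled=True: AdamW-style. The decay is applied to w directly and
    never touches m or v.
    """
    if not decoupled:
        g = g + wd * w  # L2 penalty folded into the loss gradient
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)  # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)  # bias-corrected second moment
    step = m_hat / (math.sqrt(v_hat) + eps)
    if decoupled:
        # Decay uses the pre-update weight and is scaled by lr
        # (assumption: PyTorch-style scaling, no separate schedule multiplier).
        step = step + wd * w
    return w - lr * step, m, v


def decay_shrink(g, decoupled, w=1.0, wd=0.01):
    """How much extra the weight shrinks in one step because of weight decay."""
    w_plain, _, _ = adam_step(w, g, 0.0, 0.0, t=1, wd=0.0, decoupled=decoupled)
    w_decay, _, _ = adam_step(w, g, 0.0, 0.0, t=1, wd=wd, decoupled=decoupled)
    return w_plain - w_decay


for grad in (1.0, 0.0):
    print(f"g={grad}: Adam+L2 shrink={decay_shrink(grad, False):.2e}, "
          f"AdamW shrink={decay_shrink(grad, True):.2e}")
```

With a large loss gradient, the L2 term barely changes the update because Adam's division by sqrt(v_hat) absorbs it. With zero loss gradient, the same L2 term is normalized into a step of roughly lr. AdamW shrinks both weights by the same lr * wd * w, independent of their gradient history.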