L1 drives weights to exactly zero, L2 biases them towards Gaussianity.
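To make that concrete, here is a rough sketch of the two penalties applied as one-dimensional proximal steps (not anything from hlb-CIFAR10). The function names `l1_prox` and `l2_prox`, the toy weights, and `lam = 0.5` are all made up for illustration. The L2 penalty is, up to a constant, the negative log of a zero-mean Gaussian prior on the weights, which is where the Gaussian bias comes from; L1 plays the same role for a Laplace prior.

```python
import numpy as np

def l1_prox(w, lam):
    """Minimizer of 0.5*(x - w)**2 + lam*|x| (soft thresholding)."""
    # Anything with |w| <= lam lands exactly on zero.
    return np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)

def l2_prox(w, lam):
    """Minimizer of 0.5*(x - w)**2 + 0.5*lam*x**2 (uniform shrinkage)."""
    # Every weight is scaled by the same factor; none reaches zero.
    return w / (1.0 + lam)

# Hypothetical toy weights and penalty strength, for illustration only.
weights = np.array([-2.0, -0.3, 0.05, 0.4, 1.5])
lam = 0.5

w_l1 = l1_prox(weights, lam)
w_l2 = l2_prox(weights, lam)

print("L1:", w_l1)
print("L2:", w_l2)
print("exact zeros under L1:", int(np.sum(w_l1 == 0.0)))
print("exact zeros under L2:", int(np.sum(w_l2 == 0.0)))
```

With these toy numbers the three smallest weights are zeroed by the L1 step, while the L2 step only scales everything down by the same factor.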
It's not always people relearning lessons or blindly trying things, either. Many researchers use the underlying math to inform their decisions about network optimization. If that's what you're seeing, it's probably a side of the field where people are newer to some of the math behind it, and that will change as things get more established.
The underlying mathematics behind these kinds of systems is what motivated a lot of the improvements in hlb-CIFAR10, for example. I don't think I would have been able to get there without sitting down with the fundamentals, planning, thinking, and working a lot, and then executing. There is a good place for blind empirical research too, but it loses its utility past a certain point of overuse.