You can build a neural network model of the whole device, but you have to collect a lot more data. Results depend on the number of parameter permutations you sample and on the independence of the controls. I trained a NN on an expensive analog EQ with millions of permutations, but because the controls are largely linear and independent, it can extrapolate well to unseen combinations.
You can also use differentiable DSP to obtain a decent “first pass” approximation using traditional methods, and then rely on the NN to make up the difference afterward. This dramatically reduces the NN parameter count and speeds up computation. I recently tried this on an analog compressor (LA2A) and got 95%+ of the way there with a very small model.
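A rough sketch of that two-stage idea, not the actual LA2A model. Everything in it is an assumption made for illustration: the `device` function is a made-up stand-in for measured hardware, the first pass is a two-parameter tanh waveshaper fitted by gradient descent with hand-derived gradients (a real differentiable DSP setup would normally use an autodiff framework), and a tiny least-squares polynomial stands in for the small NN that learns the residual.

```python
import numpy as np

rng = np.random.default_rng(0)

def device(x):
    # Hypothetical "hardware": a soft clipper plus a small even-order quirk
    # that the tanh first pass cannot represent.
    return np.tanh(2.5 * x) + 0.05 * x**2

def dsp_stage(x, drive, gain):
    # Traditional first-pass model: a tanh waveshaper with two parameters.
    return gain * np.tanh(drive * x)

x = rng.uniform(-1.0, 1.0, 4096)   # training input samples
y = device(x)                      # "measured" output

# Stage 1: fit the DSP parameters by gradient descent on mean squared error.
drive, gain = 1.0, 1.0
initial_mse = np.mean((dsp_stage(x, drive, gain) - y) ** 2)
learning_rate = 0.1
for _ in range(2000):
    t = np.tanh(drive * x)
    err = gain * t - y
    grad_gain = 2.0 * np.mean(err * t)
    grad_drive = 2.0 * np.mean(err * gain * (1.0 - t**2) * x)
    gain -= learning_rate * grad_gain
    drive -= learning_rate * grad_drive

first_pass = dsp_stage(x, drive, gain)
mse_dsp = np.mean((first_pass - y) ** 2)

# Stage 2: a small model only has to explain what the DSP stage missed.
# Stand-in for the NN: least squares on the features [1, x, x^2, x^3].
residual = y - first_pass
features = np.stack([np.ones_like(x), x, x**2, x**3], axis=1)
weights, *_ = np.linalg.lstsq(features, residual, rcond=None)
hybrid = first_pass + features @ weights
mse_hybrid = np.mean((hybrid - y) ** 2)

print(f"initial {initial_mse:.6f}  dsp only {mse_dsp:.6f}  hybrid {mse_hybrid:.6f}")
```

The point of the split is that the second stage sees a small, smooth residual instead of the full transfer curve, which is why it can stay small.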
Sure, but you still have an infinite number of possible inputs. Your model will only show that certain note combinations at just the right volumes produce subharmonics through intermodulation distortion if you happened to find those note combinations when you built it. Musicians are very good at finding and exploiting these quirks of equipment. They are what ultimately define the characteristic sound of a given bit of gear (and of the musician), and they are what companies in the modeling business aim for: they capture a dozen or so of those characteristic sounds and fudge the rest through a mix of techniques like interpolating the difference. But you cannot get them all, so the only people who end up buying the tech are the ones who primarily want those stereotypical sounds, and that is getting to be a very crowded market.
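To make the intermodulation point concrete, here is a small sketch under invented assumptions: the two note frequencies, the levels, and the `nonlinear_stage` curve are all hypothetical and do not model any real piece of gear. It shows a 220 Hz difference tone, which sits below both input notes, appearing only after the nonlinearity and growing with the square of the input level.

```python
import numpy as np

sample_rate = 48_000
n = 48_000                      # one second of audio, so FFT bin k is k Hz
t = np.arange(n) / sample_rate
f1, f2 = 440.0, 660.0           # hypothetical pair of notes

def tones(level):
    # Two sine waves of equal amplitude played together.
    return level * (np.sin(2 * np.pi * f1 * t) + np.sin(2 * np.pi * f2 * t))

def nonlinear_stage(x):
    # Hypothetical asymmetric nonlinearity; the x**2 term creates
    # sum and difference tones (intermodulation products).
    return x + 0.3 * x**2

def level_at(signal, freq_hz):
    # Amplitude of the sinusoidal component at an integer frequency in Hz.
    spectrum = np.abs(np.fft.rfft(signal)) / (n / 2)
    return spectrum[int(round(freq_hz))]

difference = f2 - f1            # 220 Hz, below both input notes

dry = level_at(tones(0.8), difference)                     # input alone
quiet = level_at(nonlinear_stage(tones(0.05)), difference)
loud = level_at(nonlinear_stage(tones(0.8)), difference)

print(f"dry {dry:.2e}  quiet {quiet:.2e}  loud {loud:.2e}")
```

A model trained only on quiet material, or on single notes, would have almost no evidence that this component exists.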
If someone really wanted to use such technologies to be a game changer, they would forget about the past and design something new that exploits the technology's strengths in a way that is natural to the musician, instead of showing off its weaknesses. The potential of the technology is quite amazing, and yet everyone uses it to chase nostalgia.