You don't need to create a separate tokenizer or bloat the model to ensure that system embeddings are "colored" differently; you can simply reserve one bit in the input vector. When you concatenate, e.g., token embeddings and positional embeddings, dedicate one explicit element ("neuron") of the positional embeddings to a flag recording whether that token came from "system" or from "user". The only thing this complicates in training is that you need some training examples that require opposite treatment of the same instructions depending on that flag.
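A minimal sketch of the idea, with made-up dimensions and random embedding tables (nothing here comes from a real model); the point is just that one extra element per token carries the system/user flag out-of-band:

```python
import numpy as np

# Illustrative sizes only.
VOCAB, CTX, D_TOK, D_POS = 100, 512, 8, 4

rng = np.random.default_rng(0)
tok_emb = rng.normal(size=(VOCAB, D_TOK))  # token embedding table
pos_emb = rng.normal(size=(CTX, D_POS))    # positional embedding table

def embed(token_ids, is_system):
    """Build input vectors as [token_emb | pos_emb | system_flag]."""
    toks = tok_emb[token_ids]                           # (T, D_TOK)
    poss = pos_emb[np.arange(len(token_ids))]           # (T, D_POS)
    # One reserved element: 1.0 for system tokens, 0.0 for user tokens.
    # User-controlled text can never set this bit, so it can't be faked in-band.
    flag = np.asarray(is_system, dtype=float)[:, None]  # (T, 1)
    return np.concatenate([toks, poss, flag], axis=1)   # (T, D_TOK + D_POS + 1)

x = embed([3, 7, 7], is_system=[1, 1, 0])
# The same token id (7) appears once as "system" and once as "user";
# its token-embedding part is identical, only the flag element differs.
```

The flag could just as well be added into an existing channel rather than concatenated; concatenation simply makes the "one dedicated neuron" version explicit.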
If that's possible, will it also be possible to characterize/model how parameters dissolve into the weights, and to analytically construct LLM/DNN models in the "forward pass"?
The post above is about ensuring that the markings given to the model along with the text, indicating the prompt/data distinction, are "out-of-band": reliable and impossible to influence or fake via user-controlled data. Getting the model to actually act in accordance with the prompt is a wholly different issue; but this discussion seems to assume that part is mostly solved (e.g. by reinforcement learning from human feedback) and that the main problem is the injection itself.