Correct, you can only input up to 2048 tokens total (a big improvement over GPT-2's 1024-token context window). To keep generating past that limit, you can slide a window over the most recent tokens, as sketched below.
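Here's a minimal sketch of the sliding-window idea, assuming the Hugging Face `transformers` library with GPT-2 purely for illustration (the model name, greedy decoding, and the `MAX_CONTEXT` value are my assumptions, not specifics from the answer above):

```python
# Sliding-window generation sketch: once the text outgrows the context
# window, keep feeding the model only the most recent MAX_CONTEXT tokens.
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

MAX_CONTEXT = 1024  # GPT-2's window; a 2048-token model works the same way

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

ids = tokenizer.encode("Once upon a time", return_tensors="pt")
with torch.no_grad():
    for _ in range(300):  # generate one token at a time, past the window
        window = ids[:, -MAX_CONTEXT:]           # keep only the newest tokens
        logits = model(window).logits[:, -1, :]  # next-token distribution
        next_id = logits.argmax(dim=-1, keepdim=True)  # greedy pick
        ids = torch.cat([ids, next_id], dim=1)

print(tokenizer.decode(ids[0]))
```

The trade-off is that anything that slides out of the window is forgotten, so long-range coherence degrades.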
However, self-attention's compute and memory costs scale quadratically with input length, which makes training models with longer contexts difficult (this is why Reformer uses approximations like LSH attention to work around the quadratic cost and handle longer inputs).
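To see where the quadratic cost comes from: each attention layer builds an n × n score matrix over the input, so doubling the input length quadruples the work. A tiny illustrative loop (the numbers are just per-head, per-layer score counts, nothing model-specific):

```python
# Doubling the sequence length quadruples the attention score matrix.
for n in (1024, 2048, 4096):
    scores = n * n  # entries in one n x n attention score matrix
    print(f"seq_len={n:5d} -> {scores:>12,} scores per head per layer")
```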