In addition to what the other commenter said about Moore's law, innovations like FlashAttention, which cut attention's memory footprint by over 10x at long sequence lengths by never materializing the full attention matrix, and FlashAttention-2, which made big leaps in compute efficiency, show there's still a lot of room to improve the models and inference algorithms themselves. Even without more compute, we likely haven't scratched the surface of efficient transformers.
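To make the memory point concrete, here's a toy NumPy sketch of the online-softmax trick at the core of FlashAttention: process K/V in blocks and keep running statistics, so you never hold the full N x N score matrix. The function names and block size are made up for illustration; the real thing is a fused GPU kernel that also tiles over Q and respects SRAM sizes.

```python
import numpy as np

def naive_attention(Q, K, V):
    # Materializes the full N x N score matrix: O(N^2) memory.
    S = Q @ K.T / np.sqrt(Q.shape[-1])
    P = np.exp(S - S.max(axis=-1, keepdims=True))
    P /= P.sum(axis=-1, keepdims=True)
    return P @ V

def tiled_attention(Q, K, V, block=4):
    # Streams over K/V in blocks with a running (online) softmax,
    # so only an N x block tile of scores exists at any time.
    N, d = Q.shape
    scale = 1.0 / np.sqrt(d)
    out = np.zeros_like(Q)
    m = np.full(N, -np.inf)   # running row-wise max of the scores
    l = np.zeros(N)           # running softmax denominator
    for j in range(0, N, block):
        S = (Q @ K[j:j + block].T) * scale       # one score tile
        m_new = np.maximum(m, S.max(axis=-1))
        alpha = np.exp(m - m_new)                # rescale old accumulators
        P = np.exp(S - m_new[:, None])
        l = l * alpha + P.sum(axis=-1)
        out = out * alpha[:, None] + P @ V[j:j + block]
        m = m_new
    return out / l[:, None]
```

Both functions compute the same result; the tiled version just trades the quadratic score buffer for a couple of length-N running vectors, which is where the big memory savings come from.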