Pure speculation on my part, but I'd think that the advantages of static scratch memory over traditional big register banks and on-chip caches are not that great, especially when you're writing 'cache-aware code'. You also need to consider that the PS3 was full of design compromises to keep cost down. For example, there simply might not have been enough die space for a cache controller on each SPU, or the die space was more valuable for a few more kilobytes of static scratch memory than for the cache logic.
Also, AFAIK some GPU architectures have something similar, a per-core static scratch space. That's where restrictions such as uniform data per shader invocation being limited to at most 64 KBytes on some GPU architectures come from, etc...