The intermediate goal is to have a standardized test dataset of a couple of hundred megabytes to a gigabyte or so.
I think you should post a to-do list on the git repo so people can contribute where their skills fit.