For literally years I thought it was a tongue-in-cheek thing. The world is now duller, for the truth hath come and we found ourselves wanting a grander fiction.
quite cool! haven't tried it yet, but what's the latency on hot-loading a model? (for instance, loading `stabilityai/stable-diffusion-2-1` for the first API call)
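(Not this project's API, but for a rough local baseline, here's one way to time the cold load yourself with diffusers; model name taken from the question above, and actual numbers will depend heavily on whether the weights are already cached:)

```python
import time
from diffusers import StableDiffusionPipeline

# Cold load: first call downloads ~5 GB of weights to the HF cache,
# so run this twice to separate download time from load time.
start = time.perf_counter()
pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")
print(f"load time: {time.perf_counter() - start:.1f}s")

start = time.perf_counter()
image = pipe("a photo of an astronaut riding a horse").images[0]
print(f"first inference: {time.perf_counter() - start:.1f}s")
```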
Running the "alien life" example from the README took 30 seconds on my M1 Max. I don't think it uses the GPU at all.
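For what it's worth, a quick way to check whether PyTorch can see the M1 GPU at all (min-dalle is PyTorch-based; this is the standard MPS check, nothing project-specific):

```python
import torch

# PyTorch only uses the Apple GPU if the MPS backend is available AND
# the model/tensors are explicitly moved to the "mps" device.
print(torch.backends.mps.is_available())  # False -> everything runs on CPU
print(torch.backends.mps.is_built())      # False -> this build has no MPS support
```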
I couldn't get the "mega" option to work; I got an error: `TypeError: lax.dynamic_update_slice requires arguments to have the same dtypes, got float32, float16` (looks like a known issue: https://github.com/kuprel/min-dalle/issues/2).
Edit: installing flax 0.4.2 fixes this issue, thanks all!
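For anyone hitting this before pinning flax, the error is just JAX refusing to mix dtypes in `lax.dynamic_update_slice`; a minimal reproduction and the obvious cast-based workaround (an illustration, not min-dalle's actual code):

```python
import jax.numpy as jnp
from jax import lax

buf = jnp.zeros((4, 4), dtype=jnp.float32)    # the buffer being updated
patch = jnp.ones((2, 2), dtype=jnp.float16)   # an update with a mismatched dtype

# lax.dynamic_update_slice(buf, patch, (0, 0))  # raises the TypeError above

# Casting the update to the buffer's dtype avoids the mismatch.
out = lax.dynamic_update_slice(buf, patch.astype(buf.dtype), (0, 0))
```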
Is there any reproducible measurement for benchmarking an NLP dataset/application? E.g., the paper mentions:
'Comparing T0 and GPT-3's robustness: Because Brown et al. (2020) only report one prompt per dataset with no standard deviation, we evaluate GPT-3 on RTE using the 10 prompts we prepared through OpenAI's API in order to estimate its robustness. Note that one of our templates is identical to Brown et al. (2020, p. 59)'s reported prompt; this prompt scores 58.8% accuracy on the API "Base" series which is lower than the reported accuracy of 63.5% from Brown et al. (2020). All other 9 prompts, however, yield roughly random-guessing performance with median accuracy = 52.96% and interquartile range = 1.28%. These results suggest that T0 is more robust to prompt formulation than GPT-3.'
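In case it helps, the robustness statistics they quote (median accuracy and interquartile range across prompt variants) are straightforward to compute yourself; a sketch with made-up per-prompt accuracies, not the paper's raw numbers:

```python
import numpy as np

# Hypothetical accuracies (%) for 10 prompt variants of the same task.
accs = np.array([58.8, 52.3, 53.0, 52.9, 53.5, 52.5, 53.1, 52.8, 53.4, 52.7])

median = np.median(accs)
q1, q3 = np.percentile(accs, [25, 75])
print(f"median accuracy = {median:.2f}%, IQR = {q3 - q1:.2f}%")
```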
Yes, there are many reproducible measures for benchmarking NLP datasets; we use many of them in the paper.
The issue here is that we were not completely sure of the process OpenAI used in their paper. They report the prompt but not the process of finding it. As their model and process are proprietary, it is hard for us to do an apples-to-apples comparison. This small experiment, though, indicates that GPT-3 is likely not very robust to prompt wording.
The problem is that it fails even the simplest tests (Q: "Adam was not born in 2012. When was Adam born?" A: "2012"; Q: "Mary says Adam was born in 2012. John says Adam was born in 2013. And in fact the latter date is correct." A: "2012").
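A minimal sketch of running such probes yourself, using a stand-in Hugging Face model rather than whatever the article tested (the model choice and decoding settings here are assumptions):

```python
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")  # stand-in model

probes = [
    "Adam was not born in 2012. When was Adam born? Answer:",
    "Mary says Adam was born in 2012. John says Adam was born in 2013. "
    "And in fact the latter date is correct. When was Adam born? Answer:",
]
for p in probes:
    out = generator(p, max_new_tokens=5, do_sample=False)
    # Print only the continuation, i.e. the model's answer to the probe.
    print(out[0]["generated_text"][len(p):].strip())
```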
BTW, the reporting in this article is sloppy/incorrect.