julien_c's comments

[I work at Hugging Face]

BTW the reporting in this article is sloppy/incorrect


Flamingo-style, see for instance the recently released IDEFICS: https://huggingface.co/blog/idefics
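A minimal, untested sketch of loading it with transformers, assuming the IdeficsForVisionText2Text class and the HuggingFaceM4/idefics-9b-instruct checkpoint described in the blog post (the image URL below is just a placeholder):

    import torch
    from transformers import IdeficsForVisionText2Text, AutoProcessor

    checkpoint = "HuggingFaceM4/idefics-9b-instruct"
    processor = AutoProcessor.from_pretrained(checkpoint)
    model = IdeficsForVisionText2Text.from_pretrained(checkpoint, torch_dtype=torch.bfloat16)

    # Flamingo-style: prompts interleave images and text
    prompts = [["https://example.com/cat.jpg",  # placeholder image URL
                "Question: what is in this picture? Answer:"]]
    inputs = processor(prompts, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=30)
    print(processor.batch_decode(out, skip_special_tokens=True)[0])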


we have a secret plan to change our logo to a Face Hugger some time in the future :)


that would make for quite a different branding.

Might think about it :)


for literally years I thought it was a tongue-in-cheek thing. The world is now duller, for the truth hath come and we found ourselves wanting a grander fiction.


quite cool! haven't tried it yet, but what's the latency on hot-loading a model? (for instance, loading `stabilityai/stable-diffusion-2-1` for the first API call)


Because this is a popular model and many people use it, you most likely will not experience the cold-start latency. But in general it is <10s.
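For reference, a rough local sketch (not the hosted API) of what that first load costs, assuming the diffusers library and that the weights are already downloaded to the local cache:

    import time
    from diffusers import StableDiffusionPipeline

    start = time.perf_counter()
    # the first call pays the model-load cost; later calls reuse the pipeline
    pipe = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")
    print(f"load time: {time.perf_counter() - start:.1f}s")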


what's the inference time on M1?


Running the "alien life" example from the README took 30 seconds on my M1 Max. I don't think it uses the GPU at all.

I couldn't get the "mega" option to work, I got an error "TypeError: lax.dynamic_update_slice requires arguments to have the same dtypes, got float32, float16" (looks like a known issue https://github.com/kuprel/min-dalle/issues/2)

Edit: installing flax 0.4.2 fixes this issue, thanks all!


The thread now has a fix. As for the GPU, it's possible to get it working with some extra steps https://github.com/google/jax/issues/8074

Macbook Pro M1 Pro numbers (CPU):

    python3 image_from_text.py --text='court sketch of godzilla on trial' --mega   640.24s user 179.30s system 544% cpu 2:30.39 total
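To check which backend JAX actually picked up (it falls back to CPU if no accelerator is configured), something like:

    import jax

    print(jax.default_backend())  # "cpu" unless a GPU/Metal backend is set up
    print(jax.devices())          # lists the devices JAX can see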


From reading that thread it didn't sound like GPU was fully supported yet, were you able to get it working?


Pretty much identical on M1 Max

    python3 image_from_text.py --text='a comfy chair that looks like an avocado' 612.30s user 180.72s system 552% cpu 2:23.52 total


Thanks for catching this. I just updated it so that it should work with the latest flax.


This appears to have been fixed moments ago:

https://github.com/kuprel/min-dalle/commit/38ebe54a382f36dc7...


Changing the flax version to 0.4.2 (currently 0.5.2) will work
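For example, assuming you install with pip:

    pip install flax==0.4.2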

So much for semver :(


0.y.z is kind of an "all bets are off" situation in semver: https://semver.org/#spec-item-4

> Major version zero (0.y.z) is for initial development. Anything MAY change at any time. The public API SHOULD NOT be considered stable.

Variant schemes like the one respected by Cargo (https://doc.rust-lang.org/cargo/reference/semver.html) aren't usually much different.

> Initial development releases starting with "0.y.z" can treat changes in "y" as a major release, and "z" as a minor release.
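So with a 0.y.z dependency, a conservative constraint treats the y bump as breaking. A tiny illustration with the packaging library, using the flax versions from above:

    from packaging.specifiers import SpecifierSet

    spec = SpecifierSet(">=0.4.2,<0.5")
    print("0.4.2" in spec)  # True
    print("0.5.2" in spec)  # False: the 0.5 line is treated as a new "major"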



Is there any reproducible measurement for benchmarking an NLP dataset/application? E.g., the paper mentions:

'Comparing T0 and GPT-3’s robustness Because Brown et al. (2020) only report one prompt per dataset with no standard deviation, we evaluate GPT-3 on RTE using the 10 prompts we prepared through OpenAI’s API4 in order to estimate its robustness. Note that one of our templates is identical to Brown et al. (2020, p. 59)’s reported prompt; this prompt scores 58.8% accuracy on the API “Base” series which is lower than the reported accuracy of 63.5% from Brown et al. (2020). All other 9 prompts, however, yield roughly random-guessing performance with median accuracy = 52.96% and interquartile range = 1.28%. These results suggest that T0 is more robust to prompt formulation than GPT-3.'


Yes, there are many reproducible measures for benchmarking NLP datasets. We use many of them in the paper.

The issue here is that we were not completely sure of the process that OpenAI used in their paper. They report the prompt but not the process of finding it. As their model and process are proprietary, it is hard for us to do an apples-to-apples comparison. This small experiment, though, indicates that GPT-3 is likely not very robust to prompt wording.
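The robustness summary itself is easy to reproduce once you have per-prompt scores; with made-up numbers it is just:

    import numpy as np

    # hypothetical accuracies for 10 prompt templates on RTE (illustrative only)
    per_prompt_acc = np.array([52.3, 53.0, 52.7, 51.9, 53.4, 52.9, 53.1, 52.5, 53.8, 58.8])
    q1, med, q3 = np.percentile(per_prompt_acc, [25, 50, 75])
    print(f"median = {med:.2f}%, IQR = {q3 - q1:.2f}%")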


FWIW this is extractive question answering so the output of the model can only be a span of the input paragraph.

SQuAD is the prototypical example of a dataset for this task, see https://rajpurkar.github.io/SQuAD-explorer/
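A minimal sketch with the transformers pipeline (the default checkpoint is SQuAD-finetuned; the question and context below are illustrative):

    from transformers import pipeline

    qa = pipeline("question-answering")
    result = qa(question="Where does Julien work?",
                context="Julien works at Hugging Face in Paris.")
    print(result)  # answer is a span of the context, with start/end offsets and a score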


The problem is it fails even the simplest tests (Q: "Adam was not born in 2012. When was Adam born?" A: "2012"; "Mary says Adam was born in 2012. John says Adam was born in 2013. And in fact the latter date is correct." A: "2012").


For France, you can check the French Tech Visa program: https://lafrenchtech.com/en/how-france-helps-startups/french...

I've done it for one of my team members, it's pretty easy.


Not really on unsupervised/self-supervised data though, right?

(nor on the same scale of corpora, as far as I can tell)

