More

julien_c · on Nov 23, 2023

[i work at Hugging Face]

BTW the reporting in this article is sloppy/incorrect

julien_c · on Sept 5, 2023

Flamingo-style, see for instance the recently released IDEFICS: https://huggingface.co/blog/idefics

julien_c · on Aug 24, 2023

we have a secret plan to change our logo to a Face Hugger some time in the future :)

julien_c · on Feb 22, 2023

that would make for a quite different branding.

Might think about it:)

RosanaAnaDana · on Feb 22, 2023

for literally years I thought it was a tongue in cheek thing. The world to is now duller for the truth hath come and we found ourwelves wanting a grander fiction.

julien_c · on Feb 22, 2023

quite cool! haven't tried it yet, but what's the latency on hot-loading a model? (for instance, loading `stabilityai/stable-diffusion-2-1` for the first API call)

nikola_borisof · on Feb 24, 2023

Because this is popular model and many people use it, you will not experience the cold-start latency most likely. But in general it is <10s.

julien_c · on June 28, 2022

what's the inference time on M1?

tomduncalf · on June 28, 2022

Running the "alien life" example from the README took 30 seconds on my M1 Max. I don't think it uses the GPU at all.

I couldn't get the "mega" option to work, I got an error "TypeError: lax.dynamic_update_slice requires arguments to have the same dtypes, got float32, float16" (looks like a known issue https://github.com/kuprel/min-dalle/issues/2)

Edit: installing flax 0.4.2 fixes this issue, thank all!

klohto · on June 28, 2022

The thread now has a fix. As for the GPU, it's possible to get it working with some extra steps https://github.com/google/jax/issues/8074

Macbook Pro M1 Pro numbers (CPU):

    python3 image_from_text.py --text='court sketch of godzilla on trial' --mega   640.24s user 179.30s system 544% cpu 2:30.39 total

tomduncalf · on June 28, 2022

From reading that thread it didn't sound like GPU was fully supported yet, were you able to get it working?

tomduncalf · on June 28, 2022

Pretty much identical on M1 Max

python3 image_from_text.py --text='a comfy chair that looks like an avocado' 612.30s user 180.72s system 552% cpu 2:23.52 total

kuprel · on June 28, 2022

Thanks for catching this. I just updated it so that it should work with the latest flax.

lloeki · on June 28, 2022

This appears to have been fixed moments ago:

https://github.com/kuprel/min-dalle/commit/38ebe54a382f36dc7...

NhanH · on June 28, 2022

Change the flax version to 0.4.2 (currently 0.5.2) will work

So much for semver :(

Twisol · on June 28, 2022

0.y.z is kind of an "all bets are off" situation in semver: https://semver.org/#spec-item-4

> Major version zero (0.y.z) is for initial development. Anything MAY change at any time. The public API SHOULD NOT be considered stable.

Variant schemes like the one respected by Cargo (https://doc.rust-lang.org/cargo/reference/semver.html) aren't usually much different.

> Initial development releases starting with "0.y.z" can treat changes in "y" as a major release, and "z" as a minor release.

julien_c · on Oct 18, 2021

ArXiv link to the paper: https://arxiv.org/abs/2110.08207

GitHub repo: https://github.com/bigscience-workshop/promptsource

hrgiger · on Oct 18, 2021

What is the/is there any reproducible measurement for benchmarking a nlp dataset/application. i.e. in paper it mentions:

'Comparing T0 and GPT-3’s robustness Because Brown et al. (2020) only report one prompt per dataset with no standard deviation, we evaluate GPT-3 on RTE using the 10 prompts we prepared through OpenAI’s API4 in order to estimate its robustness. Note that one of our templates is identical to Brown et al. (2020, p. 59)’s reported prompt; this prompt scores 58.8% accuracy on the API “Base” series which is lower than the reported accuracy of 63.5% from Brown et al. (2020). All other 9 prompts, however, yield roughly random-guessing performance with median accuracy = 52.96% and interquartile range = 1.28%. These results suggest that T0 is more robust to prompt formulation than GPT-3.'

srush · on Oct 18, 2021

Yes there are many reproducible measures for benchmarking NLP datasets. We use many of them in the paper.

The issue here is that we were not completely sure of the process that OpenAI used in their paper. They report the prompt but not the process of finding it. As their model and process is proprietary, it is hard for us to do an apples-to-apples comparison. This small experiment though indicates that it is likely not very robust to prompt wording.

julien_c · on Sept 10, 2021

FWIW this is extractive question answering so the output of the model can only be a span of the input paragraph.

SQuAD is the prototypical example of a dataset for this task, see https://rajpurkar.github.io/SQuAD-explorer/

hdjjhhvvhga · on Sept 10, 2021

The problem is it fails even simplest tests (Q: "Adam was not born in 2012. When was Adam born?" A: "2012"; "Mary says Adam was born in 2012. John says Adam was born in 2013. And in fact the latter date is correct." A: "2012").

julien_c · on July 24, 2020

For France, you can check the French Tech Visa program: https://lafrenchtech.com/en/how-france-helps-startups/french...

I've done it for one of my team members, it's pretty easy.

julien_c · on June 12, 2020

Not really on unsupervised/self-supervised data though, right?

(nor on the same scale of corpora, as far as I can tell)