
Author here!

1a. LLMs fundamentally model probability distributions over token sequences; those are the (normalized) logits from the last linear layer of a transformer. The closest thing to ablating temperature is T=0 or T=1 sampling.

1b. Yes, you can do something like this, for instance by picking the temperature at which perplexity is minimized. Perplexity is the exponential of entropy, to continue the thermodynamic analogy.

1c. Higher than for most AI-written text, around 1.7. I've experimented with this as a metric for distinguishing whether text is written by AI. Human-written text doesn't follow a constant-temperature softmax distribution, either.
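A minimal sketch of the 1b idea, assuming a Hugging Face causal LM (the model choice, grid, and function name are my own illustration, not the code behind the post): scale the logits by 1/T and keep the T that minimizes the text's perplexity.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # stand-in; the post itself used Llama-family models
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name).eval()

    def perplexity_at_temperature(text: str, T: float) -> float:
        ids = tok(text, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(ids).logits[:, :-1, :] / T   # temperature-scaled logits
        log_probs = torch.log_softmax(logits, dim=-1)
        targets = ids[:, 1:].unsqueeze(-1)
        nll = -log_probs.gather(-1, targets).mean()     # mean negative log-likelihood
        return torch.exp(nll).item()                    # perplexity = exp(mean NLL)

    text = "The quick brown fox jumps over the lazy dog."
    temps = [0.5 + 0.1 * i for i in range(16)]          # grid over T in [0.5, 2.0]
    print(min(temps, key=lambda T: perplexity_at_temperature(text, T)))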

2b. Giving an LLM control over its own sampling parameters sounds like it would be a fun experiment! It could dynamically raise the temperature to write more creatively or lower it to avoid simple mistakes.

2c. This would produce nonsense. The tokens you get with negative-temperature sampling are "worse than random".


> I've experimented with this as a metric for distinguishing whether text is written by AI. Human-written text doesn't follow a constant-temperature softmax distribution, either.

Ooh, that sounds like a cool insight. Like, just take a trailing 20-30 token average of the estimated temperature and look at the variance, the way one might track VO2 max.
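Something like this toy sketch, maybe (my own illustration, assuming you already have per-token temperature estimates from a model):

    import numpy as np

    def rolling_temp_stats(token_temps, window=25):
        """Trailing-window mean and variance of per-token temperature estimates."""
        t = np.asarray(token_temps, dtype=float)
        means = np.array([t[i:i + window].mean() for i in range(len(t) - window + 1)])
        varis = np.array([t[i:i + window].var() for i in range(len(t) - window + 1)])
        return means, varis

    # Per the comment above, human text might show more variance in estimated
    # temperature than constant-temperature samples from a model.
    fake_temps = np.random.default_rng(0).normal(1.0, 0.3, size=200)
    means, varis = rolling_temp_stats(fake_temps, window=25)
    print(means[:3], varis[:3])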


What model did you use? I ran this with the original Llama 13B. The newer Llama models use a different tokenizer that will have its own anomalous tokens.

Yep! Very large negative temperatures and very large positive temperatures have essentially the same distribution. This is clearer if you consider thermodynamic beta, where T = ±∞ corresponds to β = 0.
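A toy illustration of that picture (my own sketch): sample probabilities as softmax(β · logits) with β = 1/T, so β = 0 is uniform and negative β inverts the ranking.

    import numpy as np

    def softmax_beta(logits, beta):
        z = beta * np.asarray(logits, dtype=float)
        z -= z.max()                   # for numerical stability
        p = np.exp(z)
        return p / p.sum()

    logits = np.array([4.0, 2.0, 0.0])
    print(softmax_beta(logits, 1.0))   # ordinary sampling: favors the top logit
    print(softmax_beta(logits, 0.0))   # T = ±∞ (β = 0): uniform, [1/3, 1/3, 1/3]
    print(softmax_beta(logits, -1.0))  # negative T: most of the mass on the worst token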

That's the premise behind Workshop Labs! https://workshoplabs.ai


I’ve also noticed recently that when I click a Twitter link from Telegram, it hijacks the Telegram webview to open the tweet in Safari.



If true, that's bad news for Elon Musk and xAI, because they'd have to start over. He's already indicated as much with regard to Wikipedia: he wants to train on Grokipedia, not Wikipedia. Removing NSFW material gives him another reason.


Yes! At https://RunRL.com we offer hosted RL fine-tuning, so all you need to provide is a dataset and reward function or environment.
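As a rough illustration of what a reward function can look like (the signature here is a generic sketch of my own, not RunRL's actual interface):

    def reward(prompt: str, completion: str, answer: str) -> float:
        """Hypothetical reward: 1.0 if the expected answer appears in the completion."""
        return 1.0 if answer.strip().lower() in completion.lower() else 0.0

    print(reward("What is 2+2?", "The answer is 4.", "4"))  # -> 1.0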


To add to this: you can currently parse tool calls manually in your environment's step function, but we'll be rolling out a UI that makes this easier soon.
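For concreteness, a hedged sketch of manual tool-call parsing; the <tool_call> tag format, the step(completion) -> reward signature, and the "search" tool are all assumptions for illustration, not RunRL's actual API:

    import json
    import re

    TOOL_CALL_RE = re.compile(r"<tool_call>(.*?)</tool_call>", re.DOTALL)

    def step(completion: str) -> float:
        """Parse a tool call out of the model's completion and score it."""
        match = TOOL_CALL_RE.search(completion)
        if match is None:
            return 0.0                        # no tool call emitted
        try:
            call = json.loads(match.group(1))
        except json.JSONDecodeError:
            return 0.0                        # malformed JSON gets no reward
        ok = call.get("name") == "search" and "query" in call.get("arguments", {})
        return 1.0 if ok else 0.5             # full credit for a well-formed search call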


ART is also great, though since it's built on top of Unsloth it's geared toward single-GPU QLoRA training. We use 8 H100s as our standard setup, so we can handle larger models and full-parameter fine-tunes.


Interesting, do you have benchmarks on FFT vs QLoRA for RL?


We should publish some; the first-order effect seems to be that LoRAs significantly hurt small-model performance vs FFT, with less of an effect for large models. This is maybe because large models have more built-in skills, so a LoRA suffices to elicit an existing skill, whereas for small models you need to do more actual learning (holding the number of parameter updates constant). In general I think it's better to get a performant small model with FFT than a performant large model with a large LoRA, which is why we default to FFT, but I agree that we should publish more details here.
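For reference, the practical difference in setup (a sketch under my own assumptions about model and hyperparameters, using Hugging Face peft; not RunRL's internals):

    from transformers import AutoModelForCausalLM
    from peft import LoraConfig, get_peft_model

    # Full fine-tune: load the model and train every parameter as-is.
    model = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-7B-Instruct")

    # LoRA alternative: freeze the base weights and train low-rank adapters on the
    # attention projections; far fewer trainable parameters, which (per the comment
    # above) tends to matter more for small models than large ones.
    lora_cfg = LoraConfig(
        r=16,
        lora_alpha=32,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
        task_type="CAUSAL_LM",
    )
    lora_model = get_peft_model(model, lora_cfg)
    lora_model.print_trainable_parameters()   # typically well under 1% of parameters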


Thanks! Personally, I've found FFT isn't necessarily a strict improvement over (Q)LoRA, as it can sometimes more easily lead to instability in the model; hence the bit of extra scrutiny.

Curious to see your thoughts and results whenever you get something out.


Have you heard of https://puffer.ai? Might fit your use case

