Very cool. Love this. Was the training more heavily weighted towards Swiss languages, and how does the model perform on Swiss languages compared to others?
Are there any plans for further models after this one?
The pretraining (so 99% of training) is fully global, covering over 1000 languages without special weighting. The posttraining (see section 4 of the paper) also included as many languages as we could get, and did upweight some languages. The posttraining can easily be customized to any other target languages.
I have a friend who works on physically based renderers in the film industry and has also done research in the area. I always love hearing stories and explanations about how things get done in this industry.
What companies are hiring such talent at the moment? Have the AI companies also been hiring rendering engineers for creating training environments?
If you are looking to hire a rendering engineer with both research and industry experience, I am happy to connect you, since my friend is not on social media but has been putting out feelers.
One question I have regarding evals is: what sampling temperature and/or method do you use? As far as I understand, temperature and sampling method can impact model output a lot. I would love to hear your thoughts on how these different settings of the same model can impact output, and how to go about evaluating models when it's not clear how to use them to their fullest.
For models we run ourselves from the weights, at the moment we'd use vLLM's defaults, but this may warrant more thought and adjustment. Other things being equal, I prefer to use an AI lab's API, with settings as vanilla as possible, so that we essentially defer to them on these judgments. For example, this is why we ran this Mistral model from Mistral's API instead of from the weights.
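For concreteness, here is a minimal sketch of what "vLLM's defaults" means in practice (the model name below is a placeholder, not necessarily what we ran): constructing `SamplingParams()` with no arguments takes vLLM's own defaults rather than anything we chose.

```python
# Minimal sketch: running a model from the weights with vLLM's default
# sampling settings. The model name is illustrative.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.3")

# SamplingParams() with no arguments uses vLLM's defaults
# (temperature=1.0, top_p=1.0 at the time of writing).
outputs = llm.generate(["Explain sampling temperature briefly."], SamplingParams())
print(outputs[0].outputs[0].text)
```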
I believe the `temperature` parameter, for example, has different implementations across architectures/models, so it's not as simple as picking a single temperature number for all models.
However, I'm curious if you have further thoughts on how we should approach this.
By the way, in the log viewer UI, for any model call, you can click on the "API" button to see the payloads that were sent. In this case, you can see that we do not send any values to Mistral for `top_p`, `temperature`, etc.
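To make that concrete, the request looks roughly like the sketch below (written against Mistral's chat completions endpoint; the model name is illustrative). The point is what's absent: no `temperature`, `top_p`, or similar keys, so the provider's own defaults apply.

```python
# Sketch: calling the API without any sampling parameters,
# deferring those judgments entirely to the provider.
import os
import requests

resp = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={
        "model": "mistral-large-latest",
        "messages": [{"role": "user", "content": "Hello!"}],
        # no "temperature" or "top_p" keys: the provider's defaults apply
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```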
Any blogs or other writing about this topic you can recommend? I worked with Gurobi in the past but haven't been keeping up with the trends and performance gains.
Wow, this is the first time I've heard about such a method. Is there anywhere I can read up on how the temperature multiplier works and what the implications/effects are? Is it just changing the temperature based on how many tokens have already been processed (i.e., the temperature is variable over the course of a completion spanning many tokens)?
Just a fixed multiplier (say, 0.5) that makes you use half of the range. As I said, I'm just speculating, but Sonnet 3.5's temperature definitely feels like it doesn't affect much. The model is overfit, and that could be the cause.
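In code, the speculation amounts to this: rescale whatever temperature the caller requests before the usual logit scaling, so asking for temperatures in [0, 1] effectively sweeps only [0, 0.5]. A toy sketch (both the multiplier value and the mechanism are guesses, not anything documented):

```python
import numpy as np

def sample_token(logits, requested_temperature, temp_multiplier=0.5):
    # Hypothetical: the provider silently rescales the caller's temperature,
    # so requesting T in [0, 1] effectively sweeps only [0, 0.5].
    t = max(requested_temperature * temp_multiplier, 1e-6)  # guard against T=0
    scaled = logits / t
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return np.random.choice(len(logits), p=probs)
```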
This is not true. I have no idea about the US, but Canada, specifically BC, still has large amounts of old-growth forest that is being cut. Really sad to see and read about.
US cities love their cars. Not even in city centers do they prioritize pedestrians over cars. That has nothing to do with comparing apples to oranges; it's a matter of priorities, not costs. There is no reason to need cars in city centers. They make cities ugly, loud, and dangerous compared to Europe or Asia.
How did you do this? Was the redaction done by changing the color of the font to white so that the background and text have the same color? Would love to learn how you were able to recover the text.
You can probably prompt it further to generate Python code and unmask the file for you in the interpreter.
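If the "redaction" really is white-on-white text (rather than true content removal), the text is still in the PDF, and something along these lines would pull it out. A sketch using PyMuPDF; the filename is illustrative, and it assumes the original text spans were left in place under the masking:

```python
# pip install pymupdf
import fitz  # PyMuPDF

doc = fitz.open("redacted.pdf")
for page in doc:
    # get_text("dict") returns every text span, including spans whose
    # fill color matches the background, so "hidden" text shows up here.
    for block in page.get_text("dict")["blocks"]:
        for line in block.get("lines", []):  # image blocks have no "lines"
            for span in line["spans"]:
                # span["color"] is an sRGB int; 0xffffff would be white text
                print(hex(span["color"]), span["text"])
```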
Incidentally, this use of GPT-4 is somewhat similar to the threat model that they are studying. I'm a bit surprised that they've used plain GPT-4 for the study, rather than GPT-4 augmented with tools and a large dataset of relevant publications.
Their reasoning for not using tools or browsing from the "Limitations" section:
"No GPT-4 tool usage: Due to our security measures, the GPT-4 models we tested were used without any tools, such as Advanced Data Analysis and Browsing. Enabling the usage of such tools could non-trivially improve the usefulness of our models in this context. We may explore ways to safely incorporate usage of these tools in the future."
This site looks very interesting, but I'm not quite sure what I'm looking at. What is that map for, and how does it filter sources? It seems like it doesn't include all airports.
There’s a lot of information there. They literally answer the question “What am I looking at?” in the first popup you get (How to use, top left corner).