Very cool. Love this. Was the training more heavily weighted towards Swiss languages, and how does the model perform on Swiss languages compared to others?
Are there any plans for further models after this one?
The pretraining (so 99% of training) is fully global, covering over 1000 languages without special weighting. The posttraining (see section 4 of the paper) also included as many languages as we could get, and did upweight some languages. The posttraining can easily be customized to any other target languages.
I have a friend who works on physically based renderers in the film industry and has also done research in the area. I always love hearing stories and explanations about how things get done in this industry.
What companies are hiring such talent at the moment? Have the AI companies also been hiring rendering engineers for creating training environments?
If you are looking to hire a rendering engineer with both research and industry experience, I am happy to connect you, since my friend is not on social media but has been putting out feelers.
One question I have regarding evals is: what sampling temperature and/or method do you use? As far as I understand, temperature and sampling method can impact model output a lot. I would love to hear your thoughts on how these different settings of the same model can impact output, and how to go about evaluating models when it's not clear how to use them to their fullest.
For models we run ourselves from the weights, at the moment we'd use vLLM's defaults, but this may warrant more thought and adjustment. Other things being equal, I prefer to use an AI lab's API, with settings as vanilla as possible, so that we essentially defer to them on these judgments. For example, this is why we ran this Mistral model from Mistral's API instead of from the weights.
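For concreteness, here is a minimal sketch of what "vLLM's defaults" means in practice (the model name below is a placeholder, not necessarily what we ran): constructing `SamplingParams()` with no arguments takes vLLM's own defaults rather than anything we chose.

```python
# Minimal sketch: running a model from the weights with vLLM's default
# sampling settings. The model name is illustrative.
from vllm import LLM, SamplingParams

llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.3")

# SamplingParams() with no arguments uses vLLM's defaults
# (temperature=1.0, top_p=1.0 at the time of writing).
outputs = llm.generate(["Explain sampling temperature briefly."], SamplingParams())
print(outputs[0].outputs[0].text)
```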
I believe the `temperature` parameter, for example, has different implementations across architectures/models, so it's not as simple as picking a single temperature number for all models.
However, I'm curious if you have further thoughts on how we should approach this.
By the way, in the log viewer UI, for any model call, you can click on the "API" button to see the payloads that were sent. In this case, you can see that we do not send any values to Mistral for `top_p`, `temperature`, etc.
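To make that concrete, the request looks roughly like the sketch below (written against Mistral's chat completions endpoint; the model name is illustrative). The point is what's absent: no `temperature`, `top_p`, or similar keys, so the provider's own defaults apply.

```python
# Sketch: calling the API without any sampling parameters,
# deferring those judgments entirely to the provider.
import os
import requests

resp = requests.post(
    "https://api.mistral.ai/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['MISTRAL_API_KEY']}"},
    json={
        "model": "mistral-large-latest",
        "messages": [{"role": "user", "content": "Hello!"}],
        # no "temperature" or "top_p" keys: the provider's defaults apply
    },
    timeout=60,
)
print(resp.json()["choices"][0]["message"]["content"])
```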
Any blogs or other writing about this topic you can recommend? I worked with Gurobi in the past but haven't been keeping up with the trends and performance gains.
Wow, this is the first time I've heard about such a method. Is there anywhere I can read up on how the temperature multiplier works and what the implications/effects are? Is it just changing the temperature based on how many tokens have already been processed (i.e., the temperature is variable over the course of a completion spanning many tokens)?
Just a fixed multiplier (say, 0.5) that makes you use half of the range. As I said, I'm just speculating, but Sonnet 3.5's temperature definitely feels like it doesn't affect much. The model is overfit, and that could be the cause.
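In code, the speculation amounts to this: rescale whatever temperature the caller requests before the usual logit scaling, so asking for temperatures in [0, 1] effectively sweeps only [0, 0.5]. A toy sketch (both the multiplier value and the mechanism are guesses, not anything documented):

```python
import numpy as np

def sample_token(logits, requested_temperature, temp_multiplier=0.5):
    # Hypothetical: the provider silently rescales the caller's temperature,
    # so requesting T in [0, 1] effectively sweeps only [0, 0.5].
    t = max(requested_temperature * temp_multiplier, 1e-6)  # guard against T=0
    scaled = logits / t
    probs = np.exp(scaled - scaled.max())  # numerically stable softmax
    probs /= probs.sum()
    return np.random.choice(len(logits), p=probs)
```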
This is not true. I have no idea about the US, but Canada, specifically BC, still has large amounts of old-growth forest that is being cut. Really sad to see and read about.
US cities love their cars. Not even in city centers do they prioritize pedestrians over cars. That has nothing to do with comparing apples to oranges; it's a matter of priorities, not costs. There is no reason to need cars in city centers. They make cities ugly, loud, and dangerous compared to Europe or Asia.
How did you do this? Was the redaction done by changing the color of the font to white so that the background and text have the same color? Would love to learn how you were able to recover the text.
You can probably prompt it further to generate Python code and unmask the file for you in the interpreter.
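If the "redaction" really is white-on-white text (rather than true content removal), the text is still in the PDF, and something along these lines would pull it out. A sketch using PyMuPDF; the filename is illustrative, and it assumes the original text spans were left in place under the masking:

```python
# pip install pymupdf
import fitz  # PyMuPDF

doc = fitz.open("redacted.pdf")
for page in doc:
    # get_text("dict") returns every text span, including spans whose
    # fill color matches the background, so "hidden" text shows up here.
    for block in page.get_text("dict")["blocks"]:
        for line in block.get("lines", []):  # image blocks have no "lines"
            for span in line["spans"]:
                # span["color"] is an sRGB int; 0xffffff would be white text
                print(hex(span["color"]), span["text"])
```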
Incidentally, this use of GPT-4 is somewhat similar to the threat model that they are studying. I'm a bit surprised that they've used plain GPT-4 for the study, rather than GPT-4 augmented with tools and a large dataset of relevant publications.
Their reasoning for not using tools or browsing from the "Limitations" section:
"No GPT-4 tool usage: Due to our security measures, the GPT-4 models we tested were used without any tools, such as Advanced Data Analysis and Browsing. Enabling the usage of such tools could non-trivially improve the usefulness of our models in this context. We may explore ways to safely incorporate usage of these tools in the future."
This site looks very interesting, but I'm not quite sure what I'm looking at. What is that map for, and how does it filter sources? It seems like it doesn't include all airports.
There’s a lot of information there. They literally answer the question “What am I looking at?” in the first popup you get (How to use, top left corner).