This is really important: You're not the end user of this product. These types of models are not built for laypeople to access them. You're an end user of a product that may use and process this data, but the CRPS scorecard, for example, should mean nothing to you. This is specifically addressing an under-dispersion problem in traditional ensemble models, due to a limited number (~50) and limited set of perturbed initial conditions (and the fact that those perturbations do very poorly at capturing true uncertainty).
Again, you, as an end user, don't need to know any of that. The CRPS scorecard is a very specific measure of error. I don't expect them to reveal the technical details of the model, but an industry expert instantly knows what WeatherBench[1] is, the code it runs, the data it uses, and how that CRPS scorecard was generated.
By having better dispersed ensemble forecasts, we can more quickly address observation gaps that may be needed to better solidify certain patterns or outcomes, which will lead to more accurate deterministic forecasts (aka the ones you get on your phone). These are a piece of the puzzle, though, and not one that you will ever actually encounter as a layperson.
Sorry to hijack you: I have some questions regarding current weather models:
I am personally not interested in predicting the weather as end users expect it; rather, I am interested in representative evolutions of wind patterns. I.e., specify some location (say somewhere in the North Sea, or perhaps on mainland Western Europe) and a date (say Nov 12) without specifying a year, and get the wind patterns at different heights for that location for, say, half an hour. Basically, running with different seeds, I want representative evolutions of the wind vector field (without specifying starting conditions other than location and date, i.e. NO prior weather).
Are there any ML models capable of delivering realistic and representative wind gust fields?
(The context is structural stability analysis of hypothetical megastructures)
I mean - you don't need any ML for that. Just go grab random samples from a ~30 day window centered on your day of interest over the region of interest from a reanalysis product like ERA5. If the duration of ERA5 isn't sufficient (e.g. you wouldn't expect on average to see events with a >100 year return period given the limited temporal extent of the dataset) then you could take one step further and pull from an equilibrium climate model simulation - some of these are published as part of the CMIP inter-comparison, or you could go to special-built ensembles like the CESM LENS [1]. You could also use a generative climate downscaling model like NVIDIA's Climate-in-a-bottle, but that's almost certainly overkill for your application.
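A minimal sketch of that sampling approach, using synthetic data in place of a real reanalysis (with actual ERA5 you'd open the dataset via xarray and the Copernicus CDS API; all variable names and values here are invented for illustration):

```python
import numpy as np
import datetime as dt

rng = np.random.default_rng(0)

# Toy stand-in for a reanalysis time axis: hourly timestamps for 10 "years".
start = dt.datetime(2010, 1, 1)
times = [start + dt.timedelta(hours=h) for h in range(10 * 365 * 24)]
u_wind = rng.normal(8.0, 3.0, size=len(times))  # fake 10 m zonal wind, m/s

def in_window(t, month_day=(11, 12), half_width_days=15):
    # Keep hours inside a +/-15 day window around Nov 12 of the same year.
    anchor = dt.datetime(t.year, *month_day)
    return abs((t - anchor).days) <= half_width_days

mask = np.array([in_window(t) for t in times])
pool = u_wind[mask]                     # all window hours, pooled across years

# Draw a "representative" sample: a random 24 h segment from the pooled window.
idx = np.flatnonzero(mask)
seg_start = rng.choice(idx[idx < len(times) - 24])
segment = u_wind[seg_start:seg_start + 24]
print(pool.size, segment.shape)
```

With the real product you'd also subset by latitude/longitude box and pressure level before pooling, but the window-and-sample logic is the same.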
ERA5 seems to give hourly data, so the Nyquist limit would give decent oscillation amplitudes only for waves with periods of about 5 hours or more, whereas I am interested in faster timescales (seconds to minutes), i.e. wind gusts.
Calculating the stability and structural requirements for a super-chimney reaching the tropopause would require representative wind fields at higher temporal frequency.
Do you know if I can extract such a high time resolution from LENS, since a cursory look at ERA5 showed a time resolution of just 1 hour?
The advantage of an ML model is that it's usually possible to calculate the joint probability of a wind field, or to selectively generate a dataset of N-th-percentile wind fields, etc.
If it's differentiable, and the structural stress assumptions are known, then one can "optimize" towards wind profiles that are simultaneously more dangerous and more probable, to identify what needs addressing. That's why an ML model of local wind patterns would be desirable. ML is more than just LLMs.
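As a hypothetical sketch of that "optimize toward dangerous and probable" idea: gradient-ascend on log p(w) + lam * stress(w). The Gaussian below stands in for a fitted differentiable density model, and the stress function and all numbers are invented:

```python
import numpy as np

mu = np.array([6.0, 8.0, 10.0, 12.0, 13.0])    # mean wind speed per height, m/s
cov_inv = np.eye(5) / 4.0                       # toy precision matrix

def log_prob_grad(w):
    return -cov_inv @ (w - mu)                  # gradient of Gaussian log-density

weights = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # made-up per-height leverage

def stress_grad(w):
    return 2.0 * weights * w                    # gradient of a quadratic stress proxy

w = mu.copy()
lam = 0.01                                      # danger-vs-probability trade-off
for _ in range(500):
    w += 0.01 * (log_prob_grad(w) + lam * stress_grad(w))

print(w)  # a profile shifted toward higher-stress, still-plausible speeds
```

With a real generative wind model you'd do the same thing through its log-likelihood with autodiff, and the stress term would come from the structural analysis.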
The usual complaint about LLMs, that there are no error bars on the output, is not entirely correct: just like differentiable ML models of physical and other phenomena, they allow calculating the joint probability of sentences, except that instead of modeling natural phenomena they model what humans uttered in the corpus (or the implicit corpus after RLHF etc.). A base-model LLM can quite accurately predict the likelihood of a human expressing a certain phrase, but that's modeling human expression, not its validity. An ML model trained on actual weather data, or fine-grained simulated weather data, yields comparatively more accurate probability distributions, because physics isn't much of an opinion.
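The "joint probability of sentences" point can be illustrated with a toy bigram model; a real base LLM uses the same chain-rule factorization p(sentence) = prod p(token | prefix), just with a neural next-token distribution instead of a count table (all counts below are made up):

```python
import math

# Toy corpus bigram counts standing in for an LLM's next-token distribution.
bigrams = {
    ("<s>", "the"): 8, ("<s>", "a"): 2,
    ("the", "cat"): 5, ("the", "dog"): 3, ("the", "weather"): 2,
    ("cat", "sat"): 4, ("cat", "ran"): 1,
    ("sat", "</s>"): 4, ("ran", "</s>"): 1,
}

def cond_prob(prev, tok):
    total = sum(c for (p, _), c in bigrams.items() if p == prev)
    return bigrams.get((prev, tok), 0) / total

def sentence_log_prob(tokens):
    seq = ["<s>"] + tokens + ["</s>"]
    return sum(math.log(cond_prob(a, b)) for a, b in zip(seq, seq[1:]))

lp = sentence_log_prob(["the", "cat", "sat"])
print(lp)  # log(0.8 * 0.5 * 0.8 * 1.0)
```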
> By having better dispersed ensemble forecasts, we can more quickly address observation gaps that may be needed to better solidify certain patterns or outcomes, which will lead to more accurate deterministic forecasts.
Sorry - not sure this is a reasonable take-away. The models here are all still initialized from analysis performed by ECMWF; Google is not running an in-house data assimilation product for this. So there's no feedback mechanism between ensemble spread/uncertainty and the observation itself in this stack. The output of this system could be interrogated using something like Ensemble Sensitivity Analysis, but there's nothing novel about that and we can do that with existing ensemble forecast systems.
They're all easily solvable problems. The issue, as GP mentioned, is that the pennies are just stopping without these problems being thought through or solutions planned for. This was done via a social media post, not a well-thought-out transition like Canada had.
If they're easily solvable then why do you need planning?
Changing the currency on a whim by executive fiat is stupid, but that's just principle. In practical terms, I really have a hard time caring about the problems this specific change creates.
> If they're easily solvable then why do you need planning?
Easily solvable problems still need coordination. Do you want to go to one store and have your change rounded up then go to another and have it rounded down?
Sure, who cares? This could already be happening today with rounding fractional pennies. I have no clue if stores round up, or down, or split at .5, or what. But obviously they're doing something, since there aren't physical fractional pennies and my card statements never show more than two decimal digits, so it's not a new problem. This would make the problem five times worse, but five times insignificant is still not something I'm going to worry about.
I assume you're just correcting capitalization, but for clarity's sake, the aforementioned Bitcoin mines are in west TX, the region with oil derricks, wind turbines, and tumbleweeds, and not West, TX, the town with the Little Czech Stop and delicious kolaches.
I'm a fellow Kingdom resident -- there are plenty of ski hills between here and Boston. The ones closer get more play. Stowe is a bigger resort, and it gets more play. Less so between here and Montreal, but there's ski hills north of Montreal that are closer. A place like Littleton, just beyond Cannon Mountain, _is_ brimming with customers every long weekend. Jay (and Burke, which is _really_ a locals' mountain) are very much in the in-between.
In spite of doing a reasonable bit of downhill skiing at one point out of the Boston area, I have never been to Jay, and I have only been to Burke because a Dartmouth professor I knew was, I think, on the board, and his son had gone to Burke Mountain Academy.
Stowe was about my journey limit for a weekend, and for 5 years or so after graduating from grad school I was in a ski condo around Killington. (A combination of New Yorkers and people in the Boston area.) Jay and Burke were always pretty much out of my range.
That duck curve tweet is disingenuous. That curve in the tweet is for the lowest net load day (net load is actual load or usage minus generation from renewables). In 2023, if you took the day that had the least amount of net load, yes, it was almost entirely covered by solar power. That does _not_ mean the claim made in the tweet that California is run totally by solar power from 10am-4pm every day (today at 11:56 AM PST, it's about 51% run by solar power). California's grid has enough good things going for it that we don't need to lie about it.
Because of the curse of the autodidact, do note that "ennui" is pronounced "on-wee," as it comes from French ("ennui" is "boredom" in French). It is not, as I found out rather embarrassingly, "en-you-eye".
Incidentally, I agree with you. While there could be some editing for length (oh well), the point was well made with a great example to start out with, and a bit of a discussion about some of the effects of being trapped in the Ennui Engine. It definitely hit on something I've noticed about myself.
I have a pile of books I've been meaning to read but haven't gotten to. I have lots of articles that I'd like to read but haven't made time yet. But I'd pull up Reddit and just scroll there. I deleted Twitter when Elon bought it and decided to burn it to the ground, and I'll be deleting Reddit now. Not so much to make a stand, but really just using this opportunity of upheaval as a way for my old head to extricate itself from the Ennui Engine.
'It is not, as I found out rather embarrassingly, "en-you-eye"'
Don't worry, as someone whose vocabulary was extended through voracious reading, I have made several of those "fox passes" over the years... (with that being the most memorable!)
Funny indeed. My wife was just making fun of me the other day because of the way I pronounced this. It was the answer to Wordle last week. I knew the word by sight but, like a lot of us, had never heard it spoken. I sent her a link to this article just because ennui was embedded in the title :)
The article states that if the engine recommends multiple moves, they all count as the "optimal" move, even if the engine indicates a preference among them.
I'm approaching 40 years old, have played chess off and on since I was a kid, and I'm not sure I've ever played a match that didn't include several blunders. Like, on the off chance I'm playing someone who doesn't blunder often, I'll certainly pick up their slack.
My game quality is measured in how many times I say "fuck!" right after moving a piece. A very good game for me is about a two-fuck game.
The fun thing about chess: you don't seem to think you are good, but I can't imagine only making two obvious mistakes in a game! It grows with you, haha.
Most of them it's more like four or five "oh my god I hope they don't see that thing I spotted the second I took my hand off" moments—and that's just the ones I notice before they're exploited. I'm sure I make tons of moves that anyone half-decent would call blunders but that simply go unnoticed by both players at the board.
I'm so very bad at spotting diagonal attacks, especially. Anyone who can open up their bishops then play for time will eventually see me put my queen in some dumbshit situation that lets them take it free or cheap in a single move, for instance, not even any multi-move planning required.
Lichess has a variant called "Antichess". If you can take a piece, you have to. No checks/checkmate rules. First person to have zero remaining pieces wins.
You basically want to "blunder" into giving your opponent long chains of captures while avoiding any positions that allow your opponent to hang a piece.
In my experience, the software development profession could spend a long, long time doing some self-reflection about this one. It's eloquently stated, and something a lot of developers could learn. Too many times, I've seen overly restrictive inputs cause users to hate and distrust the software. Ironically, overly restrictive inputs cause users to think that the software doesn't properly understand the domain, which is the root of mistrust.
We should be very liberal with accepted inputs. I call them "Fuck It Buttons." There are lots of cases where you want a "Fuck It" button to just go around all the data entry and get an answer or move on with minimum info. Warn that the data isn't complete and we're using defaults, and don't just make output look the same as a complete workflow, but let them go through, nonetheless. Health care is just one example, but "Fuck It" comes up in every industry.
This is the UI/UX equivalent of knowing which hills to die on.
I highly doubt the developers have anything to do with this in this case. The people requiring the inputs are managers who are responding to regulators and insurers. They're not going to buy software that has a Fuck It Button that allows workers to skip data entry that could cost them money, time, or lawsuits.
At Epic there aren't really product managers. The devs mostly set the projects, design, scope with input from clinicals/sales (a very small group).
So it is interesting to me because most of the design choices - both good and bad - are made by the devs themselves with input from area experts in the aforementioned group, QA, and customer implementation/support.
Some might argue there would be better results with someone who is not the dev managing the product more. But there are pros as well as cons.
> Ironically, overly restrictive inputs cause users to think that the software doesn't properly understand the domain, which is the root of mistrust.
Well, they're quite likely right! I freely admit that as a developer, I don't understand the domain remotely as well as the people working in it (construction workers in my case). The reason there's still a decent market for custom software and that the likes of Epic haven't gobbled up everything, is that every so often a construction worker, dentist, nurse or whatever picks up enough programming to actually make something that suits them, and they manage to bypass the administrators addicted to sales dazzle.
At our company we came to the same conclusion. We have a DSL that is essentially a programmable schema describing the shape of the data (more specifically, a contract) you want to capture, as well as how answers and decisions are derived from it. The only hard validation we have is types, e.g. you can't put letters in a box that captures a number. Everything else is soft validation, which means you can input anything, even if it's partial or not quite correct, and the system will do its best with what it has. In tandem, at any point in time you can ask the system to tell you what is missing and/or incorrect. All this then affects the lifecycle of the information, i.e. you can't move past certain checkpoints in the workflow if the information is not in the required shape. In the context of medical software, imagine you can fill in just the things that will get you back meaningful answers to help you treat someone, and deal with the rest afterwards to make the case complete.
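A stripped-down sketch of that hard-types-plus-soft-completeness split (field names and the schema shape are invented, not your DSL):

```python
# Hard validation: wrong types are rejected. Soft validation: anything else
# is accepted, and `issues` reports what still blocks the workflow checkpoint.
SCHEMA = {
    "patient_name": {"type": str, "required": True},
    "age":          {"type": int, "required": True},
    "allergies":    {"type": str, "required": False},
}

def validate(record):
    clean, issues = {}, []
    for field, spec in SCHEMA.items():
        if field not in record:
            if spec["required"]:
                issues.append(f"missing required field: {field}")
            continue
        value = record[field]
        if not isinstance(value, spec["type"]):     # the only hard rule
            issues.append(f"wrong type for {field}: {type(value).__name__}")
            continue
        clean[field] = value
    return clean, issues

clean, issues = validate({"patient_name": "Ada", "age": "forty"})
print(clean, issues)
```

The record is stored either way; the issues list is what the "tell me what's missing" query and the checkpoint gate would consume.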
Like with most things, it depends! Maybe it's a default value; maybe it's a null; maybe it's a special value that triggers a workflow on insert. I'm an evangelist for RDBMSes, and they can do so much, so let them help you!
Maybe you have a state column that's derived that you cannot move to another step in the workflow until all nulls are filled in, but you've let the UI save what data it knows about and move on. It totally depends on what the user is doing and why we're skipping steps/data.
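A minimal sketch of that "save partial data now, gate the workflow on completeness" pattern with a plain RDBMS (sqlite3 here; the table and column names are invented for illustration):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE intake (
        id     INTEGER PRIMARY KEY,
        name   TEXT,
        dob    TEXT,              -- may stay NULL while the record is a draft
        status TEXT NOT NULL DEFAULT 'draft'
    )
""")
db.execute("INSERT INTO intake (name) VALUES ('Ada')")  # partial save is fine

def promote():
    # Gate: only records with no NULLs in the required columns advance.
    db.execute("""
        UPDATE intake SET status = 'complete'
        WHERE name IS NOT NULL AND dob IS NOT NULL
    """)
    return db.execute("SELECT status FROM intake WHERE id = 1").fetchone()[0]

before = promote()                # still 'draft': dob is NULL
db.execute("UPDATE intake SET dob = '1815-12-10' WHERE id = 1")
after = promote()                 # 'complete': required fields filled in
print(before, after)
```

In a real system the gate would likely be a trigger or a CHECK on the state transition rather than an ad-hoc UPDATE, but the shape is the same: the UI saves whatever it has, and the database decides when the record may advance.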
Honestly this sort of deferred validation exists as a standard feature of certain modes of data intake that are often criticized for same, for example paper forms (the "required" fields can be left blank), or creating support tickets via email (required fields stay null until an agent updates the ticket via web UI). At some point additional round trips may occur to pay back this debt, but debt is a powerful tool for that person in the field who needs to move onto other things until the dust settles and they can pay it back.
Paper forms can be filled in by different people with different roles at different times. Customers can fill in their personal data and possibly leave blank something they don't understand. A clerk can review the form, ask the right question and fill in the missing field plus the remaining ones.
"Fill in what you know and leave the rest to us" is a simple and cheap to implement GUI compared to a full fledged workflow. It could bootstrap a process quickly at the cost of some extra labor in the customer facing department. Maybe they have that extra bandwidth and the sw developers don't.
[1]: https://sites.research.google/gr/weatherbench/