I'd say it's more like Waymo's world model. The main actor uses a latent vector representation of the state of the game to make decisions. At train time, this latent vector is meant to compress a lot of useful information about the game. So while you can't really interpret the latent vector itself, you do know it encodes at least the state of the game.
This world model stuff is only possible in environments that are sandboxed, i.e., where you can represent the state of the world and have a way of producing the next state given a current state and action. Things like Atari games, robot simulations, etc.
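To make that concrete, here's a minimal toy sketch in Python (purely illustrative, not anyone's actual architecture): the explicit state and transition function are what the sandbox gives you, and the encoder stands in for the learned compression into a latent vector the actor acts on.

```python
# Toy sketch of the world-model setup: a sandboxed environment exposes an
# explicit state and a transition function, and an encoder compresses the
# observation into a latent vector. All names here are illustrative.
import numpy as np

def transition(state: np.ndarray, action: int) -> np.ndarray:
    """Sandboxed dynamics: next state is a function of (state, action)."""
    next_state = state.copy()
    next_state[action % state.size] += 1.0  # toy update rule
    return next_state

def encode(observation: np.ndarray, W: np.ndarray) -> np.ndarray:
    """Stand-in for a learned encoder: project the observation to a latent."""
    return np.tanh(W @ observation)

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 8))      # toy encoder weights (would be learned)
state = np.zeros(8)

for step in range(3):
    z = encode(state, W)         # the latent the actor would act on
    action = int(np.argmax(z))   # trivial "policy" over the latent
    state = transition(state, action)
print(state)
```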
yea. We're definitely concerned about hallucinations and are using a variety of techniques to try to mitigate them (there's some existing discussion here, but using committees and sub-agents responsible for smaller tasks has helped).
What's helped the most, though, is using live cluster information to back up decision making. That way we know the data it's considering isn't garbage, and the outputs are grounded in actual data.
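A sketch of what that grounding step can look like, assuming the LLM call itself is a placeholder (the kubectl invocation is standard, everything else here is illustrative):

```python
# Pull live cluster state and put it in the model's context, so its
# reasoning is anchored to real data rather than whatever it imagines.
import json
import subprocess

def get_pod_states(namespace: str = "default") -> list[dict]:
    """Fetch current pod data straight from the cluster (ground truth)."""
    out = subprocess.run(
        ["kubectl", "get", "pods", "-n", namespace, "-o", "json"],
        capture_output=True, text=True, check=True,
    ).stdout
    items = json.loads(out)["items"]
    return [{"name": p["metadata"]["name"], "phase": p["status"]["phase"]}
            for p in items]

def build_prompt(question: str, namespace: str = "default") -> str:
    """Prepend real cluster data to the question before it reaches the LLM."""
    evidence = json.dumps(get_pod_states(namespace), indent=2)
    return f"Cluster evidence:\n{evidence}\n\nQuestion: {question}"
```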
one thing we're experimenting with to help with the hallucination/error-rate issue is a committee framework where we take a majority vote.
If the error rate of one expert is 5%, then for a committee of 10 experts the probability that a majority (6 or more) errs is about 0.00028% (binomial distribution with n=10, p=0.05: P(X >= 6) is roughly 2.8e-6). Over 10 steps, a union bound gives an error rate of about 0.0028%.
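For anyone who wants to check the arithmetic, the exact binomial tail with just the standard library:

```python
# Exact probability that a majority (>= 6 of 10) of independent experts err,
# each with error rate p = 0.05, plus a union bound over 10 steps.
from math import comb

def majority_error(n: int, p: float) -> float:
    k_majority = n // 2 + 1  # 6 for n = 10
    return sum(comb(n, k) * p**k * (1 - p)**(n - k)
               for k in range(k_majority, n + 1))

p_step = majority_error(10, 0.05)
print(f"per-step committee error: {p_step:.2e}")      # ~2.75e-06 (0.000275%)
print(f"10-step union bound:      {10 * p_step:.2e}")  # ~2.75e-05 (0.00275%)
```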
I'm not sure they are highly correlated. A committee uses the same LLM with the same input context to generate different outputs. Given the same context (and fixed model parameters, temperature, etc.), an LLM produces the same next-token output distribution, so complete outputs are independent samples from that distribution, even though tokens within a single output are highly correlated. You're right that the error events aren't truly independent, since a hard input can push every sample toward the same mistake, but the calculation was just a simplification.
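For what it's worth, the mechanism itself is tiny. A sketch, assuming a hypothetical sample_answer() wrapper around whatever LLM API you use at temperature > 0:

```python
# Committee vote over independently sampled outputs for the same context.
from collections import Counter

def sample_answer(context: str) -> str:
    """Placeholder: one independent sample from the model's output distribution."""
    raise NotImplementedError("wrap your LLM API call here")

def committee_vote(context: str, n: int = 10) -> str:
    """Draw n independent samples for the same context; return the majority answer."""
    answers = [sample_answer(context) for _ in range(n)]
    winner, count = Counter(answers).most_common(1)[0]
    if count <= n // 2:
        raise ValueError("no strict majority; treat as abstain/escalate")
    return winner
```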
The product will automatically execute runbooks for you. So far we've focused on using runbooks customers already have, since they know those work for them. We've also added the ability to turn off automatic execution for cases like a suggested runbook, so the customer can make any edits if necessary before approving it to be executed automatically.
Yea, this is a big challenge for us. We're using a variety of strategies to keep hallucinations rare, but that's also why we're committed to not executing actions that modify your cluster unless they're explicitly specified in a runbook.
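A minimal sketch of that kind of execution gate, with purely hypothetical names (Runbook, auto_execute, etc. are illustrative, not the product's actual API): nothing that mutates the cluster runs unless it appears in an approved runbook.

```python
# Hypothetical execution gate: a step only runs if it is in the runbook and
# the runbook is approved with automatic execution enabled.
from dataclasses import dataclass

@dataclass
class Runbook:
    name: str
    steps: list[str]
    auto_execute: bool = False  # off => suggested runbooks wait for a human
    approved: bool = False

def can_execute(runbook: Runbook, step: str) -> bool:
    """Gate: the step must be in the runbook, and the runbook cleared to run."""
    return step in runbook.steps and runbook.approved and runbook.auto_execute

rb = Runbook("restart-stuck-deployment",
             steps=["kubectl rollout restart deployment/api"],
             auto_execute=True, approved=True)
print(can_execute(rb, "kubectl rollout restart deployment/api"))  # True
print(can_execute(rb, "kubectl delete namespace prod"))           # False: not in runbook
```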
yea, we'd like to actually create these issues on a real cluster, but we couldn't figure out a good way of doing that at scale. The best alternative we could think of was using an LLM that knows the root cause and can (hopefully) simulate the outputs of commands consistently. Let us know if you have other ideas; we're always looking for ways to improve it.
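A sketch of what that evaluation setup might look like, with llm() as a hypothetical wrapper around whatever model API you use: the model is told the hidden root cause up front and plays the "cluster", answering each diagnostic command consistently with it.

```python
# The simulator LLM knows the hidden root cause and must answer every
# command consistently with it and with its own previous answers.
def llm(system: str, user: str) -> str:
    raise NotImplementedError("wrap your LLM API call here")

def simulate_command(root_cause: str,
                     history: list[tuple[str, str]],
                     command: str) -> str:
    """Return plausible command output consistent with the hidden root cause."""
    system = (
        "You are simulating a Kubernetes cluster with this hidden root cause: "
        f"{root_cause}. Answer each command with realistic output consistent "
        "with that root cause and with all previous answers."
    )
    transcript = "\n".join(f"$ {c}\n{o}" for c, o in history)
    return llm(system, f"{transcript}\n$ {command}")
```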
> the user competition is meant to be a fun side-project that we threw together today, I think it's cool that people hack things like that so quickly :)
The website comes off as a marketing strategy rather than a fun one-day hackathon project. I think that's why it's getting the reaction you're seeing.
More seriously, issues where the observed behaviour is "the system is slow" are usually harder to root-cause than complete outages. It partly depends on how good your capacity planning is, obviously, but maybe an AI could help with that too.