Hacker News | bob1029's comments

At a certain rate we will be able to move towards continuous / real-time inference systems. The discrete, turn-based solutions are quite confining in how they must be trained. Continuous and real-time would fundamentally alter the domain.

From an information theory perspective we are still in dial-up territory with regard to the actual information rate. 750 tokens per second would be a really bad dial-up connection. Imagine 10 million tokens per second.
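A rough way to put numbers on the dial-up comparison is to treat each token as carrying at most log2(V) bits for a vocabulary of V tokens. The sketch below assumes a hypothetical 100,000-token vocabulary (real tokenizers differ, and ordinary text carries far fewer bits per token than this upper bound) and compares against the nominal 56 kbit/s of a 56k modem.

```python
import math

# Assumption (hypothetical): a 100,000-token vocabulary. A uniformly random token
# from it carries log2(100_000) ~= 16.6 bits; real text carries less, so the
# rates below are upper bounds.
VOCAB_SIZE = 100_000
BITS_PER_TOKEN = math.log2(VOCAB_SIZE)

DIALUP_BPS = 56_000  # nominal rate of a 56k modem, in bits per second


def token_rate_to_bps(tokens_per_second: float) -> float:
    """Upper-bound information rate, in bits per second, for a token stream."""
    return tokens_per_second * BITS_PER_TOKEN


for rate in (750, 10_000_000):
    bps = token_rate_to_bps(rate)
    print(f"{rate:>12,} tok/s ~ {bps / 1000:,.1f} kbit/s "
          f"({bps / DIALUP_BPS:,.2f}x a 56k modem)")
```

Under those assumptions, 750 tokens per second is roughly 12.5 kbit/s, under a quarter of a 56k modem, and 10 million tokens per second is on the order of 166 Mbit/s.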


Is there anyone exploring or writing about this in public? I've felt for a while that the turn-based model was not quite right, but also felt too stupid and ill-informed to have much of an opinion about what else it could be.

That would be interesting.

Do you feel most of the speed upgrade will come from the software or hardware side?


Ahh yes slop at the speed of light, how useful!


SpaceX moved first so it’s 3rd move advantage?

Mowing two weeks of summer growth just before a thunderstorm is peak vibes.

The biggest flaw I've seen with TDD is the fact that correctness does not compose upward. Every time two units come into contact, you've got an entirely new kind of unit. The tests from constituents do not cover emergent properties of the new things. You will repeat this same exercise the entire way up to the top, and the moment you come into contact with the customer (they want to change everything), the house of cards comes crumbling down and you have to start your agonizingly-slow process all over from the bottom again.
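A small illustration of correctness not composing upward, using two hypothetical functions: each passes its own unit tests, but wiring them together creates a new unit whose bug none of the constituent tests can see.

```python
# Two hypothetical units, each correct against its own tests.

def parse_price(text: str) -> int:
    """Parse a price string like "12.34" into integer cents."""
    dollars, _, cents = text.partition(".")
    return int(dollars) * 100 + int((cents + "00")[:2])


def apply_discount(amount_dollars: float, percent: float) -> float:
    """Apply a percentage discount to an amount expressed in dollars."""
    return round(amount_dollars * (1 - percent / 100), 2)


# Unit tests: both units pass in isolation.
assert parse_price("12.34") == 1234
assert parse_price("12.5") == 1250
assert apply_discount(10.0, 10) == 9.0


# The composition is a new "unit" with its own behaviour: one side speaks
# cents, the other dollars, and no constituent test covers the seam.
def checkout(price_text: str, percent: float) -> float:
    return apply_discount(parse_price(price_text), percent)


print(checkout("12.34", 10))  # roughly 1110.6 rather than the intended 11.11
```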

The only thing that the business seems to care about is top-down UI testing. This is also convenient because you can leave it until the very end after the customer has already seen several prototypes.

I do think TDD makes sense in isolated scopes (prove this specific custom parser works at the edges), but as the general policy for the entire product it's definitely not a viable practice. Much of the time it comes off as an ego trip to see just how cleverly we can mock something so that we can say we technically tested it.
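What TDD in an isolated scope can look like for the custom-parser case: a minimal sketch with a hypothetical `parse_duration` function (not from the original) and edge-case tests written against it with Python's standard `unittest`.

```python
import re
import unittest

# Hypothetical custom parser: "1h30m" -> 5400 seconds.
_UNIT_SECONDS = {"h": 3600, "m": 60, "s": 1}
_TOKEN = re.compile(r"(\d+)([hms])")


def parse_duration(text: str) -> int:
    """Parse durations like "1h30m" or "45s" into seconds."""
    # Anything left after removing valid <number><unit> tokens is invalid input.
    if not text or _TOKEN.sub("", text):
        raise ValueError(f"invalid duration: {text!r}")
    return sum(int(n) * _UNIT_SECONDS[u] for n, u in _TOKEN.findall(text))


class ParseDurationEdges(unittest.TestCase):
    """Edge cases written first, each one red until the parser handles it."""

    def test_combined_units(self):
        self.assertEqual(parse_duration("1h30m"), 5400)

    def test_zero(self):
        self.assertEqual(parse_duration("0s"), 0)

    def test_empty_rejected(self):
        with self.assertRaises(ValueError):
            parse_duration("")

    def test_missing_unit_rejected(self):
        with self.assertRaises(ValueError):
            parse_duration("90")

    def test_stray_whitespace_rejected(self):
        with self.assertRaises(ValueError):
            parse_duration("1h 30m")
```

Run it with `python -m unittest` followed by the module name.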


I tell people you should be testing at the level where a change would be so hard you wouldn't do it anyway. Internal helper functions - they are tested only because the code that calls them passes. Interfaces that are used thousands of places - you better test them well because you wouldn't dare change that anyway: it would break too many others.

Or to put it differently: a test is an assertion that no matter what, for all time this should never change again. Even if customer requirements change in the future they won't change in such a way as to break this test (this isn't always true, but you should believe it is true).
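One way to read the advice about testing at the level where change is already hard: pin the widely used public interface with tests and let internal helpers be covered only through it. The function names below are hypothetical.

```python
import re
import unicodedata


def _ascii_fold(text: str) -> str:
    """Internal helper: "Café" -> "Cafe". Deliberately has no tests of its own."""
    normalized = unicodedata.normalize("NFKD", text)
    return normalized.encode("ascii", "ignore").decode("ascii")


def slugify(title: str) -> str:
    """Public interface that (hypothetically) many callers depend on."""
    lowered = _ascii_fold(title).lower()
    return re.sub(r"[^a-z0-9]+", "-", lowered).strip("-")


# Tests pin only the public contract. The helper counts as tested because these
# pass, so it can be renamed, inlined or rewritten without touching any test.
def test_slugify_contract():
    assert slugify("Hello, World!") == "hello-world"
    assert slugify("Café au lait") == "cafe-au-lait"
    assert slugify("  --already--slugged--  ") == "already-slugged"
    assert slugify("") == ""
```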

A test is most valuable when it alerts you to a real problem when it fails. If the test fails but there isn't a real problem (either because customer requirements have changed, or because it is flaky), it was a needless cost to investigate it. If the test passes, that gives some hope of correctness, but you can never be sure it is really correct vs. a bug in the test (even if you use TDD and the test failed when you wrote it, that doesn't mean a later refactoring didn't turn it into an always-pass test).

Part of the problem is that if I tell you to write sort() or your new toy language's list type, you have an intuitive idea of what it should look like and will probably get it right the first time (other than the bugs you want the tests to catch). These should have tiny micro tests. These things are also really easy to use as examples of how to do TDD, which they are, but they are not representative: this type of code is generally in your standard library already and you are not writing it.

Instead you are writing code that isn't well defined by lots of industry experience. It is not clear what the exact interface should be (or, more likely, it is clear customer requirements will change but you don't know how yet). You have no idea what the best implementation is. You don't know if this will be used in this one place, or if it will become a useful key part that many future projects depend on. You have to make guesses.
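For contrast, the sort() case mentioned in this comment is the kind of code where tiny micro tests are easy to write up front. A minimal sketch, using a hypothetical hand-rolled insertion sort:

```python
def insertion_sort(items: list) -> list:
    """Return a new sorted list (hypothetical hand-rolled sort, stable)."""
    result = list(items)
    for i in range(1, len(result)):
        key = result[i]
        j = i - 1
        # Shift larger elements one slot right until key's position is found.
        while j >= 0 and result[j] > key:
            result[j + 1] = result[j]
            j -= 1
        result[j + 1] = key
    return result


# Tiny micro tests: the expected behaviour is obvious before any code exists,
# which is exactly why this kind of code makes such an easy TDD demo.
def test_insertion_sort():
    assert insertion_sort([]) == []
    assert insertion_sort([1]) == [1]
    assert insertion_sort([3, 1, 2]) == [1, 2, 3]
    assert insertion_sort([2, 1, 2]) == [1, 2, 2]
    assert insertion_sort([5, 4, 3, 2, 1]) == sorted([5, 4, 3, 2, 1])
```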


That is a flaw with unit tests written at far too low a level, not with TDD.

You would have the same problem if you wrote tests like that after the code.

TDD has no opinion about the level at which you wrote your test, it just assumes it's the correct one.

This is the number one biggest misconception about TDD which I keep seeing repeated on hacker news.

https://news.ycombinator.com/item?id=46810793

https://news.ycombinator.com/item?id=45113016


TDD for UI effects?

snapshot test-driven development again. i already wrote a similar answer in response to your other comment.

it follows the definition of TDD and it works really well (with some caveats) but again some people get hung up on what their impression of TDD is (e.g. unit tests checking to see if a car object has a steering wheel or whatever...) rather than what it actually is and what about it is that actually works.


How does a snapshot capture "feels right" from a designer's point of view?

Um, show the snapshot to a designer? When it feels right, lock in the snapshot ("green") and then move on to refactor.

Or, more likely, a group of snapshots.
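A minimal sketch of the loop described in this thread: a snapshot check that stays red until someone (here, a designer) approves the rendered output, after which it is locked in as green. The file naming and `render_button` component are hypothetical, not any particular library's API.

```python
import tempfile
from pathlib import Path


def check_snapshot(snapshot_dir: Path, name: str, actual: str,
                   approve: bool = False) -> bool:
    """Compare `actual` with the approved snapshot called `name`.

    Red: no approved snapshot yet, or a mismatch. The candidate is written to
    <name>.received.txt for someone (e.g. a designer) to review.
    Green: approve=True locks the candidate in as <name>.approved.txt.
    """
    approved = snapshot_dir / f"{name}.approved.txt"
    received = snapshot_dir / f"{name}.received.txt"
    if approve:
        approved.write_text(actual)
        received.unlink(missing_ok=True)
        return True
    if approved.exists() and approved.read_text() == actual:
        received.unlink(missing_ok=True)
        return True
    received.write_text(actual)
    return False


def render_button(label: str) -> str:
    """Hypothetical UI component under test."""
    return f'<button class="primary">{label}</button>'


with tempfile.TemporaryDirectory() as tmp:
    snaps = Path(tmp)
    first = check_snapshot(snaps, "button", render_button("Save"))          # red
    check_snapshot(snaps, "button", render_button("Save"), approve=True)    # sign-off
    second = check_snapshot(snaps, "button", render_button("Save"))         # green
    changed = check_snapshot(snaps, "button", render_button("Save all"))    # red again
    print(first, second, changed)  # False True False
```

A group of snapshots is the same check run once per name.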


I feel this, especially with the crazy lengths people go to mock things sometimes. A couple years back I was having a discussion with a friend/former coworker about testing (I was griping about unnecessary mocks I had to deal with for something at a job causing unnecessary extra work), and he asked how I would approach trying to get full unit test coverage instead. I was taken aback and said that I wouldn't try to get literally everything covered by unit tests in the first place. Most of the teams I've worked on have had the approach that test coverage is good, but it isn't necessarily going to be 100% even when considering all tests; I can't even imagine trying to get 100% coverage for unit tests alone being anywhere close to worth the extra effort, let alone the contortions that the code would need to take to support it.

Yeah.

Some TDD-obsessed companies will write tests in a way that requires you to spend a half hour understanding the web of mocks in order to update the tests to account for even a minor datastructure change. Coincidentally, your code change would cause those same tests to fail if they weren't mocked out, but they all pass until you make your changes to the mocks. This shreds the "if the tests pass, the change is probably correct" confidence that's most of the reason for having automated tests.
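A compact, hypothetical example of the failure mode above, using Python's standard `unittest.mock.Mock`: the mock still returns the old data shape, so the mocked test stays green while the real call path is broken.

```python
from unittest.mock import Mock


class UserRepository:
    """Real collaborator. Its return shape changed from {"name": ...}
    to {"full_name": ...} (hypothetical data-structure change)."""

    def get_user(self, user_id: int) -> dict:
        return {"id": user_id, "full_name": "Ada Lovelace"}


def greeting(repo, user_id: int) -> str:
    user = repo.get_user(user_id)
    return f"Hello, {user['name']}"  # still reads the old key: a real bug


def test_greeting_with_mock():
    # The mock encodes the *old* shape, so this keeps passing after the change.
    repo = Mock()
    repo.get_user.return_value = {"id": 1, "name": "Ada Lovelace"}
    assert greeting(repo, 1) == "Hello, Ada Lovelace"


test_greeting_with_mock()  # green

try:
    greeting(UserRepository(), 1)  # the real call path
except KeyError as missing:
    print(f"real repository breaks on missing key {missing}")
```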

I am not a fan of this style of test writing.


Exactly, the whole system thinking and large-scale architecture also falls apart when you build everything up from little working tests.

TDD is perfect for bugs; codify a replication first, then fix it.
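A sketch of the bug-first workflow with a hypothetical off-by-one bug in a pagination helper: the regression test is written to reproduce the report before the fix, then the fix turns it green.

```python
def page_count(total_items: int, per_page: int) -> int:
    """Number of pages needed to show total_items, per_page at a time."""
    # Hypothetical bug report: 11 items at 10 per page showed only 1 page.
    # The buggy line was `return total_items // per_page` (floor division).
    return -(-total_items // per_page)  # ceiling division


def test_page_count_regression():
    # Step 1: written first, reproducing the report; red against the buggy line.
    # Step 2: switching to ceiling division turns it green.
    assert page_count(11, 10) == 2


def test_page_count_existing_behaviour():
    assert page_count(0, 10) == 0
    assert page_count(10, 10) == 1
```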

Example for an HLSL graphical glitch?

https://hitchdev.com/hitchstory/approach/snapshot-test-drive...

set up a rendering profile and preconditions that generates a minimal snippet of images/video using a predefined GPU profile.

then test for either a pixel-perfect reproduction of the correct behaviour or for the properties you're looking for (if it doesn't reproduce deterministically).
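The comparison step could look something like this. It assumes (hypothetically) that frames from the predefined GPU profile have already been read back into Python as rows of 8-bit RGB tuples; the capture itself is not shown, and the magenta "glitch colour" is an illustrative assumption.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # 8-bit RGB
Frame = List[List[Pixel]]     # rows of pixels read back from the capture


def frames_match(actual: Frame, expected: Frame, tolerance: int = 0) -> bool:
    """Pixel-perfect comparison when tolerance=0, per-channel tolerance otherwise."""
    if len(actual) != len(expected):
        return False
    for row_a, row_e in zip(actual, expected):
        if len(row_a) != len(row_e):
            return False
        for pa, pe in zip(row_a, row_e):
            if any(abs(ca - ce) > tolerance for ca, ce in zip(pa, pe)):
                return False
    return True


def has_no_glitch_pixels(frame: Frame, glitch: Pixel = (255, 0, 255)) -> bool:
    """Property check for glitches that don't reproduce deterministically:
    assume (hypothetically) the bug shows up as pure magenta pixels."""
    return all(pixel != glitch for row in frame for pixel in row)


reference = [[(0, 0, 0), (128, 128, 128)]]
rendered = [[(0, 0, 1), (128, 128, 128)]]
print(frames_match(rendered, reference))               # False: not pixel perfect
print(frames_match(rendered, reference, tolerance=1))  # True within 1 step
print(has_no_glitch_pixels([[(255, 0, 255)]]))         # False
```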

this is one way. i also subscribe to the view that if the type system is modified to become stricter in such a way that it can fail reliably in the presence of this type of bug that this is also good enough.

some people might argue that these aren't "strictly" TDD by some definition, but they set out a path to follow red-green-refactor and confer identical benefits, so my view is: who gives a duck?

I don't have enough domain expertise to know which variant of these approaches is best, but I'm enough of a TDD expert to know that what you're implying isn't possible is actually something you would probably derive a lot of value from if you did it.


Now do that interactive with feed back from design team and user testing.

Iterate on the design til the snapshots look the way the design team wants.

That's just an extended red where you get feedback from elsewhere.


> At 200GB I don't know how anybody can justify valuable space in their SSD for a single game.

I have 2 classes of SSD in my system. I've got 512GB of extremely fast, high-quality NVMe storage. Then I've got 4TB of the cheapest bullshit I could find on Amazon. I put the big games on my crappy SSD. The performance difference is not something I care about anymore.


> I assume they're mostly the kind that has never been to Europe

The big disconnect comes from the fact that places like Miami and Houston don't really have analogous European peers in terms of climate. There are places that come close but it's not the same.

It's one thing for it to be unbearably hot at 2-6pm. It's a different thing altogether for it to still be 80F+ at 3am every single night for months on end. You cannot escape the heat in Houston without phase-change cooling technology. Latent heat removal (pulling water out of the air) is what most of us are paying for around here, not sensible heat removal.
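To make the latent versus sensible split concrete, here is a worked sketch using the common standard-air rules of thumb: sensible ≈ 1.08 × CFM × ΔT (°F) and latent ≈ 0.68 × CFM × Δgrains (grains of moisture per pound of dry air), both in BTU/h. The airflow and deltas are made-up illustration values, not measurements from any house.

```python
BTU_PER_TON = 12_000  # 1 ton of cooling = 12,000 BTU/h


def sensible_btuh(cfm: float, delta_t_f: float) -> float:
    """Sensible heat removed, BTU/h: standard-air rule of thumb 1.08 x CFM x dT(F)."""
    return 1.08 * cfm * delta_t_f


def latent_btuh(cfm: float, delta_grains: float) -> float:
    """Latent heat removed, BTU/h: 0.68 x CFM x drop in grains of water per lb of dry air."""
    return 0.68 * cfm * delta_grains


# Hypothetical operating point: 3 tons at the common ~400 CFM per ton.
cfm = 3 * 400
sensible = sensible_btuh(cfm, delta_t_f=20)
latent = latent_btuh(cfm, delta_grains=15)
total = sensible + latent
print(f"sensible {sensible:,.0f} BTU/h, latent {latent:,.0f} BTU/h")
print(f"total {total / BTU_PER_TON:.2f} tons, sensible heat ratio {sensible / total:.2f}")
```

At a fixed airflow the latent term depends only on how much moisture is removed, not on the temperature drop, so it is the part that humid climates push up.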

I can walk down my street and find 2-3k sqft homes that have 5+ tons of HVAC capacity. There is a home with three condensing units and it's not much bigger than mine (I only have a single 3 ton system). I've been thinking about getting a multizone ductless system installed on top of my central unit to deal with July and August.

https://www.yahoo.com/news/weather-news/articles/worst-ive-e...


I just had to replace an AC in my condo in Miami. The replacement parts' cost for the old unit didn't make sense. Got a 3 ton Bosch unit; parts and installation came to $6K. 3 tons is just enough for 1600 square feet and old single-pane windows.
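The tonnage figures in this thread translate into floor area per ton with a little arithmetic (1 ton of cooling = 12,000 BTU/h). Only the numbers quoted in the comments are used; no sizing guidance is implied.

```python
BTU_PER_TON = 12_000  # 1 ton of cooling = 12,000 BTU/h


def sqft_per_ton(square_feet: float, tons: float) -> float:
    """Floor area served by each ton of cooling capacity."""
    return square_feet / tons


def btuh_per_sqft(square_feet: float, tons: float) -> float:
    """Cooling capacity per square foot, in BTU/h."""
    return tons * BTU_PER_TON / square_feet


# Figures quoted in the thread.
print(f"Miami condo, 3 tons / 1600 sqft: {sqft_per_ton(1600, 3):.0f} sqft per ton, "
      f"{btuh_per_sqft(1600, 3):.1f} BTU/h per sqft")
for sqft in (2000, 3000):
    print(f"Houston, 5 tons / {sqft} sqft: {sqft_per_ton(sqft, 5):.0f} sqft per ton")
```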

You cannot survive without an AC in Florida. Nor in Las Vegas, Dallas, NYC and the boroughs; the list goes on. Doable in Santa Monica or in the Bay, perhaps.

Humidity is a larger burden on me than temperature. I can do dry Las Vegas at 100F, but can't stand Miami at 80F and 70% humidity.


I started building my agent loops based on the RLM paper and I am finding the recursive part serves two major purposes. First, it pushes down token consumption as you describe. The other thing it does is prevent the agent from returning too soon, since most of the real work happens at depth. Especially if you forbid tool use in the root.

I am starting to wonder if maybe I could just focus on these aspects more directly as opposed to treating them as side effects of symbolic recursion. I do have to agree with the paper in that recursive depth beyond 1 doesn't seem to matter. At least not with the current frontier of models. If we can't recurse more than once and extract much uplift, then I question us labeling this a recursive scheme.
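A minimal sketch of the structure described here: a root that is forbidden from using tools and can only delegate, leaves that do the real work with tools, and recursion capped at depth 1. Everything is hypothetical: the model, tool and splitter are stubs, and this is not the RLM paper's implementation.

```python
from typing import Callable, List

# Hypothetical interfaces; none of these names come from the RLM paper.
ModelFn = Callable[[str], str]         # prompt -> completion
ToolFn = Callable[[str], str]          # query -> tool output
SplitFn = Callable[[str], List[str]]   # task -> subtasks

MAX_DEPTH = 1  # the comment reports that depth beyond 1 did not add much


def solve(task: str, model: ModelFn, tool: ToolFn, split: SplitFn,
          depth: int = 0) -> str:
    if depth < MAX_DEPTH:
        # Delegating node (the root when MAX_DEPTH == 1): no tool access, so it
        # cannot return early from its own context; it only fans out and merges.
        parts = [solve(sub, model, tool, split, depth + 1) for sub in split(task)]
        return model(f"Merge results for {task!r}: " + " | ".join(parts))
    # Leaf: the real work happens here, with tools and a small, focused context.
    evidence = tool(task)
    return model(f"Answer {task!r} using: {evidence}")


# Stub model and tool so the control flow can be run without any API.
calls: List[str] = []


def fake_model(prompt: str) -> str:
    calls.append(prompt)
    return prompt.split(":", 1)[-1].strip()


def fake_tool(query: str) -> str:
    return f"notes on {query}"


def fake_split(task: str) -> List[str]:
    return [part.strip() for part in task.split(" and ")]


answer = solve("summarize A and summarize B", fake_model, fake_tool, fake_split)
print(answer)      # notes on summarize A | notes on summarize B
print(len(calls))  # 3: two leaf calls plus one merge at the root
```

Each leaf only ever sees its own subtask, which is where the lower per-call token consumption comes from in this sketch.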


I've got a concentric dual hose unit from Midea that I use in emergencies when my central unit is out. I just put the hose through the dog flap on the back door when I run it. I'm sure you could figure out a solution if you actually wanted to be comfortable.

I've seen some 2020+ construction in the Texas market that might not make it much beyond 2040. If you want to see stuff that lasts, constrain year of construction to the 20th century on Zillow.

I know of entire subdivisions where the new homes are being actively consumed by black mold due to insufficient HVAC capacity for latent heat removal. Every single central dehumidifier installed in those builds has lost its refrigerant charge due to some common manufacturer (Honeywell) defect. So, in some cases we didn't HVAC hard enough despite being America.

I don't understand how you can maintain a civilization in a place like Miami or Galveston without modern hvac.


Maybe we can start adding these things to the list.

https://news.ycombinator.com/item?id=48641261

