
I work at OpenAI (not on Codex) and have used it successfully for multiple projects so far. Here's my flow:

- Always run more than one rollout of the same prompt -- they will turn out different

- Look through the parallel implementations, see which is best (even if it's not good enough), then figure out what changes to your prompt would have helped nudge towards the better solution.

- In addition, add new modifications to the prompt to resolve the parts that the model didn't do correctly.

- Repeat loop until the code is good enough.

If you do this and also split your work into smaller parallelizable chunks, you can find yourself spending a few hours doing nothing but prompt tuning and code review, with massive projects implemented in a short period of time.

I've used this for "API munging" but also for pretty deep Triton kernel code, and it's been massive.


> "Look through the parallel implementations, see which is best (even if it's not good enough), then figure out what changes to your prompt would have helped nudge towards the better solution."

How can non-technical people tell what's "best"? You need to know what you're doing at this point, look for the right pitfalls, inspect everything in detail... this right here is the entire counter-argument to LLMs eliminating SWE jobs...


> How can non-technical people tell what's "best"? You need to know what you're doing at this point, look for the right pitfalls, inspect everything in detail... this right here is the entire counter-argument to LLMs eliminating SWE jobs...

I'm not sure a tool that positions itself as a "programmer co-worker" is aiming to be useful to non-technical people. I've said it before, but I don't think LLMs currently are at the stage where they enable you to do things you have 0 experience in, but rather can help you speed up working through things you are familiar with. I think people who claim LLMs will completely replace jobs are hyping the technology without really understanding it.

For example, I'm a programmer, but I'd never done any firmware flashing over UART via a USB flasher before. Today I managed to do that in 1-2 hours thanks to ChatGPT helping me understand how to do it. If I'd done it completely on my own, I'm sure it would have taken me at least a full day. I was able to see when it got misled and could rewrite/redirect from there, but someone with zero programming experience probably wouldn't have been able to.


It depends on their setup and where they or the LLM gets stuck. If an experienced programmer is there to back them up, then a total beginner could totally make something. That is, given some familiarity with the terminal (specifically, the know-how to set up a git repo on GitHub and clone it locally, set up env keys and Aider, and run npm i and npm run dev), a non-programmer with some terminal skills is able to make simple games, purely by talking to Aider using the /voice command. When the LLM or they get stuck is when they'll need some backup from somebody with a decent amount of programming experience to get unstuck. Depending on what they're doing, though, it's entirely possible they won't get stuck until much further along in the dev process.


I don’t think anyone expects software engineers will disappear and get replaced by janitors trained to proompt. I’m sure experts will stick around until the singularity curve starts looking funny. It’s probably gonna suck to enter the industry from now on, though.


Well, right, how does one become a senior engineer in a world where no one needs to hire a junior? I'm sure many other industries have experienced this already, where the only people who know anything retire, and those left are maintaining a system they could not rebuild, such that when something goes wrong the only practicable choice is to replace it with new equipment.

That's where I see AI-written software going, write-once. Some talented engineer gets an AI system to create a whole k8s cluster to run an application and if any changes need to be made, bugs fixed, it will take another talented engineer to come in and have an AI write a replacement and throw out the old one.

Reminds me of this blog post, "The real value isn't in the code" [0]. We're heading for a world that is only code, with no one who knows what it does. But maybe it won't matter.

[0] https://jonayre.uk/blog/2022/10/30/the-real-value-isnt-in-th...


> Well, right, how does one become a senior engineer in a world where no one needs to hire a junior?

You don't. Unless the person is super brilliant, I just don't think the industry needs many more new people; there are enough for the next 1-2 decades, and after that humans will probably not be needed at all.

People should go where the demand is - medicine, education, policing or whatever it may be.


> People should go where the demand is - medicine, education, policing or whatever it may be.

'Where' is becoming an increasingly small niche with ever higher educational requirements.


One could put a lot of time into open source or run their own side hustle to build up experience to a senior engineer level.

I don't see the corporate path being the best way given the circumstances.


> I don’t think anyone expects software engineers will disappear

holy gaslighting Christ, have some links; lots of people think that

https://www.reddit.com/r/ITCareerQuestions/comments/126v3pm/...

https://medium.com/technology-hits/the-death-of-coding-why-c...

https://medium.com/@TheRobertKiyosaki/are-programmers-obsole...

https://www.forbes.com/sites/hessiejones/2024/09/21/the-auto...

and on and on, endless thinkpieces about this. Certainly SOMEONE, someone with a lot of money, thinks software engineers are imminently replaceable.

> until the singularity curve starts looking funny.

well, there's absolutely no evidence whatsoever that we've made any progress toward bringing about Kurzweil's God, so I think regardless of what Sam Altman wants you to believe about "general AI" or those thinkpieces, experts are probably okay.


I think you are correct that people say this, but it's absurd that they are saying it in the first place.

Coding/engineering/etc. is all problem solving in a structured manner.

That skill is not going anywhere


oh I agree, but the last three years have felt like an endless chorus of people telling me SWE was going to be obsolete very soon, so I had to push back against the idea that "nobody" thinks that.

I wouldn't have to listen to people talk about it all the time if nobody thought it was true


(not GP) To be fair, just because someone says something doesn't mean they believe it. Most of those folks have to know they're being absurd. But I agree saying "nobody" thinks something is over the top. People on the internet can be quite looney tunes.


A lot of people believe that programming is the typing of odd sequences of characters into a computer.

To them, it seems LLMs are also perfectly capable of typing odd sequences of characters.

The idea that SWEs do actual structured problem solving is mostly confined to industry insiders.


Thank you for this. A very well stated explanation of a major reason the hype is so off base from the perspective of the people doing the work every day.


> proompt

The verb you use when you only need to produce boilerplate.

> Prompt™

The verb you use when it's time to innovate.


How much faster is this than simply writing the code yourself?


I end up asking the same question when experimenting with tools like Cursor. When it can one-shot a small feature, it works like magic. When it struggles, the context gets poisoned and I have to roll back commits and retry partway through something; at that point it was probably easier for me to just write it. Or maybe template it and have it finish. Or vice versa. The point being that best practices have yet to be truly established, but totally hands-off use has not worked well for me so far.


Why commit halfway through implementing something with Cursor? Can you not wait until it’s created a feature or task that has been validated and tests written for it?


Why wait until everything is finalized before committing? Git is distributed/local, so while one philosophy is to interact with it as little as possible, the other one is to commit early and commit often, and easily be able to rollback to a previous (working) state, with the caveat that you clean-up history before firing off a PR.
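
A sketch of that flow (the commit messages and branch names are made up, but the commands are standard git):

    git commit -am "wip: agent attempt compiles"   # checkpoint each working state
    # ...context gets poisoned, agent goes sideways...
    git reset --hard HEAD~1                        # roll back to the last good checkpoint
    git rebase -i origin/main                      # squash the wip noise before the PR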


Why not create a branch and rollback only what needs to be rolled back? Branches are O(1) with git, right?


OP was insinuating that rolling back commits is a pain point.


Well, same statement applies. Rolling back commits is also O(1) and just as easy. And if you branch to start with it's not even a "rollback" through the commit history, it's just a branch switch. Feel like OP has never used git before or something.
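
Concretely (hypothetical branch name):

    git checkout -b agent-attempt   # O(1): just writes a new ref
    # ...let the agent run, committing as it goes...
    git checkout main               # abandoning the attempt is just a branch switch
    git branch -D agent-attempt     # optional cleanup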


Which seems like a tooling issue, imo. In Aider, it's just /undo.


Easily 5-10x or even more in certain special cases (when it'd take me a lot of upfront effort to get context on some problem domain). And it can do all the "P2"s that I'd realistically never get to. There was a day where I landed 7 small-to-medium-size pull requests before lunch.

There are also cases where it fails to do what I wanted, and then I just stop trying after a few iterations. But I've learned what to expect it to do well in and I am mostly calibrated now.

The biggest difference is that I can have agents working on 3-4 parallel tasks at any given point.


This has been my experience too. Certain tickets that would’ve taken me hours (and in one case, days), I’ve been able to finish in minutes.

Other tasks take maybe the same amount of time.

But just autocomplete saves micro-effort all day long.


For me, it’s not that the actual coding is faster. It’s that you can do other things at the same time.

If I’m writing an integration, I can be researching the docs while the agent is coding something up. Worst case, I throw all of the agent's work away while now having done research. Best case, it gets a good enough implementation that I can run with.


Totally. I feel like it’s akin to jamming with someone. We both go down our own paths for a bit, then I have a next step for it, and I can review what it last came up with and iterate while it does more of its own thing. Rinse, repeat. This is more fun and less energy consuming than “do it all yourself”, which certainly means a lot.

This way works for me. Any time I tried to treat it as a colleague that I can just assign tasks to, it’s failed miserably.


> Worst case, I throw all of the agent's work away while now having done research

The worst case is you take the agent's work without really understanding it, continue doing it indefinitely and at some point get a buggy repo you have no idea how to handle - at the exact same moment some critical issue pops up and your agent has no clue how to help you anymore.


I don't think GP said they couldn't do their job, but you instantly jumped to incompetence. That seems a little uncharitable to me.


I have no idea what GP can or cannot do and wasn't talking about that. I'm describing the worst case that can happen when people work with agents, and it can happen to anyone who isn't carefully verifying and testing the agent's work.


At the current capabilities of most LLMs + my personal tolerance for slop, the most productive workflow seems to be: spin up multiple agents in the background to work on small scope, straightforward tasks while I work on something bigger that requires more exploration, requirements gathering, or just plain more complex/broad changes to the code. Review the output of the agents or unstick them when there is downtime.

IMO just keeping an IDE window open and babysitting an agent while it works is less productive than writing the code mostly yourself, with AI assistance in the form of autocomplete and maybe highly targeted oneshots using manually provided context in "Edit" mode or inline prompting.

My company is dragging their feet on AI governance and let the OpenAI key I was using expire, and what I noticed was that my output of small QoL PRs and bugfixes dropped drastically because my attention remains focused on higher impact work.


Do you find yourself ditching the whole thing when it changes something important with the new prompt? I don't get how people aren't absolutely exhausted by actually implementing this prompt-tweaking advice, when I thought there were studies saying small, seemingly insignificant changes greatly change the result, hide blind spots, and that even using a prompt to engineer a better prompt has knock-on increases in instability. Do people just have a higher tolerance than I do for doing work that is not related to the problem? Perhaps I only work on stuff there is no prior example for, but every few days I read someone's anecdote on here and get discouraged in all new ways.


Not to downplay the issue you raise but I haven't noticed this.

Every iteration I make on the prompts only makes the request more specified and narrow, and it's always gotten me closer to my desired goal for the PR. (But I do just ditch the worse attempts at each iteration cycle.)

Is it possible that reasoning models combined with the actual interaction with the real codebase makes this "prompt fragility" issue you speak of less common?


No, I've played with all the reasoning models and they just make the noise and weirdness even worse. When I dig into every little issue, it's always something incredibly bespoke. Like the actual documentation that's on the internet is out of date for the library that was installed and the API changed, the way the one library works in one language is not how it works in the other language, just all manner of surprising things. I really learned a lot about the limits of digital representation of information.


Can it be used to fix bugs? Because the ChatGPT web app is full of them, and I don't think they are getting fixed. Pasting large amounts of text freezes the tab, for one.


Bugs? Those are grubby human work.

Seriously, everyone should get good at fixing bugs. LLMs are terrible at it when it’s slightly non-obvious and since everyone is focusing on vibe coding, I doubt they’ll get any better.


The Android app is even worse.


If that is what the best unlimited AI can deliver, we are safe for at least 10 more years.


Sounds like you're manually doing something that could form the basis of further reinforcement learning.

Nudging the UI slightly for this exact flow could generate good training data.


You guys are doing great work, codex too, keep at it.


how much would this cost you if you didn't work at OpenAI?


I think the Pro plan is $200/mo for everyone? (But honestly I don't know the GPU cost and I'm interested in this question)


I thought you had privileged and complimentary access from working there


Greg has been writing deep systems code every day for many, many hours for the past few years.


For those who are interested in getting involved with online math circles (as parents or potential instructors), check out https://theglobalmathcircle.org (Jeremy, the author of this post, graduated from our training program)


This isn't accurate. The bottleneck in very-large-scale training BY FAR is communication between devices. If you have a million CPUs, the communication cost will be significantly higher than with a thousand A100s (perhaps on the order of 100x or even more). So this is only possible to replicate with very dense, high-compute chips and an extremely fast interconnect.
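
As a purely illustrative back-of-envelope (every number below is an assumption, not a measurement): a ring all-reduce of S bytes across N devices moves about 2S(N-1)/N bytes per device and takes about 2(N-1) latency-bound steps, so the latency term alone grows linearly with N:

    // Very rough ring all-reduce model; toy numbers, not benchmarks.
    function allReduceSeconds(n, bytes, bandwidthBps, latencySec) {
      // bandwidth term (reduce-scatter + all-gather) + per-step latency
      return (2 * bytes * (n - 1) / n) / bandwidthBps + 2 * (n - 1) * latencySec;
    }

    // ~1,000 accelerators on a fast interconnect (made-up figures):
    allReduceSeconds(1e3, 1e9, 100e9, 5e-6);  // ~0.03 s
    // ~1,000,000 CPUs on commodity Ethernet (made-up figures):
    allReduceSeconds(1e6, 1e9, 1e9, 50e-6);   // ~100 s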


Thanks for providing this insight. Is A100 the only platform? Can we pause/resume all such platforms simultaneously?


OP here. I help run the Global Math Circle. Happy to answer any questions.

Here's a short blurb about our approach and history:

---

The GMC approach

Our approach is to treat math education as an accessible mystery. We combine a small group of children and an experienced guide. They go deep for ~8 weeks seeking to understand the insight behind the mystery. It’s an awesome experience where once a week the children share ideas and debate with each other what the next question should be, discovering deep mathematics along the way. The guide contributes minimal input because the discussion is led by the students, helping them learn leadership skills along the way. We never turn anyone away for lack of funds. All of our prices for circles and leader training are sliding scale and we regularly accept kids who can’t pay at all.

---

Our History

The Global Math Circle (originally named “The Math Circle”), the earliest math circle in the US, was founded in Boston, MA in 1994 by Robert and Ellen Kaplan[1], authors of such books as Out of the Labyrinth[2] and The Art of the Infinite[3]. The intention, in Bob's words: "Small groups meet with their leader, each seeing and talking with all; and as always, the leader no more than posing a deep and exciting question (an accessible mystery) and the students probing it collegially together. They come up with examples, counter-examples, insights and proofs: intense fun."

In 2015, the GMC moved online and has been running online circles and leader training institutes since. A group of enthusiasts had been helping Bob and Ellen, and since Bob's passing in 2022 they have continued to operate the GMC on the same principles that guided the organization from its inception. In addition to our main presence across the United States, GMC actively collaborates with groups around the world to spread our approach, from Senegal[4] to Brazil[5] and from Kenya[6] to Colombia[7].

[1] https://www.theglobalmathcircle.org/bob-and-ellen-kaplan

[2] https://www.amazon.com/Out-Labyrinth-Setting-Mathematics-Fre...

[3] https://www.amazon.com/Art-Infinite-Pleasures-Mathematics/dp...

[4] https://uploads-ssl.webflow.com/62a8b0053c26bc44ba610734/630...

[5] https://web.archive.org/web/20210730105533/http://www.ocircu...

[6] https://uploads-ssl.webflow.com/62a8b0053c26bc44ba610734/630...

[7] https://www.circoap.org/acerca-de


The Netherlands has a digital government sign-in ("DigiD") which works very well, secured using text messaging or two-factor auth. With it you can view all of your official documents across all branches of government. It allows for delegating permissions (say, between a married couple). It's really great and shows that, with competence, these systems can work out well.


That explains LIME, an older paper that's not the one being discussed here (but is referenced)


It also explains what an explanation is in this context (which is what was asked): a local linear approximation of the model. Additionally it has a diagram which is nice. Obviously it's not the one being discussed here though -- I'd hardly be adding useful information if I just linked to the submission again as a reply.


Yeah, so local linear approximations are what we and LIME are using as explanations, but that's not what an explanation is generally.

In the paper we do define an explanation as basically any artifact that "provides reliable information about the model’s implicit decision rules for a given prediction." It's kind of a rough and over-general definition, but it gets to the idea that explanations can be partial. All we want to do is turn a completely black-box model into something slightly more transparent.

Ideally, we could have explanations that were at a higher level of abstraction, e.g. "this image is a picture of a husky and not a wolf because of the shape of the nose and the color of the coat," but a neural network has no idea what "nose" and "coat" mean. Sometimes its intermediate layers will end up corresponding to meaningful abstract concepts like that, but not always.
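
For concreteness, here's roughly LIME's formulation (a sketch, paraphrasing the LIME paper's notation): given a black-box model f and an instance x, pick the interpretable surrogate g that fits f best in a neighborhood of x:

    explanation(x) = argmin over g in G of  L(f, g, pi_x) + Omega(g)

where pi_x weights perturbed samples by proximity to x and Omega(g) penalizes complexity. For a linear surrogate g(z) = w . z, the weights w are what gets presented as the explanation.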


I believe this is solved by Mongo's "snapshot" method on cursors: https://docs.mongodb.com/v3.0/faq/developers/#faq-developers...


If I understand correctly, this method says "only scan the built-in _id index, not any other index." Which means you will not hit this index-specific bad behavior, but also that you won't get the performance characteristics of using an index.
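
For reference, in the 3.x mongo shell it's a cursor method (deprecated and later removed in 4.0); the collection and query here are just examples:

    // Walks the _id index so documents moved on disk by updates
    // aren't returned twice -- at the cost of not using other indexes:
    db.orders.find({ status: "shipped" }).snapshot()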


(Former Meteor core dev here) This is cool!

Does Horizon also solve "optimistic updates"? If so, I'd love to learn more details. For comparison, Meteor keeps a local datastore that updates immediately when data is mutated and is then reconciled with the real database.


The current version of Horizon doesn't do optimistic updates, but it's on the roadmap. Check out https://github.com/rethinkdb/horizon/issues/23. This is relatively easy because RethinkDB itself has support for notifying the client via a feed when a particular update has landed (i.e. the RethinkDB client can correlate the write with a feed message). It just didn't make it into v1, but will happen soon.
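
To sketch the pattern (this is not Horizon's actual optimistic-update API, which doesn't exist yet; pending and render are hypothetical, while store() and watch() are the real client calls):

    // Keep a local overlay of writes that haven't been confirmed yet:
    const pending = new Map();

    function optimisticStore(doc) {
      pending.set(doc.id, doc);          // show it in the UI immediately
      render();
      horizon('messages').store(doc);    // send the real write to the server
    }

    // watch() re-emits the query results as server state changes;
    // once a pending doc shows up there, drop the overlay entry:
    horizon('messages').watch().subscribe(docs => {
      for (const doc of docs) pending.delete(doc.id);
      render();
    });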


See also my issue about implementing the equivalent of Meteor's write fence: https://github.com/rethinkdb/horizon/issues/344


Hi, I wrote this blog post.

We were trying to keep our API surface area small with one way to load data into components, but you're right -- we should probably /also/ add an ES6 base class as a second option, and let people choose which they prefer.

A lot of React developers still prefer mixins -- react-router recently switched from mixins to ES6 classes and then changed their mind "until ES6 classes have better answers to replace what mixins do (like decorators).": https://github.com/rackt/react-router/blob/master/UPGRADE_GU...
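
To make the tradeoff concrete, a rough sketch of the two styles (ReactMeteorData/getMeteorData are the mixin API we ship; MeteorComponent and the Messages collection are hypothetical names):

    // Mixin style (what the package ships today):
    const MessageList = React.createClass({
      mixins: [ReactMeteorData],
      getMeteorData() {
        return { messages: Messages.find({}).fetch() };
      },
      render() {
        return <ul>{this.data.messages.map(m =>
          <li key={m._id}>{m.text}</li>)}</ul>;
      }
    });

    // Hypothetical ES6 base-class style -- same getMeteorData/render bodies:
    class MessageList extends MeteorComponent {
      getMeteorData() {
        return { messages: Messages.find({}).fetch() };
      }
      // render() as above
    }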


> we should probably /also/ add a ES6 base class as a second option, and let the people choose which they prefer.

Please provide a higher-order component or ES7 decorator (assuming ES7 will be supported?), would rather not use inheritance.


^ This. I'd much rather use a decorator with ES7 than extend classes. Although I thought I heard that React is working on some better alternative to mixins.


That is very disappointing. Decorators already exist in Babel and they work perfectly. I don't know what they're waiting for


I suspect they're waiting for the proposal to be more stable for the official integration. This seems like a great opportunity for the community to write a decorator and let a convention form before it gets merged into the core framework


There's a more detailed discussion about mixins vs. other data loading methods on this GitHub issue: https://github.com/meteor/react-packages/issues/25#issuecomm...

