The idea of stateful models/interactions in an enterprise is extremely powerful. Is anyone aware of open source projects that have a similar goal? I'm looking for stateful conversations, with collaborative agent/skill refinement.
To head off the semantics debate: I don't mean a model rewriting its own source code. I'm asking about 'process recursion'—systems that analyze completed work to autonomously generate new agents or heuristics for future tasks.
-ish. I often keep md files around: after a successful task, I ask Codex to write the important bits down. Then, when I come across a similar task in the future, I have it start from the md file. It's like context that grows and is very localized. It helps when I'm working across multiple repos at multiple levels.
I’m also doing something similar with fairly decent results. AGENTS.md grows after each session that produced knowledge worth passing on to future sessions. At some point I assume it will get too big, and then it’s back to the Stone Age for the new agents, in order to free up some context for the actual work.
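A minimal sketch of that "growing context file" workflow, assuming a plain AGENTS.md in the repo root; the filename and the summary-appending step are my own illustration, not a fixed convention:

```shell
#!/bin/sh
# Append a dated session summary to AGENTS.md so future agent sessions
# can start from accumulated project knowledge. The summary text would
# typically be written by the coding agent at the end of a task.
NOTES_FILE="AGENTS.md"

append_session_notes() {
  # $1: short summary of what this session learned
  {
    echo ""
    echo "## Session notes - $(date +%Y-%m-%d)"
    echo "$1"
  } >> "$NOTES_FILE"
}

append_session_notes "Build requires Node 20; integration tests need the local Postgres container."
```

Trimming or summarizing the oldest sections when the file grows too large is the obvious next step, but that compaction is exactly where context for new agents gets lost.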
"We demonstrate that our IsoDDE more than doubles the accuracy of AlphaFold 3 on a challenging protein-ligand structure prediction generalisation benchmark, predicts small molecule binding-affinities with accuracies that exceed gold-standard physics-based methods at a fraction of the time and cost, and is able to accurately identify novel binding pockets on target proteins using only the amino acid sequence as input."
It seems like a key challenge here is not just creating a protein that will bind to a specific site, but also ensuring that off-target binding won't happen. Is this feasible? I'm not familiar with this space, but RefSeq [1] shows 442M proteins and the Human Protein Atlas seems to only cover 17.4k [2]. Do we have comprehensive enough knowledge of human proteins to identify off-target affinities?
For the impatient, here's a transcript summary (from Gemini):
The speaker describes creating a "virtual employee" (dubbed a "replicant") running on a local server with unrestricted, authenticated access to a real productivity stack—including Gmail, Notion, Slack, and WhatsApp. Tasked with podcast production, the agent autonomously researched guests, "vibe coded" its own custom CRM to manage data, sent email invitations, and maintained a work log on a shared calendar. The experiment highlights the agent's ability to build its own internal tools to solve problems and interact with humans via email and LinkedIn without being detected as AI.
He ultimately concludes that for some roles, OpenClaw can do 90%+ of the work autonomously. Jason controversially mentions buying Macs to run Kimi 2.5 locally so they can save on costs. Others argue that hosting an open model on inference-optimized hardware in the cloud is a better option, but doing so requires sharing potentially sensitive data.
> The investor Jason Calacanis stayed in touch with Mr. Epstein after his 2008 conviction and three years later helped the financier contact a pair of Bitcoin developers, according to emails included in the documents.
Did Jason ever mention this in the episode? Can you ask Gemini?
I mean... If Jason Calacanis told me the sky was blue, I would be _checking_.
(At some point he seems to have gone from professionally-wrong-about-everything blogger to magical-podcast-thought-leader. I have no idea how this happened.)
The landing page for the demo game "Voxel Velocity" mentions "<Enter> start" at the bottom, but <Enter> actually changes selection. One would think that after 7MM tokens and use of a QA agent, they would catch something like this.
It's interesting, isn't it? On the one hand, the game is quite impressive. Although it doesn't have anything particularly novel (and it shouldn't, given the prompt), it still would have taken me several days, probably a week, working nonstop. On the other hand, there are plenty of paper cuts.
I think these subtle issues are just harder to provide a "harness" for, like a compiler or rigorous test suite that lets the LLM converge toward a good (if sometimes inelegant) solution. Probably a finer-tuned QA agent would have changed the final result.
It's also interesting how the functionality of the game barely changes between 60k tokens, 800k tokens, and 7MM tokens. It seems like the additional tokens made the game look more finished, but it plays almost exactly the same in all of them.
I'd bet the initial token usage is all net new while the later token usage probably has reading+regenerating significant portions of the project for individual minor changes/fixes.
E.g. I wouldn't be surprised if identifying the lack of touch screen support on the menu, feeding it in, and then regenerating the menu code sometime between 800k and 7MM took a lot of tokens.
Is this from 2024? It mentions "With global data center demand at 60 GW in 2024"
Also, there is no mention of the latest-gen NVDA chips. Per the benchmark cited: 5 RNGD servers generate tokens at 3.5x the rate of a single H100 SXM at 15 kW; this drops to 1.5x if you instead benchmark against 3 H100 PCIe servers.
Have you experimented with all of these things on the latest models (e.g. Opus 4.5) since Nov 2025? They are significantly better at coding than earlier models.