Hacker News | ahmadyan's comments

Well, you can view it as the Iranians being willing to insure the vessel, for a $2M fee, against being hit by them during the crossing ;). Once ships are in the Gulf of Oman, they can use traditional insurance.

You can view it like that, but most people don't. At least the people manning those tankers don't.

And why should they? It appears that the Iranian armed forces act quite autonomously, by design. They know that communications are not secure, so local commanders have very high latitude in deciding which actions to take. If one commander deems that collecting $2M per vessel is a good idea, they'll do it. But if another commander thinks that sinking a passing vessel is what their standing orders require, they'll do that too, not being aware that the toll was paid. So, if you are the captain of such a vessel, what do you do? Complain to Iran for not holding up their end of the bargain?


They are all in a WhatsApp or Telegram group together.

I mean ships are going through right now, so clearly at least some people do view it like that.

People work on dangerous fishing trawlers because of the $. People can be found who will risk their lives for money.

Right, clearly you can always find people to ship oil through the strait. So the whole notion that nobody will use it because it's dangerous is nonsense.

this is not how the maritime industry works in any way.

Because they are giving it to them at a 90% discount in the subscription. They are more than happy if you use the tokens at API pricing, but when subsidized they want you to use their Claude Code surface.

opencode is a very meh agent.

Source: I run pretty much all of these agents (Codex, CC, Droid, opencode, Amp, etc.) side by side in agentastic.dev, and opencode had basically a 0% win rate against the other agents.


I've been using opencode and would be curious to try something else. What would you recommend for self-hosted LLMs?

Very new to self-hosted LLMs, but I was able to run Codex with my local Ollama server (codex --oss).

source?


There is no source. But the party in China does have ultimate control.

There would never be an Anthropic/Pentagon situation in China, because in China there isn't actually separation between the military and any given AI company. The party is fully in control.


Congrats on the launch


Good point. Those are standards: by definition, society forced vendors to behave and play nicely together. LLMs are not standardized yet, and it is pure luck that plain English works fine across different LLMs for now. Some labs are trying to push their own formats and stop that, especially around reasoning traces, e.g. Codex removing reasoning traces between calls while Gemini requires reasoning history. So don't take this for granted.


I dunno. Text is a pretty good de facto standard. And they work in lots of languages, not just English.


Claims in the article are incorrect. They conveniently ignore Meta's CWM models, which are open-source [1] and open-weight [2], are the same size (32B dense), and reach 65% on SWE-bench Verified (with TTS) and 54% pass@1. So claims like "surpassing prior open-source state-of-the-art coding models of comparable sizes and context lengths", while conveniently leaving the previous OSS SOTA out of your eval tables, are sketchy.

[1] https://github.com/facebookresearch/cwm
[2] https://huggingface.co/facebook/cwm


Hey! These are great observations. So first, while TTS can improve performance, we wanted to evaluate the raw capability of our model. This meant generating only one rollout per evaluation instance, which follows other papers in the space like SWE-smith and BugPilot. In addition, TTS adds extra inference cost and is reliant on how rollouts are ranked, two confounding factors for deployable models where memory and inference speed are extremely important.

Following that line of reasoning, context length is another very large confounding factor. Longer context lengths improve performance, but also result in enormous increases in KV cache size and memory requirements. We decided to control for this in our paper and focus on the 32K context length for 32B-size models, a context length that already pushes the bounds of what can be "deployable" locally.

Still, we evaluate at 64K context length using YaRN and are able to outperform CWM's 54% (non-TTS), which it achieves using 128K context, a substantial increase over what we use. This is also pretty significant because we only ever train at 32K context, while CWM trains for a full 128K.
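To make the KV-cache point concrete, here is a back-of-the-envelope calculation. It assumes a Qwen3-32B-like attention geometry (64 layers, 8 GQA KV heads, head dim 128) and an fp16 cache; these parameters are my assumptions for illustration, not figures from the paper:

```python
def kv_cache_gib(context_len: int,
                 n_layers: int = 64,
                 n_kv_heads: int = 8,
                 head_dim: int = 128,
                 bytes_per_elem: int = 2) -> float:
    """KV cache size for a single sequence, in GiB."""
    # Factor of 2: one K vector and one V vector per layer per token.
    per_token = 2 * n_layers * n_kv_heads * head_dim * bytes_per_elem
    return context_len * per_token / 2**30

print(kv_cache_gib(32 * 1024))   # → 8.0 (GiB at 32K context)
print(kv_cache_gib(128 * 1024))  # → 32.0 (GiB at 128K context)
```

Under those assumptions, going from 32K to 128K context quadruples the per-sequence cache from 8 GiB to 32 GiB, which is why context length is such a large confound for locally deployable models.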


The difference is that the Allen Institute models have open training data, not just open code and weights. Meta doesn't share the training data you would need to reproduce their final models. For many uses open-weight models are nearly as good, but for advancing research it's much better to have everything in the open.


Reading their paper, it wasn't trained from scratch; it's a fine-tune of the Qwen3-32B model. I think this approach is correct, but it does mean that only a subset of the training data is really open.


The linked open weights disallow commercial use and are licensed for research purposes only.


> 2. What on earth is this defense of their product?

I think the distribution channel is the only defensive moat for low-to-mid-complexity, fast-to-implement features like code-review agents. So in the case of Linear and Cursor Bugbot it makes a lot of sense. I wonder when GitHub/GitLab/Atlassian or Xcode will release their own review agents.


The problem with code review is that it is quite straightforward to just prompt for it, and the frontier models, whether Opus or GPT-5.2-Codex, do a great job at code reviews. I don't need a second subscription or API call when the one I already have, with some focus on integration, works well out of the box.

In our case, agentastic.dev, we just baked code review right into the IDE. It packages the diff for the agent, with some prompt, and sends it out to the user's choice of agents (Claude, Codex, etc.) in parallel. The reason our users like it so much is that they don't need to pay extra for code review anymore. Hard to beat a free add-on, and the cherry on top is that you don't need to read freaking poems.
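The fan-out described above can be sketched roughly like this. The prompt and the stub backends are placeholders of my own, not agentastic's actual implementation; a real version would shell out to each agent's CLI or call its API:

```python
from concurrent.futures import ThreadPoolExecutor

REVIEW_PROMPT = "Review this diff for bugs and style issues:\n\n"

# Stub backends standing in for real agent CLIs (claude, codex, ...).
# These are hypothetical; a real version would invoke the vendors' tools.
AGENT_BACKENDS = {
    "claude": lambda prompt: "claude: looks fine",
    "codex": lambda prompt: "codex: check error handling",
}

def fan_out_review(diff: str, agents: list[str]) -> dict[str, str]:
    """Package the diff once, then send it to each agent in parallel."""
    prompt = REVIEW_PROMPT + diff
    with ThreadPoolExecutor(max_workers=len(agents)) as pool:
        results = pool.map(lambda a: AGENT_BACKENDS[a](prompt), agents)
        return dict(zip(agents, results))
```

The key design point is that the diff is packaged once and each agent runs concurrently, so adding another reviewer costs wall-clock time of the slowest agent, not the sum.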


We use Codex review, and it's working really well for us, but I don't agree that it's straightforward. Moving the number of bugs caught and the signal-to-noise ratio a few percentage points is a compounding advantage.

it's a valuable problem to solve, amplified by the fact that ai coding produces much more code.

that being said, i think it's damn hard to compete with openai or anthropic directly on a core product offering in the long run. they know that it's an important problem and will invest accordingly.


This is an awesome library, thank you so much for making it. I just ported it to my IDE, agentastic.dev, and it works like a charm.

https://assets.agentastic.ai/agentastic-dev-assets/videos/0....

