Open-source model costs are determined only by electricity usage, since anyone can rent a GPU and host them.
Closed-source models cost 10x more just because they can.
A simple example is Claude Opus, which costs ~1/10 as much, if not less, through Claude Code, which doesn't have that price multiplier.
This generally isn't true. Cloud vendors have to make back the cost of electricity and the cost of the GPUs. If you already bought the Mac for other purposes, also using it for LLM generation means your marginal cost is just the electricity.
Also, vendors need to make a profit! So tack a little extra on as well.
However, you're right that it will be much slower. Even just an 8xH100 can do 100+ tps for GLM-4.7 at FP8; no Mac can get anywhere close to that decode speed. And for long prompts (which are compute constrained) the difference will be even more stark.
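To put rough numbers on that, here's a back-of-the-envelope sketch. Every figure in it (rental price, aggregate throughput, Mac power draw, electricity rate, local decode speed) is an illustrative assumption, not a measurement:

    // Rough $/Mtok calculator: hourly cost divided by tokens produced per hour.
    // Every number below is an illustrative assumption, not a measurement.
    function usdPerMillionTokens(usdPerHour: number, tokensPerSecond: number): number {
      return (usdPerHour / (tokensPerSecond * 3600)) * 1_000_000;
    }

    // Rented 8xH100 node: assume ~$16/hr and ~2,000 tok/s aggregate decode
    // across batched requests (the 100+ tps figure above is per request).
    console.log("cloud, rental amortized:", usdPerMillionTokens(16, 2000).toFixed(2));

    // Mac you already own: marginal cost is electricity only. Assume ~150 W
    // under load at $0.30/kWh (= $0.045/hr) and ~10 tok/s local decode.
    console.log("mac, electricity only:  ", usdPerMillionTokens(0.045, 10).toFixed(2));

The point isn't the exact figures, just that cloud pricing has to amortize the hardware (plus margin) while a machine you already own only adds its electricity bill.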
A question on the 100+ tps: is this for short prompts? For large contexts (120k+ tokens) that generate a sizable chunk of output, I was seeing 30-50 tps, and that's with a 95% KV cache hit rate. I'm wondering if I'm simply doing something wrong here...
Assuming you're using speculative decoding, it depends on how well the speculator predicts your outputs: unusual text decodes more slowly, but e.g. TypeScript code diffs should be very fast. For SGLang, you also want a larger chunked prefill size and larger max batch sizes for CUDA graphs than the defaults, IME.
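Concretely, the knobs in question are SGLang's --chunked-prefill-size and --cuda-graph-max-bs server arguments (flag names as I recall them from sglang.launch_server; check --help for your version). A minimal launch sketch driven from Bun.spawn; the model id and values are assumptions to tune for your own hardware, and it assumes sglang is installed in the local Python environment:

    // Launch an SGLang server with larger prefill chunks and bigger CUDA-graph
    // batch sizes than the defaults. Values here are illustrative, not tuned.
    const server = Bun.spawn(
      [
        "python", "-m", "sglang.launch_server",
        "--model-path", "zai-org/GLM-4.6",   // assumed checkpoint; use your own
        "--tp", "8",                          // tensor parallel across 8 GPUs
        "--chunked-prefill-size", "8192",     // larger prefill chunks for long prompts
        "--cuda-graph-max-bs", "256",         // capture CUDA graphs for larger batches
      ],
      { stdout: "inherit", stderr: "inherit" },
    );

    await server.exited;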
Where other models would grep, then read the results, then run a search, then read the results, then read 100 lines from a file, then read the results, Composer 1 is trained to grep AND search AND read in one round trip.
It may read 15 files, and then make small edits in all 15 files at once.
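Not Cursor's actual implementation, obviously, but as a sketch of the difference: batching tool calls means all the results come back in one model round trip instead of one per call. The tool names and the runTool stub below are made up for illustration:

    // Illustration only: one round trip per tool call vs. one batched round trip.
    type ToolCall = { tool: "grep" | "search" | "read"; args: Record<string, string> };

    // Stand-in for real grep/search/file-read implementations.
    async function runTool(call: ToolCall): Promise<string> {
      return `${call.tool}(${JSON.stringify(call.args)}) -> ...`;
    }

    // Sequential agent: N tool calls means N model round trips, each waiting
    // on the previous result before the model decides what to do next.
    async function sequential(calls: ToolCall[]): Promise<string[]> {
      const results: string[] = [];
      for (const call of calls) {
        results.push(await runTool(call));
      }
      return results;
    }

    // Batched agent: the model emits all the calls up front and the harness
    // runs them concurrently, so every result returns in a single round trip.
    async function batched(calls: ToolCall[]): Promise<string[]> {
      return Promise.all(calls.map(runTool));
    }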
Just ask an LLM to write one on top of OpenRouter, the AI SDK, and Bun,
to take your .md input file and save the outputs as .md files (or whatever you need).
Take https://github.com/T3-Content/auto-draftify as an example.
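For reference, a minimal sketch of such a script; the model id, prompt, and file names are placeholders, and it assumes the ai and @openrouter/ai-sdk-provider packages are installed and OPENROUTER_API_KEY is set:

    // draftify.ts -- run with: bun run draftify.ts input.md output.md
    // Read a markdown file, run it through an OpenRouter model via the AI SDK,
    // and write the result back out as markdown.
    import { generateText } from "ai";
    import { createOpenRouter } from "@openrouter/ai-sdk-provider";

    const openrouter = createOpenRouter({ apiKey: process.env.OPENROUTER_API_KEY });
    const [inputPath = "input.md", outputPath = "output.md"] = process.argv.slice(2);

    const source = await Bun.file(inputPath).text();

    const { text } = await generateText({
      model: openrouter("anthropic/claude-3.5-sonnet"), // any OpenRouter model id
      prompt: `Turn the following notes into a polished draft, in markdown:\n\n${source}`,
    });

    await Bun.write(outputPath, text);
    console.log(`wrote ${outputPath}`);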