Here's a more concrete example where GPT-OSS 20B performed very well, IMHO. I tested it against Gemma 3 12B, Phi 4 Reasoning 14B, and Qwen 2.5-coder 14B.
The prompt is modeled as a part of an agent of sorts, and the "human" question is intentionally ill-posed to emulate people saying the wrong thing.
The prompt begins by asking the model to convert a question into MATLAB code and add any assumptions as comments at the start of the code, or, if that's not possible, to output four hash marks followed by a reason why.
The (ill-posed) question is "What's the cutoff frequency for an LC circuit with R equals 500 ohm and C equals 10 nanofarad?"
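Paraphrasing from the description above (not the exact wording), the instruction part of the prompt is roughly:

    Convert the question below into MATLAB code. Add any assumptions you
    make as comments at the start of the code. If this is not possible,
    output four hash marks (####) followed by a reason why.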
Gemma 3 took the bait: it treated R as L and proceeded to calculate the cutoff frequency of an LC circuit[1], completely ignoring the resulting mismatch of units. It did not add any comment at all. Completely wrong answer.
Qwen 2.5-coder detected the ill-posed nature, but decided to substitute a dummy value for L and calculate the LC circuit answer anyway. On the upside, it did add comments saying so, which is acceptable in that regard.
Phi 4 Reasoning reasoned for about 3 minutes before deciding to assume the question was about an RC circuit. It added this as a comment and correctly generated the code for an RC circuit. So a good answer, but slow.
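For reference, a correct RC-assumption answer would look roughly like this (my own sketch, not Phi 4's verbatim output):

    % Assumption: treating this as an RC low-pass filter, since R and C
    % were given but no inductance L.
    R = 500;            % resistance in ohms
    C = 10e-9;          % capacitance in farads
    fc = 1/(2*pi*R*C);  % cutoff frequency in Hz, about 31.8 kHz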
GPT-OSS reasoned for 14 seconds and determined the question was ill-posed, thus outputting the hash marks followed by "The cutoff frequency of an LC circuit cannot be determined with only R and C provided; the inductance L is required." A good answer, and fast.
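So its full output was essentially just:

    ####
    The cutoff frequency of an LC circuit cannot be determined with only R
    and C provided; the inductance L is required.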
Mostly because I had it downloaded already, and I'm mainly interested in models that fit on my 16 GB GPU. But since you asked, I ran the same questions through both 30B models in the q4_k_m variant, as GPT-OSS 20B is also quantized to about q4.
First the ill-posed question:
Qwen 3 Coder gave a very similar answer to Phi 4, though it included a more long-winded explanation in the comments. So not bad, but not great either.
Qwen 3 Thinking thought for a good minute before deciding the question was ill-posed and returning the hash marks. However, the explanation that followed was not as good as GPT-OSS's, IMHO: The question is unclear because an LC circuit (without resistance) does not have a "cutoff frequency"; cutoff frequency applies to filter circuits like RC or RLC. Additionally, the inductance (L) value is missing for calculating resonant frequency in an RLC circuit. The given R and C values are insufficient without L.
Sure, an unloaded LC filter doesn't have a cutoff frequency, but in all normal cases a load is implied[1], and so the LC filter does have a cutoff frequency. So more thinking to get to a worse answer.
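To spell out what I mean: the cutoff of a (loaded) LC low-pass is just the usual resonance formula, which still requires L. Sketch with a made-up inductance, since the question doesn't give one:

    % The question only gives C; L below is an example value, not from the question.
    L = 1e-3;                  % 1 mH (assumed for illustration)
    C = 10e-9;                 % 10 nF (from the question)
    f0 = 1/(2*pi*sqrt(L*C));   % ~50.3 kHz for these example values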
The SQL question:
Qwen 3 Coder did identify the same pitfall as GPT-OSS, but didn't flag it as clearly, mostly because it also flagged some unnecessary stuff, so the key point got drowned out. It did make the same assumption about evenly dividing, and overall the answer was about as good. However, on my computer it ran at roughly half the tokens per second of GPT-OSS, at just ~9 tokens/second.
Qwen 3 Thinking thought for 3 minutes, yet managed to miss the key aspect, thus giving everyone the pizza. And it did so at the same slow pace as Qwen 3 Coder.
The SQL question requires a somewhat large context due to the large table definitions, and since these are larger models, more layers had to be pushed to the CPU, which I assume is the major factor in the speed drop.
So overall Qwen 3 Coder was a solid contender, but much slower on my PC. If it could run entirely on the GPU I'd certainly try it a lot more. Interestingly, Qwen 3 Thinking was just plain worse. Perhaps it's not tuned for tasks other than coding?
Thank you for testing; I will test GPT-OSS for my use case as well. In case you're interested: I have 8 GB VRAM and 32 GB RAM and get around 21 tokens/s with tensor offloading, so I would assume your setup should be even faster than mine with those optimizations. I use the IQ4_KSS quant (by ubergarm on hf) with ik_llama.cpp with this command:
To determine the cutoff frequency (fc) for an RC circuit (since you've provided resistance R and capacitance C, but not inductance L), we can use the following formula:
[.... calculation]
So, the cutoff frequency is approximately 31.83 kHz.
Note:
If you intended to ask about an RLC circuit (with both R, L, and C), please provide the inductance L value, and I can calculate the cutoff frequency for that case as well. The formula would then involve both L and C.
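For what it's worth, that number checks out for the RC interpretation:

    fc = 1/(2*pi*500*10e-9)   % = 3.1831e+04 Hz, i.e. ~31.83 kHz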
[1]: https://en.wikipedia.org/wiki/LC_circuit#Resonance_effect