Hacker News
Ey7NFZ3P0nzAe
38 days ago
on: Qwen3.6-Max-Preview: Smarter, Sharper, Still Evolv...
In my case, I was also running an ASR model and a TTS model, so it was a bit much for my RTX 3090. I opted to offload about 5 layers to the CPU and added GPU-only speculative decoding, with their 0.8B model as the draft model.
Working well so far.
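The comment doesn't say which runtime was used. To illustrate what the small model contributes, here is a minimal Python sketch of greedy speculative decoding. A cheap draft model (the role the 0.8B model plays) proposes a few tokens, and the large target model keeps the longest prefix it agrees with. `toy_target`, `toy_draft`, `NextToken`, and `k` are hypothetical stand-ins, not any real model or library API. Real engines verify all proposed tokens in one batched forward pass and use a sampling-aware acceptance rule; this sketch covers only the greedy (argmax) case.

```python
from typing import Callable, List

# Hypothetical stand-in for a model: maps a token prefix to the token it
# would pick greedily (argmax). Real models return logits over a vocabulary.
NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], n: int) -> List[int]:
    """Plain autoregressive greedy decoding: one model call per new token."""
    tokens = list(prompt)
    for _ in range(n):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], n: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: the draft proposes up to k tokens and the
    target keeps the longest prefix it agrees with. The result is identical
    to greedy_decode(target, prompt, n)."""
    tokens = list(prompt)
    goal = len(prompt) + n
    while len(tokens) < goal:
        # 1. The cheap draft model proposes tokens one at a time.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(tokens))):
            proposal.append(draft(tokens + proposal))
        # 2. The target checks each position. A real engine scores all
        #    positions in one batched forward pass; this loop is for clarity.
        for i, tok in enumerate(proposal):
            expected = target(tokens + proposal[:i])
            if expected != tok:
                # 3. First disagreement: keep the agreed prefix plus the
                #    target's own token and discard the rest of the proposal.
                tokens += proposal[:i] + [expected]
                break
        else:
            # 4. Every proposed token was accepted. A real engine gets the
            #    next target token from the same batched pass; here it is
            #    a separate call.
            tokens += proposal
            if len(tokens) < goal:
                tokens.append(target(tokens))
    return tokens


def toy_target(prefix: List[int]) -> int:
    # Hypothetical deterministic "large model".
    return (sum(prefix) * 31 + len(prefix)) % 50


def toy_draft(prefix: List[int]) -> int:
    # Hypothetical "small model": agrees with the target except at every
    # 5th position, so some proposals get rejected.
    guess = toy_target(prefix)
    return (guess + 1) % 50 if len(prefix) % 5 == 0 else guess


if __name__ == "__main__":
    prompt = [1, 2, 3]
    fast = speculative_decode(toy_target, toy_draft, prompt, 20)
    print(fast == greedy_decode(toy_target, prompt, 20))  # True
```

With greedy decoding the output is exactly what the big model would produce on its own. The draft only changes how many expensive target passes are needed, which is why a small draft kept entirely on the GPU can help even when some of the big model's layers run on the CPU.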