I've been using otvdm (aka winevdm) and it is very good. It works like the original ntvdm did when running on non-x86 platforms: it emulates the processor and translates 16-bit API calls into 32-bit API calls, so things work as you'd expect.
His essays are, themselves, "cheating" (in the sense of a life hack). Say what people want to hear today, even if it contradicts what you said yesterday.
I suspect it's not possible (as an end user) to get a thinking trace from one of the models. But what happens with "thinking" is that the model has a conversation with itself in an attempt to home in on a better answer to the original prompt.
The "amount of thinking" is how long this internal conversation is allowed to progress. The longer it goes on the more it costs. It's all part of the token budget but, because this internal dialogue is hidden, it's not obvious to the end user.
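To put rough numbers on that, here is a minimal sketch. It assumes the hidden thinking tokens are billed at the same rate as visible output tokens, and the prices are made-up placeholders, not any provider's real rates. The function name and parameters are hypothetical too.

```python
def response_cost(prompt_tokens, thinking_tokens, answer_tokens,
                  input_price_per_mtok, output_price_per_mtok):
    """Return the cost in dollars of one response.

    Assumption: hidden thinking tokens count as output tokens for billing.
    Prices are given per million tokens.
    """
    billed_output = thinking_tokens + answer_tokens
    total = (prompt_tokens * input_price_per_mtok
             + billed_output * output_price_per_mtok)
    return total / 1_000_000


# Hypothetical prices: $1 per million input tokens, $10 per million output tokens.
no_thinking = response_cost(1_000, 0, 500, 1.0, 10.0)
with_thinking = response_cost(1_000, 8_000, 500, 1.0, 10.0)
print(f"without thinking: ${no_thinking:.3f}")
print(f"with thinking:    ${with_thinking:.3f}")
```

The visible answer is the same 500 tokens in both cases; under these assumptions the whole difference in cost comes from the part the user never sees.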
> I suspect it's not possible (as an end user) to get a thinking trace from one of the models. But what happens with "thinking" is that the model has a conversation with itself in an attempt to home in on a better answer to the original prompt.
The model that summarizes what is inside the CoT/|thinking| tags is just an LLM, and it's just as jailbreakable/susceptible to prompt injection as any other LLM: https://x.com/lefthanddraft/status/1991076879877460322 (for those without X: that's Wyatt Walls demonstrating how to get the Gemini summarizer to print the raw CoT, as well as do random calculations, dump its system prompt, etc.)
It feels like Greek mythology should have some metaphor for "apparently simple structure that is so complex it leads anybody that studies it into madness". But I can't think of any name to put there.