
It's so long, so much wasted compute during inference. I wonder why they couldn't fine-tune the model on these instructions instead.


For making changes to a production system, fine-tuning is expensive and slow compared to prompt engineering.

You can develop, validate, and push a new prompt in hours.


You need to include the prompt in every query, which makes it very expensive.


The prompt is KV-cached; it's precomputed.


Good point, but it still increases the compute for all subsequent tokens.
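
Rough sketch of why that holds even with the prefix KV cached: every newly generated token still attends over all cached positions, so per-token attention work grows with prompt length. The layer/head/dim numbers below are made-up assumptions, not any particular model:

    # Each new token must read the cached key/value vectors for every prompt
    # token, in every layer. Sizes here are purely illustrative.
    def kv_reads_per_new_token(prompt_tokens, layers=32, kv_heads=8, head_dim=128):
        return prompt_tokens * layers * kv_heads * head_dim * 2  # keys + values

    short = kv_reads_per_new_token(200)    # short system prompt
    long = kv_reads_per_new_token(5000)    # very long system prompt
    print(long / short)                    # 25.0 -> 25x more KV reads per generated token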


They're most likely using prefix caching, so it doesn't materially change the inference time.
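
A minimal sketch of what that looks like with vLLM, assuming a recent release that supports the enable_prefix_caching flag (the model name is just a placeholder). Requests that share the same long system prompt reuse its KV cache instead of prefilling it again:

    from vllm import LLM, SamplingParams

    SYSTEM_PROMPT = "You are a helpful assistant. <imagine several thousand tokens of instructions here>"

    llm = LLM(model="meta-llama/Llama-3.1-8B-Instruct", enable_prefix_caching=True)
    params = SamplingParams(temperature=0.7, max_tokens=128)

    # Both requests start with the identical prefix, so its KV cache is
    # computed once and shared; only each user-specific suffix is prefilled.
    prompts = [SYSTEM_PROMPT + "\n\nUser: " + q
               for q in ["What is KV caching?", "Why is the system prompt so long?"]]
    outputs = llm.generate(prompts, params)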


Has anything been done to turn common phrases into a single token?

Like, "can you please" maps to 3895 instead of something like "10 245 87 941".

Or does it not matter, since tokenization is already a kind of compression?
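
It's easy to check empirically; here's a quick sketch with the tiktoken library and its cl100k_base encoding (both assumptions on my part; other tokenizers behave similarly). BPE already merges frequent substrings, so short phrases come out as a handful of tokens rather than one per character, though rarely as a single token:

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")

    for phrase in ["can you please", " can you please", "unbelievable"]:
        ids = enc.encode(phrase)
        # Frequent substrings are merged into single tokens by BPE, so the
        # token count is much lower than the character count.
        print(repr(phrase), len(ids), ids)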


You can try "cyp", but ymmv.


I imagine the tone you set at the start affects the tone of responses, as it makes completions in that same tone more likely.

I would very much like to see my assumption checked — if you are as terse as possible in your system prompt, would it turn into a drill sergeant or an introvert?



