
A) You could add a field to the JSONL file that says which rubric to use. Your reward function could then read it via `kwargs["rubric"]` and return a reward based on that example's preferred rubric.

B) Currently, the deployed API is free, but startup takes a few minutes and it runs on a small GPU node, so it's not especially fast. If you'd like production-level inference, email us at [email protected] and we could set you up with something much faster (we'd charge per token, depending on model size).
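
To make point A concrete, here is a rough sketch of what a per-example rubric could look like. Only the `kwargs["rubric"]` access comes from the comment above; the `reward_fn(completion, **kwargs)` signature, the `prompt` field, the rubric names (`brevity`, `contains_code`), and the scoring rules are all made up for illustration and may not match how the actual training loop calls your function.

```python
import json

# Hypothetical JSONL data: each example carries a "rubric" field naming
# the rubric it should be graded with. Field values are illustrative only.
JSONL = """\
{"prompt": "Summarize the article.", "rubric": "brevity"}
{"prompt": "Write a function that adds two numbers.", "rubric": "contains_code"}
"""


def score_brevity(completion: str) -> float:
    """1.0 up to 50 words, then decays linearly to 0.0 at 100 words."""
    words = len(completion.split())
    if words <= 50:
        return 1.0
    return max(0.0, 1.0 - (words - 50) / 50)


def score_contains_code(completion: str) -> float:
    """Crude check: 1.0 if the completion defines a Python function."""
    return 1.0 if "def " in completion else 0.0


# Map rubric names (as they appear in the JSONL file) to scoring functions.
RUBRICS = {
    "brevity": score_brevity,
    "contains_code": score_contains_code,
}


def reward_fn(completion: str, **kwargs) -> float:
    """Assumed signature: extra JSONL fields arrive as keyword arguments."""
    rubric = kwargs["rubric"]
    scorer = RUBRICS.get(rubric)
    if scorer is None:
        raise ValueError(f"unknown rubric: {rubric!r}")
    return scorer(completion)


# Parse the JSONL text one line at a time, as a loader might.
examples = [json.loads(line) for line in JSONL.splitlines() if line.strip()]
```

If the trainer forwards extra JSONL fields as keyword arguments, a call would end up looking something like `reward_fn(completion, rubric=example["rubric"])`. Raising on an unknown rubric name, rather than silently returning 0, makes a typo in the data file show up early.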


