unfortunately disabling temperature / switching to greedy sampling doesn't necessarily make most LLM inference engines _fully_ deterministic, since parallelism and batching can cause floating point error to accumulate differently from run to run - it's possible to make them deterministic, but it does come with a perf hit
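a toy illustration of the underlying issue (not tied to any particular inference engine): floating point addition isn't associative, so the same values reduced in a different order - which is what changing batch composition or parallel scheduling effectively does - can give slightly different results

```python
import random

# Toy sketch: summing the same floats in two different orders,
# standing in for two different batching / reduction schedules.
random.seed(0)
vals = [random.uniform(-1.0, 1.0) for _ in range(10_000)]

forward = sum(vals)             # one reduction order
backward = sum(reversed(vals))  # same values, different order

print(forward == backward)        # often False
print(abs(forward - backward))    # tiny but nonzero difference
```

tiny differences like this are harmless for a single matmul, but once they flip which token wins an argmax, the rest of the generation diverges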
some providers _do_ let you set the temperature, including to "zero", but most will not take the perf hit to offer true determinism