All checks were successful
Build and Publish ray-serve-apps / build-and-publish (push) Successful in 12s
- Add Llama 3 stop token IDs (128001, 128009) to SamplingParams as safety net for V1 engine max_tokens bug on ROCm/gfx1151 - Clamp max_tokens to min(requested, max_model_len) - Support DEFAULT_MAX_TOKENS env var (default 256) - Prevents runaway generation when V1 engine ignores max_tokens
9.3 KiB
9.3 KiB