ray-serve/ray_serve/serve_llm.py
Billy D. 96f7650b23
fix: respect VLLM_USE_TRITON_AWQ from runtime_env instead of hardcoding 0
The previous code unconditionally set VLLM_USE_TRITON_AWQ=0, overriding
the value from the RayService runtime_env env_vars. On gfx1151:
- Triton AWQ kernels work (VLLM_USE_TRITON_AWQ=1)
- The C++ awq_dequantize op does NOT exist (VLLM_USE_TRITON_AWQ=0 → crash)

Changed to os.environ.setdefault('VLLM_USE_TRITON_AWQ', '1') so the
operator-configured value is preserved, defaulting to Triton AWQ.
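
The difference between the old and new behavior can be sketched as follows. This is a minimal illustration, not the actual contents of serve_llm.py: the helper name `apply_awq_default` and the plain dicts standing in for `os.environ` are assumptions made so the example runs on its own. Only the `setdefault` call itself comes from the commit.

```python
import os


def apply_awq_default(env=os.environ):
    """Default to Triton AWQ, but keep any value that is already set.

    `env` is any mutable mapping; it defaults to the real process
    environment. (Hypothetical helper, for illustration only.)
    """
    # setdefault writes "1" only when the key is missing, so a value
    # injected through the RayService runtime_env env_vars is preserved.
    return env.setdefault("VLLM_USE_TRITON_AWQ", "1")


# Case 1: the operator explicitly configured a value; it must be kept.
operator_env = {"VLLM_USE_TRITON_AWQ": "0"}
apply_awq_default(operator_env)

# Case 2: nothing configured; fall back to Triton AWQ.
empty_env = {}
apply_awq_default(empty_env)

# For contrast, the previous behavior was an unconditional assignment,
# which discards whatever the operator configured.
overridden_env = {"VLLM_USE_TRITON_AWQ": "1"}
overridden_env["VLLM_USE_TRITON_AWQ"] = "0"
```

With `setdefault`, the runtime_env stays the single source of truth and the code only supplies a fallback for deployments that leave the variable unset.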
2026-02-13 07:29:57 -05:00
