ray-serve/ray_serve/serve_llm.py at 96f7650b231f044287cb56723857894f3c96d96d

fix: respect VLLM_USE_TRITON_AWQ from runtime_env instead of hardcoding 0

The previous code unconditionally set VLLM_USE_TRITON_AWQ=0, overriding
the value from the RayService runtime_env env_vars. On gfx1151:
- Triton AWQ kernels work (VLLM_USE_TRITON_AWQ=1)
- C++ awq_dequantize op does NOT exist (VLLM_USE_TRITON_AWQ=0 → crash)

Changed to os.environ.setdefault('VLLM_USE_TRITON_AWQ', '1') so the
operator-configured value is preserved, defaulting to Triton AWQ.
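
The difference between the old unconditional assignment and the setdefault call can be sketched as follows. This is a minimal illustration, not the actual contents of serve_llm.py: the helper name configure_awq_backend and the plain dicts standing in for the process environment are assumptions made for the example.

```python
import os


def configure_awq_backend(env=os.environ):
    """Hypothetical helper: default to Triton AWQ unless a value is already set."""
    # setdefault writes "1" only when the key is absent, so a value supplied
    # through the RayService runtime_env env_vars is left untouched.
    return env.setdefault("VLLM_USE_TRITON_AWQ", "1")


# Case 1: nothing configured, so the Triton AWQ default applies.
unset_env = {}
configure_awq_backend(unset_env)

# Case 2: the operator explicitly configured "0", and that choice is preserved.
operator_env = {"VLLM_USE_TRITON_AWQ": "0"}
configure_awq_backend(operator_env)
```

A plain assignment such as env["VLLM_USE_TRITON_AWQ"] = "0" would overwrite the value in both cases, which is the behavior the commit removes.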

2026-02-13 07:29:57 -05:00

8.7 KiB
