vLLM

ACK Gateway with Inference Extension: A Practice for Optimizing Large Model Inference Services Deployed Across Multiple Nodes

This article explains how to use ACK Gateway with Inference Extension to optimize the performance of large model inference services deployed across multiple nodes.
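
The core idea behind such an inference extension is routing on live engine metrics instead of plain round-robin. As a rough illustration only (not ACK's actual implementation), the sketch below picks the least-loaded replica by scraping vLLM's Prometheus `/metrics` endpoint; the replica addresses are placeholders, and the metric names assume vLLM's standard gauges:

```python
import urllib.request

# Hypothetical vLLM replica endpoints behind the gateway (placeholder addresses).
REPLICAS = ["http://10.0.0.1:8000", "http://10.0.0.2:8000"]

def scrape_metric(base_url: str, name: str) -> float:
    """Read one gauge from vLLM's Prometheus-format /metrics endpoint."""
    with urllib.request.urlopen(f"{base_url}/metrics", timeout=2) as resp:
        for line in resp.read().decode().splitlines():
            if line.startswith(name):
                return float(line.rsplit(" ", 1)[-1])
    return 0.0

def pick_endpoint() -> str:
    """Prefer the replica with the fewest queued requests, breaking ties by
    KV-cache utilization -- the kind of signal metric-aware routing uses."""
    def load(url: str) -> tuple[float, float]:
        return (
            scrape_metric(url, "vllm:num_requests_waiting"),
            scrape_metric(url, "vllm:gpu_cache_usage_perc"),
        )
    return min(REPLICAS, key=load)

if __name__ == "__main__":
    print("route request to:", pick_endpoint())
```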

ACK One Registered Clusters Help Solve the GPU Resource Shortage in Data Centers

With ACK One registered clusters, we can make full use of Alibaba Cloud's ACS GPU computing power to deploy DeepSeek inference models efficiently.
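
In practice, once a self-managed cluster is registered with ACK One, a workload can burst to ACS GPU capacity simply by requesting GPU resources. The minimal sketch below uses the official `kubernetes` Python client; the container image, model name, and the `alibabacloud.com/compute-class` label are illustrative assumptions, not documented values:

```python
from kubernetes import client, config

def deepseek_pod() -> client.V1Pod:
    """Build a pod that requests one GPU; in an ACK One registered cluster
    it can be scheduled onto Alibaba Cloud ACS capacity."""
    container = client.V1Container(
        name="vllm",
        image="vllm/vllm-openai:latest",  # illustrative serving image
        args=["--model", "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"],
        resources=client.V1ResourceRequirements(
            limits={"nvidia.com/gpu": "1"},  # standard GPU resource name
        ),
    )
    return client.V1Pod(
        metadata=client.V1ObjectMeta(
            name="deepseek-inference",
            labels={"alibabacloud.com/compute-class": "gpu"},  # assumed ACS selector
        ),
        spec=client.V1PodSpec(containers=[container], restart_policy="Never"),
    )

if __name__ == "__main__":
    config.load_kube_config()  # kubeconfig of the registered cluster
    client.CoreV1Api().create_namespaced_pod(namespace="default", body=deepseek_pod())
```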

Analyzing the Distributed Inference Process of vLLM and Ray from a Source Code Perspective

This article explores how to implement distributed inference with vLLM and Ray from a source code perspective.
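
For context, the entry point such an analysis traces is vLLM's `LLM` engine: when the requested parallelism exceeds a single node's GPUs, vLLM launches its workers as Ray actors across the cluster. A minimal sketch, assuming a Ray cluster is already running (start it with `ray start --head` on one node and `ray start --address=...` on the others); the model name and parallel sizes are illustrative:

```python
from vllm import LLM, SamplingParams

# Tensor parallelism within a node, pipeline parallelism across nodes; the
# "ray" executor backend makes vLLM spawn its workers as Ray actors.
llm = LLM(
    model="deepseek-ai/DeepSeek-V3",   # illustrative model id
    tensor_parallel_size=8,            # GPUs per node
    pipeline_parallel_size=2,          # number of nodes
    distributed_executor_backend="ray",
)

outputs = llm.generate(
    ["Explain how vLLM schedules requests."],
    SamplingParams(temperature=0.7, max_tokens=128),
)
print(outputs[0].outputs[0].text)
```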