KServe

Building a Large Language Model Inference Service Optimized by TensorRT-LLM Based on KServe on ASM

This article introduces how to deploy an optimized LLM inference service in a cloud-native environment, using the TensorRT-LLM-optimized Llama-2-hf model as an example.
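
In practice, deploying such a service comes down to registering a KServe InferenceService that points a Triton serving runtime at the compiled TensorRT-LLM engine. The sketch below, using the Kubernetes Python client, is a minimal illustration only; the service name, namespace, and storage location are placeholder assumptions, not values from the article.

```python
# Minimal sketch: create a KServe InferenceService whose predictor runs the
# Triton serving runtime against a compiled TensorRT-LLM engine.
# Assumptions: the KServe controller is installed in the cluster, and the
# engine files live in a PVC named "llama-model" (placeholder location).
from kubernetes import client, config

config.load_kube_config()  # talk to the ACK cluster via the local kubeconfig

inference_service = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "llama-2-trtllm", "namespace": "default"},
    "spec": {
        "predictor": {
            "model": {
                "modelFormat": {"name": "tensorrt"},       # served by Triton
                "runtime": "kserve-tritonserver",          # default Triton runtime
                "storageUri": "pvc://llama-model/llama-2-trtllm",  # placeholder
                "resources": {"limits": {"nvidia.com/gpu": "1"}},
            }
        }
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.kserve.io",
    version="v1beta1",
    namespace="default",
    plural="inferenceservices",
    body=inference_service,
)
```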

Best Practices for Large Model Inference in ACK: TensorRT-LLM

This article uses the Llama-2-7b-hf model as an example to demonstrate how to deploy the Triton inference framework with KServe on Alibaba Cloud ACK.
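
Once a Triton-backed InferenceService is up, it can be exercised through the cluster's ingress gateway with the open inference (v2) protocol. In this hypothetical sketch the gateway address, virtual host, model name ("ensemble"), and tensor names ("text_input", "max_tokens") are assumptions that depend on your gateway setup and the Triton TensorRT-LLM ensemble configuration.

```python
# Hypothetical sketch: call the deployed service through the ingress gateway
# using the v2 inference protocol. The address, Host header, model name, and
# tensor names/shapes below are placeholders to adapt to your deployment.
import requests

INGRESS_HOST = "http://<ingress-gateway-ip>"          # placeholder address
SERVICE_HOST = "llama-2-trtllm.default.example.com"   # placeholder virtual host

payload = {
    "inputs": [
        {"name": "text_input", "shape": [1, 1], "datatype": "BYTES",
         "data": ["What is KServe?"]},
        {"name": "max_tokens", "shape": [1, 1], "datatype": "INT32",
         "data": [64]},
    ]
}

resp = requests.post(
    f"{INGRESS_HOST}/v2/models/ensemble/infer",
    headers={"Host": SERVICE_HOST},  # KServe routes requests by virtual host
    json=payload,
    timeout=60,
)
resp.raise_for_status()
print(resp.json()["outputs"][0]["data"][0])
```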

Model Service Mesh: Model Service Management in Cloud-native Scenario

This article introduces Model Service Mesh, an architectural pattern for deploying and managing scalable machine learning model services in a distributed environment.

KServe + Fluid Accelerates Big Model Inference

This article explores how to accelerate KServe large model inference with Fluid in Alibaba Cloud Container Service for Kubernetes (ACK).
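
With Fluid, the model files are declared as a Dataset and cached close to the GPU nodes by a cache runtime; KServe can then mount the cache through the PVC that Fluid creates under the Dataset's name. The sketch below pairs a Dataset with an AlluxioRuntime; the OSS path, replica count, and cache quota are illustrative assumptions, and OSS credential options are omitted.

```python
# Rough sketch: declare a Fluid Dataset backed by OSS plus an Alluxio cache
# runtime for it. Fluid exposes the cached data as a PVC named after the
# Dataset ("llama-model"), which an InferenceService can then mount via
# storageUri "pvc://llama-model/...".
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

dataset = {
    "apiVersion": "data.fluid.io/v1alpha1",
    "kind": "Dataset",
    "metadata": {"name": "llama-model", "namespace": "default"},
    "spec": {
        "mounts": [{
            # Placeholder OSS path; endpoint/credential options are omitted.
            "mountPoint": "oss://my-bucket/models/llama-2-7b-hf",
            "name": "llama",
        }]
    },
}

runtime = {
    "apiVersion": "data.fluid.io/v1alpha1",
    "kind": "AlluxioRuntime",
    "metadata": {"name": "llama-model", "namespace": "default"},
    "spec": {
        "replicas": 2,  # illustrative cache-worker count
        "tieredstore": {
            "levels": [{"mediumtype": "MEM", "path": "/dev/shm", "quota": "20Gi"}]
        },
    },
}

for plural, body in (("datasets", dataset), ("alluxioruntimes", runtime)):
    api.create_namespaced_custom_object(
        group="data.fluid.io", version="v1alpha1",
        namespace="default", plural=plural, body=body,
    )
```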

How to Quickly Deploy AI Inference Services Based on ACK Serverless

This article describes how to quickly deploy AI inference services based on ACK Serverless.
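
On a serverless cluster the deployment flow is the same; what usually matters is knowing when the service has scaled up and become routable. A small helper like the following, again a sketch with placeholder names, polls the InferenceService status until it reports Ready and returns its serving URL.

```python
# Sketch: poll an InferenceService until its Ready condition is True, then
# return the URL it is served at. The name and namespace are placeholders.
import time

from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

def wait_ready(name: str, namespace: str = "default", timeout_s: int = 600) -> str:
    deadline = time.time() + timeout_s
    while time.time() < deadline:
        isvc = api.get_namespaced_custom_object(
            group="serving.kserve.io", version="v1beta1",
            namespace=namespace, plural="inferenceservices", name=name)
        conditions = isvc.get("status", {}).get("conditions", [])
        if any(c.get("type") == "Ready" and c.get("status") == "True"
               for c in conditions):
            return isvc["status"]["url"]
        time.sleep(10)  # re-check every 10 seconds until the deadline
    raise TimeoutError(f"{name} did not become Ready within {timeout_s}s")

print(wait_ready("llama-2-trtllm"))
```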

Istio Ecosystem on ASM (3): Integrate KServe into Alibaba Cloud Service Mesh

Part 3 of this 3-part series discusses how to integrate KServe with Alibaba Cloud Service Mesh (ASM) and deploy it on Alibaba Cloud Container Service for Kubernetes (ACK).
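
A prerequisite for running KServe data-plane traffic through the mesh is that the workload namespace has sidecar injection enabled. The one-liner below uses the standard Istio namespace label; ASM revisions may use a different label, so treat this as an assumption to verify against your mesh version.

```python
# Sketch: enable sidecar injection on the namespace that will host KServe
# workloads, so the mesh data plane can manage their traffic. The label is
# the standard Istio convention; confirm the ASM-specific label for your
# mesh revision before relying on it.
from kubernetes import client, config

config.load_kube_config()
client.CoreV1Api().patch_namespace(
    "default",
    {"metadata": {"labels": {"istio-injection": "enabled"}}},
)
```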

The Definition of the New Service Mesh-Driven Scenario: AI Model Services - Model Mesh

This article defines Model Mesh, a new service-mesh-driven pattern for AI model services, and describes how to deploy it with Alibaba Cloud Service Mesh (ASM) and Alibaba Cloud Container Service for Kubernetes (ACK).
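
If the implementation follows KServe's ModelMesh, as the name suggests (an assumption, not something the teaser states), individual models are still declared as InferenceServices, only annotated so that they are multiplexed onto shared runtime pods instead of each getting its own deployment:

```python
# Hypothetical sketch, assuming a KServe ModelMesh-style deployment: the
# annotation switches the InferenceService from one-deployment-per-model to
# a pool of shared serving pods that load models on demand. The name,
# namespace, and storage location are placeholders.
from kubernetes import client, config

config.load_kube_config()

meshed_model = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {
        "name": "sklearn-example",
        "namespace": "modelmesh-serving",
        "annotations": {"serving.kserve.io/deploymentMode": "ModelMesh"},
    },
    "spec": {
        "predictor": {
            "model": {
                "modelFormat": {"name": "sklearn"},
                "storageUri": "pvc://models-pvc/sklearn/model.joblib",  # placeholder
            }
        }
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.kserve.io", version="v1beta1",
    namespace="modelmesh-serving", plural="inferenceservices",
    body=meshed_model,
)
```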