Model Inference

Best Practices for AI Model Inference Configuration in Knative

This article introduces best practices for deploying and configuring AI model inference in Knative, focusing on optimizing GPU resource utilization and rapid scaling.
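As a minimal sketch of the kind of configuration involved, the following Python snippet uses the official Kubernetes client to create a Knative Service with a GPU limit and concurrency-based autoscaling annotations. The image name, namespace, and annotation values are illustrative assumptions, not taken from the article.

# Sketch: create a GPU-backed Knative Service with autoscaling annotations (assumed values).
from kubernetes import client, config

config.load_kube_config()  # assumes a kubeconfig pointing at the ACK cluster

knative_service = {
    "apiVersion": "serving.knative.dev/v1",
    "kind": "Service",
    "metadata": {"name": "llm-inference", "namespace": "default"},
    "spec": {
        "template": {
            "metadata": {
                "annotations": {
                    # Scale on request concurrency and keep warm capacity for fast scale-out.
                    "autoscaling.knative.dev/target": "2",
                    "autoscaling.knative.dev/min-scale": "1",
                    "autoscaling.knative.dev/max-scale": "10",
                }
            },
            "spec": {
                "containers": [
                    {
                        "image": "registry.example.com/llm-inference:latest",  # placeholder image
                        "resources": {"limits": {"nvidia.com/gpu": "1"}},  # one GPU per pod
                    }
                ]
            },
        }
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="serving.knative.dev",
    version="v1",
    namespace="default",
    plural="services",
    body=knative_service,
)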

Best Practices for Large Model Inference in ACK: TensorRT-LLM

Using the Llama-2-7b-hf model as an example, this article demonstrates how to deploy the Triton inference framework with KServe in Alibaba Cloud ACK.
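Once such an InferenceService is running, it can be queried over the open inference (v2) protocol that KServe and Triton share. The sketch below assumes a hypothetical ingress host, model name, and input tensor name; substitute the values reported by your own deployment.

# Sketch: call a KServe-hosted Triton model via the v2 inference protocol (names/URL assumed).
import requests

url = "http://llama-2-7b.default.example.com/v2/models/llama-2-7b/infer"  # placeholder host and model

payload = {
    "inputs": [
        {
            "name": "text_input",   # input tensor name from the model config (assumption)
            "shape": [1, 1],
            "datatype": "BYTES",
            "data": ["What is Kubernetes?"],
        }
    ]
}

resp = requests.post(url, json=payload, timeout=60)
resp.raise_for_status()
print(resp.json()["outputs"])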

ACK Cloud Native AI Suite | Training and Inference of Open-Source Large Models on Kubernetes

The sixth episode of the ACK Cloud Native AI Suite series introduces how to train and run inference on open-source foundation models based on the ACK Cloud-Native AI Suite.

ACK Cloud Native AI Suite | Elastic Acceleration of Generative AI Model Inference with Fluid

The fourth episode of the ACK Cloud Native AI Suite series introduces Fluid, the data orchestration and acceleration engine in the ACK Cloud-Native AI Suite.

Deploy a RAG-Based LLM Chatbot in EAS

This article describes how to deploy a RAG-based LLM chatbot in EAS and how to perform model inference against it.
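For orientation, a deployed EAS service is typically invoked over HTTP with its service token in the Authorization header. The sketch below is not the documented RAG chatbot API: the endpoint URL, query path, and request fields are placeholders and assumptions.

# Sketch: send a question to an EAS-hosted RAG chatbot service (path and schema assumed).
import requests

EAS_ENDPOINT = "http://<service-name>.<region>.pai-eas.aliyuncs.com"  # placeholder service URL
EAS_TOKEN = "<your-eas-service-token>"                                # placeholder token

resp = requests.post(
    f"{EAS_ENDPOINT}/service/query",                       # hypothetical query path
    headers={"Authorization": EAS_TOKEN},
    json={"question": "Summarize our product FAQ."},        # hypothetical request schema
    timeout=60,
)
resp.raise_for_status()
print(resp.json())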

E2E Development and Usage of LLM Data Processing + Model Training + Model Inference

This article describes how to use the Large Language Model (LLM) data processing, model training, and model inference components provided by PAI to complete end-to-end LLM development and use.

Deploy Stable Diffusion API Service in EAS

This article describes how to use Elastic Algorithm Service (EAS) of Platform for AI (PAI) to deploy the Stable Diffusion (SD) API service and how to use SD APIs for model inference.
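As a minimal sketch, assuming the EAS-deployed service exposes the standard Stable Diffusion WebUI txt2img API, a text-to-image request can be issued as follows; the endpoint URL and token are placeholders.

# Sketch: call an EAS-hosted Stable Diffusion txt2img API and save the first image (URL/token assumed).
import base64
import requests

EAS_ENDPOINT = "http://<sd-service>.<region>.pai-eas.aliyuncs.com"  # placeholder service URL
EAS_TOKEN = "<your-eas-service-token>"                              # placeholder token

resp = requests.post(
    f"{EAS_ENDPOINT}/sdapi/v1/txt2img",
    headers={"Authorization": EAS_TOKEN},
    json={"prompt": "a watercolor painting of a lighthouse", "steps": 20, "width": 512, "height": 512},
    timeout=300,
)
resp.raise_for_status()

# The response carries base64-encoded images; decode the first one to disk.
image_b64 = resp.json()["images"][0]
with open("output.png", "wb") as f:
    f.write(base64.b64decode(image_b64))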