This article introduces how to deploy an optimized LLM inference service in a cloud-native environment, using the TensorRT-LLM-optimized Llama-2-hf model as an example.
This article introduces best practices for deploying and configuring AI model inference in Knative, focusing on optimizing GPU resource utilization and enabling rapid scaling.
This step-by-step tutorial introduces how to deploy an enterprise-class elastic Stable Diffusion service in ASK.
This article describes how to use preemptible instances in Knative.
This article introduces how to use ACK One and Knative to manage cloud resources.
This article describes Knative's traffic management, traffic access, traffic-based elasticity, and monitoring.
This article provides an in-depth look at how Knative implements elasticity.
This article describes the implementation of Knative's network layer capabilities.
This article describes how to deploy enterprise-level AI applications based on Alibaba Cloud Serverless Container Service.
This article describes how to quickly deploy AI inference services based on ACK Serverless.
Part 6 of this 6-part series describes how to enable auto scaling of pods based on the number of requests.
Part 5 of this 6-part series describes how to implement canary deployment of services based on traffic in Knative on ASM.
Part 4 of this 6-part series demonstrates how to use the ASM gateway to access Knative services over HTTPS.
Part 3 of this 6-part series describes how to set a custom domain name for Knative Serving.
Part 2 of this 6-part series describes how to use Knative on ASM to create Knative services.
Part 1 of this 6-part series introduces Knative on ASM.
This article discusses the challenges faced by the AI-Generated Content (AIGC) project Stable Diffusion in terms of limited processing capacity and scarce GPU resources.
This article describes how to use an EventBridge event to trigger a Knative service, using file uploads to Object Storage Service (OSS) as the example event.
This article introduces the WAGI project and explains how it can combine WASM and WASI applications with Serverless frameworks.
This article describes how to integrate ALB with Knative in Alibaba Cloud Container Service.