LLMs

Alibaba Cloud Drives a More Sustainable, Efficient and Intelligent Olympic Experience at Milano Cortina 2026

Alibaba Group has supported the Olympic and Paralympic Winter Games Milano Cortina 2026 (Milano Cortina 2026) in becoming the most intelligent Games in Olympic history.

Caching is Efficiency: Achieving Precise LLM Cache Hits with Alibaba Cloud ACK GIE

This article introduces ACK GIE's precision-mode prefix cache-aware routing that maximizes KV-Cache hit rates for distributed LLM inference.
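As a rough illustration of the general idea behind prefix cache-aware routing (a minimal sketch, not ACK GIE's actual API—all names here are hypothetical), a scheduler can send each request to the inference replica whose KV cache already covers the longest prefix of the prompt's tokens, maximizing cache reuse:

```python
def longest_cached_prefix(tokens, cached_prefixes):
    """Length of the longest cached token prefix matching this request."""
    best = 0
    for prefix in cached_prefixes:
        if tokens[:len(prefix)] == list(prefix):
            best = max(best, len(prefix))
    return best

def route(tokens, replicas):
    """Pick the replica whose KV cache covers the most of the prompt."""
    return max(replicas, key=lambda name: longest_cached_prefix(tokens, replicas[name]))

replicas = {
    "pod-a": [(1, 2, 3)],        # pod-a has tokens [1, 2, 3] cached
    "pod-b": [(1, 2, 3, 4, 5)],  # pod-b has cached a longer prefix
}
print(route([1, 2, 3, 4, 5, 6], replicas))  # pod-b maximizes KV-cache reuse
```

Real routers track cached prefixes approximately (e.g. via block hashes) and balance cache affinity against load, but the core signal is the same.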

ACK One Fleet Multi-Cluster Canary Release: A "Safety Valve" for AI Inference Services

This article introduces ACK One Fleet's multi-cluster canary release solution, integrated with Kruise Rollout, for safe AI inference deployments across hybrid and geo-distributed clouds.

When Agents Meet Workflows—Can Intelligence Become More Controllable?

This article introduces how combining LLM Agents with deterministic Workflows like Argo enables controllable, production-ready AI systems.
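One way to make a non-deterministic agent controllable is to wrap each agent call in a deterministic workflow step that validates and retries its output. A toy sketch of that pattern (the helper names are hypothetical, not Argo's API):

```python
def run_workflow(steps, context):
    """Execute deterministic workflow steps in order; each step transforms context."""
    for name, step in steps:
        context = step(context)
    return context

def guarded_agent_step(agent, validate, retries=2):
    """Wrap a non-deterministic agent call in a deterministic guard:
    retry until the output passes validation, then hand it downstream."""
    def step(context):
        for _ in range(retries + 1):
            out = agent(context)
            if validate(out):
                return out
        raise ValueError("agent output failed validation")
    return step

# Toy agent that uppercases text; the validator enforces the output contract.
steps = [
    ("draft", guarded_agent_step(lambda c: c.upper(), lambda o: o.isupper())),
    ("package", lambda c: {"result": c}),
]
print(run_workflow(steps, "hello"))  # {'result': 'HELLO'}
```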

Qwen3.5: Towards Native Multimodal Agents

We are delighted to announce the official release of Qwen3.5, introducing the open-weight release of the first model in the Qwen3.5 series.

Qwen App's CNY Campaign Attracts Over 120 Million Orders

Qwen App, Alibaba’s consumer-facing AI application, has spurred a behavioral shift toward AI-powered shopping during its Chinese New Year (CNY) campaign.

UModel Data Governance: Practice of Building an O&M World Model

This article introduces UModel, Alibaba Cloud's ontology that transforms observability into a unified model-driven digital twin of IT systems.

Joe Tsai on the Future of Open-Source AI: Why Full-Stack Companies Will Excel

Alibaba Chairman Joe Tsai shares his perspective at the World Government Summit 2026 on why full-stack companies maintain an advantage as open-source AI providers.

Alibaba Brings Cloud-Based AI Innovation to the Olympic Winter Games Milano Cortina 2026

Alibaba Cloud is partnering with OBS and IOC to deploy advanced cloud and AI technologies for the Olympic and Paralympic Winter Games Milano Cortina 2026.

Hybrid Model Support | SGLang's Approach to Hybrid-Architecture Models like Mamba-Transformer

This article introduces a dual memory-pool inference framework enabling efficient hybrid Transformer-Mamba model execution by resolving conflicting caching mechanisms.

Alibaba Cloud Tair KVCache Implementation Based on 3FS: Enterprise-Grade Deployment, High-Availability Operations & Performance Optimization

This article introduces engineering optimizations to 3FS—KVCache's foundation layer—across performance, productization, and cloud-native management for scalable AI inference.

Dify Officially Launched the Nacos A2A Plugin, Completing Its Bidirectional Multi-agent Collaboration Capabilities

This article introduces Dify's Nacos A2A plugin, enabling bidirectional agent collaboration—discovering external A2A agents and exposing Dify apps as discoverable agents via Nacos Registry.

Rebuild Search Pipelines: An Analysis of PolarDB IMCI Capabilities

The article introduces PolarDB IMCI’s native columnar full-text indexing for efficient, integrated text and hybrid vector search—eliminating the need for external search engines.

Momentum: How Alibaba Cloud Is Leading the New AI Paradigm

At Alibaba Cloud, we're not just delivering technology. We're co-creating a new chapter of AI with the world.

Alibaba Cloud Accelerates Global AI Partner Ecosystem with New Incentives and Investments

New programs for channel, ISV, and service partners to accelerate AI adoption, service transformation, and SMB growth.

Understanding GPU Memory Requirements for LLMs: A Complete Guide

Are you planning to deploy a Large Language Model (LLM) but don't know how much GPU memory it needs? Or is the AI model you use...
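The question above largely comes down to arithmetic: weight memory is roughly parameter count times bytes per parameter, plus headroom for the KV cache, activations, and framework buffers. A back-of-envelope estimator (the 1.2x overhead factor is an assumed rule of thumb, not a fixed constant):

```python
def estimate_gpu_memory_gb(params_billion: float,
                           bytes_per_param: float = 2,
                           overhead: float = 1.2) -> float:
    """Rough GPU memory estimate for serving an LLM.

    params_billion: model size in billions of parameters (e.g. 7 for a 7B model)
    bytes_per_param: 2 for FP16/BF16, 1 for INT8, 0.5 for INT4
    overhead: assumed multiplier for KV cache, activations, and buffers
    """
    return params_billion * bytes_per_param * overhead

# A 7B model in FP16: roughly 7 * 2 * 1.2 ≈ 16.8 GB
print(estimate_gpu_memory_gb(7))
```

Actual usage depends heavily on sequence length, batch size, and the serving framework, so treat this as a lower-bound sanity check.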

Quest 1.0: Self-learning Coding Agent

Quest 1.0, an autonomous coding agent capable of self-learning and rapid evolution, was unveiled last week.

Is Your AI Agent Getting Dumber? Alibaba Cloud AnalyticDB Unveils AI Context Engineering

This article introduces AI Context Engineering, a framework on Alibaba Cloud's AnalyticDB that prevents AI agents from "getting dumber" by intelligently managing their context and memory.
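One common context-engineering tactic the blurb alludes to is keeping recent turns verbatim within a token budget while compressing older history. A minimal sketch of that idea (not AnalyticDB's implementation; `count_tokens` and `summarize` are hypothetical stand-ins for a tokenizer and an LLM-backed summarizer):

```python
def build_context(messages, token_budget, count_tokens, summarize):
    """Keep the most recent messages that fit the budget; compress the rest."""
    recent, used = [], 0
    for msg in reversed(messages):
        cost = count_tokens(msg)
        if used + cost > token_budget:
            break
        recent.append(msg)
        used += cost
    older = messages[:len(messages) - len(recent)]
    head = [summarize(older)] if older else []
    return head + list(reversed(recent))

history = ["user: hi", "agent: hello there", "user: what is 2 + 2"]
ctx = build_context(
    history,
    token_budget=6,
    count_tokens=lambda m: len(m.split()),  # toy tokenizer: whitespace split
    summarize=lambda ms: f"[summary of {len(ms)} older messages]",
)
print(ctx)
```

Production systems layer retrieval and structured memory on top, but the budget-then-summarize loop is the basic building block.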

From ReAct to Ralph Loop: A Continuous Iteration Paradigm for AI Agents

The article introduces the Ralph Loop—a continuous, self-iterating paradigm that keeps AI programming agents working until tasks are verifiably complete.
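The core of the pattern described above is "iterate until verifiably complete." A toy sketch (here `improve` and `verify` are hypothetical stand-ins for an agent's edit step and a deterministic check such as a test-suite run):

```python
def ralph_loop(state, improve, verify, max_iters=10):
    """Keep invoking the agent step until the verifier passes (or budget runs out).

    improve: one agent iteration (e.g. an LLM proposing a code edit)
    verify: deterministic completion check (e.g. running the tests)
    """
    for _ in range(max_iters):
        if verify(state):
            return state
        state = improve(state)
    raise RuntimeError("task not verifiably complete within budget")

# Toy example: "improve" increments a counter, "verify" checks a target.
result = ralph_loop(0, improve=lambda s: s + 1, verify=lambda s: s >= 3)
print(result)  # 3
```

The deterministic verifier is what distinguishes this from an open-ended ReAct loop: the agent cannot declare itself done.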

Container Technology Evolution for LLMs and AI Agents

The article outlines how container technology is advancing to support LLMs and AI agents across data processing, training, inference, and deployment.