This article examines the engineering challenges of serving generative AI models in cloud-native environments and how Fluid optimizes inference for these models in cloud-native scenarios.