This article offers a framework for choosing between self-hosted GPUs and MaaS for LLM inference by weighing cost, data, engineering, and scalability tradeoffs.