This article uses EMR (Cloud Hadoop) to simulate a local Hadoop cluster accessing MaxCompute data.
Friday Q&A is back! Let's take a look at some of the many very interesting questions I was asked during Alibaba Cloud training sessions this week!
This article describes how to optimize the performance of the product features provided by the Enterprise Edition to help you efficiently access lakehouses.
This article aims to solve the performance problems of offline data warehouses (daily and hourly) during production and usage.
This article is an overview of the best practices for big data processing in Spark taken from a lecture.
This article aims to give readers a deeper understanding of Alibaba Cloud Data Lake Formation (DLF) and Databricks DataInsight (DDI).
This article introduces the optimization and evolution of Flink Hudi's original mini-batch-based incremental computing model through stream computing.
This article discusses the practices and challenges of EMR Spark on Alibaba Cloud Kubernetes.
This article explains the background of Delta Lake along with practices, problems, and solutions.
This article gives step-by-step instructions for auto scaling with Fluid.
In this article, the author explains building a real-time data warehouse using Apache Flink and Apache Iceberg.
In this article, the author discusses how Apache Flink and Apache Iceberg have opened a new chapter in building a data lake architecture featuring stream-batch unification.
This article explains Apache Hudi and Apache Flink and the benefits of implementing them.
This article explains some of the challenges in cloud-native compute engines, and discusses some solutions and future directions.
This article briefly discusses the metadata service and multi-engine support capabilities of the Alibaba Cloud Data Lake Formation (DLF) service.
This article discusses Alibaba Cloud's EMR Remote Shuffle Service and explains how it solves the shuffle stability problems in compute-storage separation architectures.
This article explains the process of data lake formation based on Alibaba Cloud OSS and the JindoFS big data cache acceleration service.
This article discusses the integration of SaaS cloud-based data warehouses and real-time search, as shared by Meng Shuo, product manager of MaxCompute.
This tutorial provides step-by-step instructions on how to set up PySpark on an Alibaba Cloud ECS instance running the CentOS 7.x operating system.
In this 3-part blog series, we'll show you how to build a simple, intelligent, cloud-native feed streaming system with Apache Kafka and Spark on Alibaba Cloud.