EMR is an all-in-one enterprise-ready big data platform that provides cluster, job, and data management services based on open-source ecosystems, such as Hadoop, Spark, Kafka, Flink, and Storm.
This article was compiled from a speech by Qingwei Yang at the Alibaba Cloud Data Lake Technology Special Exchange Meeting on July 17, 2022.
This article was compiled from a speech by Xiong Jiashu at the Alibaba Cloud Data Lake Technology Special Exchange Meeting.
This article discusses real-time data warehouse construction and offers examples of using Flink CDC and StarRocks for real-time links and data updates.
This article describes how to use Databricks and MLflow to build a machine learning lifecycle management platform.
This part of the Databricks Data Insight Open Course article series introduces Delta Lake Basics (Open-Source Edition).
This part of the Databricks Data Insight Open Course article series introduces Delta Lake Basics (Commercial Edition).
This article discusses using Delta Lake to build a batch-stream unified data warehouse and putting it into practice.
This part of the Databricks Data Insight Open Course article series discusses the evolution of Delta Lake and its current state.
This article explores Delta Lake and discusses the implementation of two solutions related to traditional data warehouses based on Hive tables.
This article introduces the two latest important features of RSS: support for Adaptive Query Execution (AQE) and throttling.
This article focuses on the technology, performance, and future planning of StarRocks' blazing-fast data lake analytics.
This article describes how to optimize the performance of the product features provided by the Enterprise Edition to help you efficiently access lake houses.
This article aims to solve the performance problems of offline data warehouses (daily and hourly) during production and usage.
This article reveals the key technologies of the data lake analytics engine in detail and uses StarRocks to help users understand the architecture of the system.
This article shares the application practice of Weimiao based on the big data ecosystem of Alibaba Cloud.
This article analyzes the source code of the open-source version of ClickHouse v21.8.10.19-lts.
This article shares the best practices of InMobi based on the open-source big data service of Alibaba Cloud.
This article describes the solution of an open-source real-time data warehouse based on EMR OLAP.
This article is an overview of the best practices for stream processing with Flink on Zeppelin, taken from a recent lecture.
This article is an overview of the best practices for big data processing in Spark taken from a lecture.