EMR is an all-in-one enterprise-ready big data platform that provides cluster, job, and data management services based on open-source ecosystems, such as Hadoop, Spark, Kafka, Flink, and Storm.
This article focuses on the technology, performance, and future planning of StarRocks' blazing-fast data lake analytics.
This article describes how to optimize the performance of the product features provided by the Enterprise Edition to help you efficiently access lakehouses.
This article aims to solve the performance problems of offline data warehouses (daily and hourly) during production and usage.
This article details the key technologies of the data lake analytics engine, using StarRocks as an example to help users understand the system architecture.
This article shares how Weimiao applies the big data ecosystem of Alibaba Cloud in practice.
This article analyzes the source code of the open-source version of ClickHouse v21.8.10.19-lts.
This article shares the best practices of InMobi based on the open-source big data service of Alibaba Cloud.
This article describes the solution of an open-source real-time data warehouse based on EMR OLAP.
This article is an overview of the best practices for stream processing with Flink on Zeppelin, taken from a recent lecture.
This article is an overview of the best practices for big data processing in Spark taken from a lecture.
This article aims to give readers a deeper understanding of Alibaba Cloud Data Lake Formation (DLF) and Databricks DataInsight (DDI).
This article discusses the practices and challenges of EMR Spark on Alibaba Cloud Kubernetes.
This article explains the background of Delta Lake along with practices, problems, and solutions.
This article reviews JindoFS stress testing, featuring multiple scenarios and graphs.
This article introduces the best practices and cases for building, analyzing, developing, and governing cloud-native data lakes.
This article introduces the exploration and practice of Mobvista in the field of cloud-native data lakes, as well as the architecture of StarLake.
This article introduces how to build a cloud-native data lake system based on Alibaba Cloud OSS, Data Lake Formation (DLF), and the various computing engines available on Alibaba Cloud.
This article explains how to perform real-time CDC synchronization in a data lake using Alibaba Cloud's Data Lake Formation (DLF) service.
This article discusses the offline data migration process for data lakes using JindoDistCp and explains how it improves migration performance in different scenarios.
This article briefly discusses Alibaba Cloud's JindoTable and explains how it solves the data management problems in a data lake.