E-MapReduce

Building a Streaming Lakehouse: Performance Comparison Between Paimon and Hudi

This article compares the performance of Paimon and Hudi on Alibaba Cloud EMR and explores their respective roles in building quasi-real-time data warehouses.

Observability | Key Metrics to Focus On When Using Prometheus to Monitor E-MapReduce

This article explains how to monitor big data in EMR using Prometheus Service.

Alibaba Cloud Open Source Big Data Platform | E-MapReduce

In this episode, we will introduce Alibaba Cloud Open Source Big Data Platform, Elastic MapReduce.

Running Mapreduce Workload in Alibaba Cloud EMR Cluster

In this article, we’ll explain how to run MapReduce jobs in an Alibaba Cloud EMR cluster.

Working with E-MapReduce in Alibaba Cloud

In this article, we'll introduce how to create an Alibaba Cloud EMR cluster step by step.

Storage Policies and Read/Write Optimization in JindoFS

This article describes common problems and optimization methods of data read/write in computing-storage separation scenarios, and introduces data cache acceleration with JindoFS.

Data Lake Management and Optimization

This article was compiled from a speech by Qingwei Yang at the Alibaba Cloud Data Lake Technology Special Exchange Meeting on July 17, 2022.

Unified Metadata and Permissions for Data Lakes

This article was compiled from a speech by Xiong Jiashu at the Alibaba Cloud Data Lake Technology Special Exchange Meeting.

StarRocks x Flink CDC for End-to-End Real-Time Links

This article discusses real-time data warehouse construction and offers examples of using Flink CDC and StarRocks for real-time links and data updates.

Databricks Data Insight Open Course - Use Databricks + MLFlow to Train and Deploy Machine Learning Models

This article describes how to use Databricks and MLflow to build a machine learning lifecycle management platform.

Databricks Data Insight Open Course - An Introduction to Delta Lake (Open-Source Edition)

This part of the Databricks Data Insight Open Course article series introduces Delta Lake Basics (Open-Source Edition).

Databricks Data Insight Open Course - An Introduction to Delta Lake (Commercial Edition)

This part of the Databricks Data Insight Open Course article series introduces Delta Lake Basics (Commercial Edition).

Databricks Data Insight Open Course - How to Use Delta Lake to Build a Batch-Stream Unified Data Warehouse

This article discusses how to build a batch-stream unified data warehouse with Delta Lake and put it into practice.

Databricks Data Insight Open Course - An Evolution History and Current Situation of Delta Lake

This part of the Databricks Data Insight Open Course article series discusses the history of Delta Lake and its current state.

Practices of Simulating IDC Spark Read and Write MaxCompute

This article uses EMR (Cloud Hadoop) to simulate a local Hadoop cluster accessing MaxCompute data.

Data Lake Exploration – Delta Lake

This article explores Delta Lake and discusses the implementation of two solutions related to traditional data warehouses based on Hive tables.

New Features of Alibaba Cloud Remote Shuffle Service: AQE and Throttling

This article introduces the two latest important features of RSS: support for Adaptive Query Execution (AQE) and throttling.

The Principles of EMR StarRocks' Blazing-Fast Data Lake Analytics

This article focuses on the technology, performance, and future planning of StarRocks' blazing-fast data lake analytics.

The Spark and Delta Lake Engine Enterprise Edition of Databricks Helps Efficiently Access Lake Houses

This article describes how to optimize the performance of the product features provided by the Enterprise Edition to help you efficiently access lake houses.

Zuoyebang's Best Practices for Building Data Lakes Based on Delta Lake

This article aims to solve the performance problems of offline data warehouses (daily and hourly) during production and usage.