E-MapReduce

The Principles of EMR StarRocks' Blazing-Fast Data Lake Analytics

This article focuses on the technology, performance, and future plans of StarRocks' blazing-fast data lake analytics.

The Spark and Delta Lake Engine Enterprise Edition of Databricks Helps Efficiently Access Lake Houses

This article describes the performance optimizations in the features of the Enterprise Edition that help you access lakehouses efficiently.

Zuoyebang's Best Practices for Building Data Lakes Based on Delta Lake

This article describes how to solve the performance problems of offline data warehouses (daily and hourly) in production and usage.

How to Build a Blazing-Fast Data Lake Analytics Engine

This article explains the key technologies of a data lake analytics engine in detail, using StarRocks as an example to illustrate the system architecture.

How to Build a Cloud-Native Open-Source Big Data Platform | The Application Practice of Weimiao

This article shares Weimiao's application practices built on the big data ecosystem of Alibaba Cloud.

Set Up EMR YARN Authentication Using Active Directory with Apache Knox

A guide to configuring the integration between Alibaba Cloud EMR and Active Directory.

Alibaba Cloud E-MapReduce vs. AWS EMR vs. Azure HDInsight

Big Data has been one of the biggest IT trends of recent years, and maintaining a large infrastructure for analytics is one of its major challenges.

Drilling into Big Data – Data Preparation (5)

In this article, we will discuss Spark for big data and show you how to set it up on Alibaba Cloud.

Alibaba Big Data Practices on Cloud-Native – EMR Spark on ACK

This article discusses the practices and challenges of EMR Spark on Alibaba Cloud Kubernetes.

Application of Delta Lake in Soul

This article explains the background of adopting Delta Lake at Soul, along with the practices, problems, and solutions involved.

Alibaba Cloud JindoFS Handles Stress Testing Easily with More Than One Billion Files

This article reviews JindoFS stress testing, featuring multiple scenarios and graphs.

Fluid with JindoFS: An Acceleration Tool for Alibaba Cloud OSS

This article introduces Fluid, an open-source, Kubernetes-native distributed dataset orchestrator and accelerator for data-intensive applications, and discusses the advantages of JindoRuntime.

An Overview of Alibaba Cloud's Comprehensive Cloud-Native Data Lake System

This article introduces how to build a cloud-native data lake system based on Alibaba Cloud OSS, Data Lake Formation (DLF), and the various computing engines available on Alibaba Cloud.

How Delta Lake and DLF Service Facilitate Real-time CDC Synchronization in a Data Lake

This article explains how to perform real-time CDC synchronization in a data lake using Alibaba Cloud's Data Lake Formation (DLF) service.

How to Use JindoDistCp for Offline Data Migration to a Data Lake

This article discusses the data lake offline data migration process using JindoDistCp and explains how it improves the migration performance in different scenarios.

JindoTable for Data Optimization and Query Acceleration in a Data Lake

This article briefly discusses Alibaba Cloud's JindoTable and explains how it solves the data management problems in a data lake.

Cloud-Native Compute Engine: Challenges and Solutions

This article explains some of the challenges in cloud-native compute engines, and discusses some solutions and future directions.

Data Lake: How to Explore the Value of Data Using Multi-engine Integration

This article briefly discusses the metadata service and multi-engine support capabilities of the Alibaba Cloud Data Lake Formation (DLF) service.

EMR Remote Shuffle Service: A Powerful Elastic Tool of Serverless Spark

This article discusses Alibaba Cloud's EMR Remote Shuffle Service and explains how it solves the shuffle stability problems in compute-storage separation architectures.

Implementation and Challenges of Data Lake Metadata Services

This article explains the benefits, architecture, and implementation challenges of data lake metadata services.