Alibaba EMR

EMR is an all-in-one enterprise-ready big data platform that provides cluster, job, and data management services based on open-source ecosystems, such as Hadoop, Spark, Kafka, Flink, and Storm.

Alibaba EMR Posted blog

StarLake: Exploration and Practice of Mobvista in Cloud-Native Data Lake

This article introduces the exploration and practice of Mobvista in the field of cloud-native data lakes, as well as the architecture of StarLake.

An Overview of Alibaba Cloud's Comprehensive Cloud-Native Data Lake System

This article introduces how to build a cloud-native data lake system based on Alibaba Cloud OSS, Data Lake Formation (DLF), and the various computing engines available on Alibaba Cloud.

How Delta Lake and DLF Service Facilitate Real-time CDC Synchronization in a Data Lake

This article explains how to perform real-time CDC synchronization in a data lake using Alibaba Cloud's Data Lake Formation (DLF) service.

How to Use JindoDistCp for Offline Data Migration to a Data Lake

This article discusses offline data migration to a data lake using JindoDistCp and explains how it improves migration performance in different scenarios.

JindoTable for Data Optimization and Query Acceleration in a Data Lake

This article briefly discusses Alibaba Cloud's JindoTable and explains how it solves data management problems in a data lake.

Cloud-Native Compute Engine: Challenges and Solutions

This article explains some of the challenges in cloud-native compute engines, and discusses some solutions and future directions.

Data Lake: How to Explore the Value of Data Using Multi-engine Integration

This article briefly discusses the metadata service and multi-engine support capabilities of the Alibaba Cloud Data Lake Formation (DLF) service.

EMR Remote Shuffle Service: A Powerful Elastic Tool of Serverless Spark

This article discusses Alibaba Cloud's EMR Remote Shuffle Service and explains how it solves the shuffle stability problems in compute-storage separation architectures.

Implementation and Challenges of Data Lake Metadata Services

This article explains the benefits, architecture, and implementation challenges of data lake metadata services.

EB-level Data Lake Based on OSS

This article briefly discusses data lake systems and their features, and describes the process of building data lake storage based on Alibaba Cloud OSS.

Efficient Data Lake Formation Based on JindoFS and OSS

This article explains the process of data lake formation based on Alibaba Cloud OSS and the JindoFS big data cache acceleration service.

All-in-one Lake Migration of Multiple Data Sources

This article briefly discusses the Alibaba Cloud Data Lake Formation (DLF) service and explains how it solves the challenges of migrating data from heterogeneous data sources into a data lake.

JindoFS Cache-based Acceleration for Machine Learning Training in a Data Lake

This article explains how the JindoFS cache-based acceleration service improves machine learning training speed in a data lake.

DataWorks: A Platform for Developing and Governing a Data Lake

This article briefly discusses Alibaba Cloud's big data platform, DataWorks, and explains how it solves the common challenges of a data lake.

JindoFS: Computing and Storage Separation for Cloud-native Big Data

In this blog, we'll introduce the origins of JindoFS and discuss the problems its

In-depth Review of Apache Spark: Spark + AI Summit 2020

Matei Zaharia, founder of the Spark project, gave an in-depth review of Spark at the Spark + AI Summit 2020 in conjunction with its 10-year anniversary.

EMR: An Efficient Cloud-native Data Analytics Engine

This blog explores the architecture and design goals of Alibaba Cloud E-MapReduce (EMR) and introduces two key components of EMR: JindoFS and .

Empowering Open-source Cloud Ecosystems: Development of Alibaba Cloud's Open-source Big Data Platform

This article discusses how Alibaba Cloud EMR empowers open-source cloud ecosystems from multiple perspectives.

Storage Policies and Read/Write Optimization in JindoFS

This article describes common problems and optimization methods of data read/write in computing-storage separation scenarios, and introduces data cache acceleration with JindoFS.

Data Lake Acceleration in Data Lake Architecture

This article introduces the reasons for choosing data lake acceleration, and shares Alibaba Cloud's practical experience and technical solutions.
