EMR is an all-in-one enterprise-ready big data platform that provides cluster, job, and data management services based on open-source ecosystems, such as Hadoop, Spark, Kafka, Flink, and Storm.
This article discusses the practices and challenges of EMR Spark on Alibaba Cloud Kubernetes.
This article explains the background of Delta Lake along with practices, problems, and solutions.
This article reviews JindoFS stress testing, featuring multiple scenarios and graphs.
This article introduces the best practices and cases for building, analyzing, developing, and governing cloud-native data lakes.
This article introduces the exploration and practice of Mobvista in the field of cloud-native data lakes, as well as the architecture of StarLake.
This article introduces how to build a cloud-native data lake system based on Alibaba Cloud OSS, Data Lake Formation (DLF), and the various computing engines available on Alibaba Cloud.
This article explains how to perform real-time CDC synchronization in a data lake using Alibaba Cloud's Data Lake Formation (DLF) service.
This article discusses the data lake offline data migration process using JindoDistCp and explains how it improves the migration performance in different scenarios.
This article briefly discusses Alibaba Cloud's JindoTable and explains how it solves the data management problems in a data lake.
This article explains some of the challenges in cloud-native compute engines, and discusses some solutions and future directions.
This article briefly discusses the metadata service and multi-engine support capabilities of the Alibaba Cloud Data Lake Formation (DLF) service.
This article discusses Alibaba Cloud's EMR Remote Shuffle Service and explains how it solves the shuffle stability problems in compute-storage separation architectures.
This article explains the benefits, architecture, and implementation challenges of data lake metadata services.
This article briefly discusses data lake systems and their features, and describes the process of building data lake storage based on Alibaba Cloud OSS.
This article explains the process of building a data lake based on Alibaba Cloud OSS and the JindoFS big data cache acceleration service.
This article briefly discusses the Alibaba Cloud Data Lake Formation (DLF) service and explains how it solves the challenges of migrating data from heterogeneous data sources into a data lake.
This article explains how the JindoFS cache-based acceleration service improves machine learning training speed in a data lake.
This article briefly discusses Alibaba Cloud's big data platform, DataWorks, and explains how it solves the common challenges of a data lake.
In this blog, we'll introduce the origins of JindoFS and discuss the problems its
Matei Zaharia, founder of the Spark project, gave an in-depth review of Spark at the Spark + AI Summit 2020 in conjunction with its 10-year anniversary.