EMR is an all-in-one enterprise-ready big data platform that provides cluster, job, and data management services based on open-source ecosystems, such as Hadoop, Spark, Kafka, Flink, and Storm.
In this blog, we'll introduce the origins of JindoFS and discuss the problems its
Matei Zaharia, founder of the Spark project, gave an in-depth review of Spark at the Spark + AI Summit 2020 in conjunction with its 10-year anniversary.
This blog explores the architecture and design goals of Alibaba Cloud E-MapReduce (EMR) and introduces two key components of EMR: JindoFS and .
This article discusses how Alibaba Cloud EMR empowers open-source cloud ecosystems from multiple perspectives.
This article describes common problems and optimization methods of data read/write in computing-storage separation scenarios, and introduces data cache acceleration with JindoFS.
This article introduces the reasons for choosing data lake acceleration, and shares Alibaba Cloud's practical experience and technical solutions.
This article is based on the enterprise data lake construction solution using E-MapReduce and customer best practices shared by Ziguan.
In this article, AI expert Aohai provides an overview of the DataScience node of E-MapReduce and its components.
Lin Xuewei, a technical expert, gives an overview of the latest performance and efficiency optimizations that were made to TPC-DS Perf after its third submission.
In this post, we will introduce Spark-TFRecord, a new solution to enable support for native TensorFlow data format in Spark.
This year, EMR increased its computing speed to 2.2 times that of last year, again breaking the world record in the big data sector.
JindoFS is a cloud-native file system that integrates the advantages of local disks and the ultra-large capacity of Object Storage Service (OSS).
This article looks into how you can accelerate query speeds by using the Spark Relational Cache of Alibaba Cloud E-MapReduce.
This article provides a fully verified solution (with code) to run LR and GBDT on a LibSVM-formatted dataset efficiently using TensorFlow.
This article looks at EMR Spark Relational Cache, how it can be useful in a number of scenarios, and how to use it to synchronize data across two clusters.
This article looks into what cache and relational cache are and how you can use them to accelerate EMR Spark in data analysis operations.
This article looks at Apache Arrow and its usage in Spark and how you can use Apache Arrow to assist PySpark in data processing operations.