Alibaba EMR

EMR is an all-in-one enterprise-ready big data platform that provides cluster, job, and data management services based on open-source ecosystems, such as Hadoop, Spark, Kafka, Flink, and Storm.

Alibaba EMR Posted blog

Introduction to EMR DataScience

In this article, AI expert Aohai provides an overview of the DataScience node of E-MapReduce and its components.

The Secrets Behind the Optimized SQL Performance of EMR Spark

Lin Xuewei, a technical expert, gives an overview of the latest performance and efficiency optimizations that were made to TPC-DS Perf after its third submission.

Spark-TFRecord: Toward Full Support of TFRecord in Spark

In this post, we introduce Spark-TFRecord, a new solution that enables support for the native TensorFlow data format in Spark.

Alibaba Cloud E-MapReduce Sets World Record Again on TPC-DS Benchmark

This year, EMR increased its computing speed to 2.2 times that of last year, breaking the world record again in the big data sector.

Introducing JindoFS: A High-performance Data Lake Storage Solution

JindoFS is a cloud-native file system that integrates the advantages of local disks and the ultra-large capacity of Object Storage Service (OSS).

Using Data Preorganization for Faster Queries in Spark on EMR

This article looks into how you can accelerate query speeds by using the Spark Relational Cache of Alibaba Cloud E-MapReduce.

My Thoughts on Distributed Computing Frameworks

This article provides a fully verified solution (with code) to run LR and GBDT on a LibSVM-formatted dataset efficiently using TensorFlow.

Use EMR Spark Relational Cache to Synchronize Data Across Clusters

This article looks at EMR Spark Relational Cache, how it can be useful in a number of scenarios, and how to use it to synchronize data across two clusters.

Use Relational Cache to Accelerate EMR Spark in Data Analysis

This article looks into what caches and relational caches are and how you can use them to accelerate EMR Spark in data analysis operations.

Use Apache Arrow to Assist PySpark in Data Processing

This article looks at Apache Arrow, its usage in Spark, and how you can use it to assist PySpark in data processing operations.
