E-MapReduce

The Secrets Behind the Optimized SQL Performance of EMR Spark

Lin Xuewei, a technical expert, gives an overview of the latest performance and efficiency optimizations that were made to TPC-DS Perf after its third submission.

Spark-TFRecord: Toward Full Support of TFRecord in Spark

In this post, we will introduce Spark-TFRecord, a new solution to enable support for native TensorFlow data format in Spark.

Alibaba Cloud E-MapReduce Sets World Record Again on TPC-DS Benchmark

This year, EMR increased its computing speed to 2.2 times that of last year, breaking the world record again in the big data sector.

Ideas and Methods for System Refactoring

This article looks at some of the misunderstandings and frequently overlooked aspects of refactoring and proposes some best practices.

Reshaping the Java Language on the Cloud

Learn how Alibaba is transforming the Java language.

50 Efficient Code Samples for Java Programming

This article is a list of 50 efficient Java code samples.

There's No Need for Hadoop: Analyze Server Logs with AnalyticDB

This article outlines how you can use Alibaba Cloud AnalyticDB to analyze server logs without needing to set up Hadoop.

Set up a Hadoop Cluster with Apache Ambari

In this tutorial, you will learn how to set up Hadoop and its components on a multinode cluster using Apache Ambari.

E-MapReduce Best Practices

This article is based on Alibaba Cloud E-MapReduce and the entire Alibaba Cloud system. We will focus on the most important scenarios, such as live video and video streaming.

Using Data Preorganization for Faster Queries in Spark on EMR

This article looks into how you can accelerate query speeds by using the Spark Relational Cache of Alibaba Cloud E-MapReduce.

Learning Kafka from Scratch: A Guide to Kafka (Continued)

In this article, part two of two parts, an Alibaba engineer shares everything he knows about Kafka.

Learning Kafka from Scratch: A Guide to Kafka

In this article, part one of two parts, an Alibaba engineer shares everything he knows about Kafka.

Introducing JindoFS: A High-performance Data Lake Storage Solution

JindoFS is a cloud-native file system that integrates the advantages of local disks and the ultra-large capacity of Object Storage Service (OSS).

The Evolution of Large-Scale Co-Location Technology at Alibaba

This article looks at how co-location technology has been explored and developed at Alibaba into what is now a large-scale solution architecture.

The Now and Future of Financial Data Intelligence at Ant Financial

This article outlines Ant Financial's financial data intelligence system, which is built on next-generation technology to address increasingly complex use cases.

MaxCompute Tunnel Offline Batch Data Channel FAQs

Alibaba Cloud MaxCompute provides Tunnel commands for uploading and downloading large batches of offline data.

The Secret behind Youku's Success with Big Data

In this article, Men Deliang of Youku shares how Youku's business and platform succeeded by migrating from Hadoop to Alibaba Cloud MaxCompute.

Processing Cartesian Products with PyODPS DataFrame

This article introduces how you can use PyODPS to perform Cartesian product operations through DataFrame APIs.

Best Practices for Migrating Data from Kafka to MaxCompute

In this article, we will show you how to use Alibaba Cloud E-MapReduce (EMR) to build a Kafka cluster automatically.

Accessing Presto through Gateway

This article describes how to use an HAProxy reverse proxy to access the Presto service through a Gateway node.