Set up a Hadoop Cluster with Apache Ambari

In this tutorial, you will learn how to set up Hadoop and its components on a multinode cluster using Apache Ambari.

E-MapReduce Best Practices

This article is based on Alibaba Cloud E-MapReduce and the entire Alibaba Cloud system. We will focus on the most important scenarios, such as live video and video streaming.

Using Data Preorganization for Faster Queries in Spark on EMR

This article looks into how you can accelerate query speeds by using the Spark Relational Cache of Alibaba Cloud E-MapReduce.

Learning Kafka from Scratch: A Guide to Kafka (Continued)

In this article, part two of two parts, an Alibaba engineer shares everything he knows about Kafka.

Learning Kafka from Scratch: A Guide to Kafka

In this article, part one of two parts, an Alibaba engineer shares everything he knows about Kafka.

Introducing JindoFS: A High-performance Data Lake Storage Solution

JindoFS is a cloud-native file system that integrates the advantages of local disks and the ultra-large capacity of Object Storage Service (OSS).

The Evolution of Large-Scale Co-Location Technology at Alibaba

This article looks at how co-location technology has been explored and developed at Alibaba into what is now a large-scale solution architecture.

The Now and Future of Financial Data Intelligence at Ant Financial

This article outlines Ant Financial's financial data intelligence system, which is built on next-generation technology to address increasingly complex use cases.

MaxCompute Tunnel Offline Batch Data Channel FAQs

Alibaba Cloud MaxCompute provides Tunnel commands for uploading and downloading large batches of offline data.

The Secret behind Youku's Success with Big Data

In this article, Men Deliang of Youku shares how Youku's business and platform succeeded after migrating from Hadoop to Alibaba Cloud MaxCompute.

Processing Cartesian Products with PyODPS DataFrame

This article shows how you can use PyODPS to perform Cartesian product operations through DataFrame APIs.

Best Practices for Migrating Data from Kafka to MaxCompute

In this article, we will show you how to use Alibaba Cloud E-MapReduce (EMR) to build a Kafka cluster automatically.

Accessing Presto through Gateway

This article describes how to use an HAProxy reverse proxy to access the Presto service through a Gateway node.

Drilling into Big Data – Data Querying and Analysis (6)

In this article, we will walk you through the basics of Hive, including table creation and other underlying concepts for big data applications.

Drilling into Big Data – Data Ingestion (4)

In this article, we will take a closer look into the concepts and usage of HDFS and Sqoop for data ingestion.

Drilling into Big Data – Data Preparation (5)

In this article, we will discuss Spark for big data and show you how to set it up on Alibaba Cloud.

Drilling into Big Data – Data Interpretation (3)

In this article, we will talk about data sources and various data formats to ingest the data into our big data environment.

Drilling into Big Data – A Gold Mine of Information (1)

This blog series is aimed at showing you how to make effective use of Big Data and Business Intelligence to decipher insights quickly from raw enterprise data.

Drilling into Big Data – Getting started with OSS and EMR (2)

In this article, we will show you how to build a big data environment on Alibaba Cloud with Object Storage Service and E-MapReduce.

Drowning in Big Data? Start Getting Real Value Before It’s Too Late

Data is everywhere. Phenomena such as the Internet of Things (IoT) and widespread digitization have unleashed a tsunami of information on the world, and enterprises are struggling to keep up.