Hadoop

Koordinator: Supporting Hybrid Deployment of Kubernetes and YARN

This article introduces Koordinator's support for hybrid deployment of Kubernetes and YARN and shares Xiaohongshu's practical experience with it.

Running Hadoop YARN with K8s by Koordinator

This article introduces Koordinator's support for running Hadoop YARN jobs by utilizing koord-batch resources alongside other Kubernetes pods.

Building a Hadoop Environment Based on ECS Instances

This article describes how to build a Hadoop pseudo-distributed environment on an Elastic Compute Service (ECS) instance that runs a Linux operating system.

Alibaba Cloud Cloud-Native Integrated Data Warehouse – An Interpretation of the New Capabilities of Lakehouse

This article discusses the overall updates to Lakehouse architecture.

Practices of Simulating IDC Spark Read and Write MaxCompute

This article uses EMR (Cloud Hadoop) to simulate a local Hadoop cluster accessing MaxCompute data.

Big Data Q&A - Friday Blog, Week 65

Friday Q&A is back! Let's take a look at some of the many very interesting questions I was asked during Alibaba Cloud training sessions this week!

Packaging Issues in Datastream Development

This article explains which dependencies need to be introduced and which need to be packaged into the job JAR during job development.

Zero-Day Attack Analysis and Dissemination Method Disclosure for Hadoop Yarn RPC

This article explains the vulnerability in Hadoop YARN RPC and possible solutions.

The Big Data Platform Behind Alibaba's E-Commerce Systems

This article looks at the big data platform that helped power last year's Double 11.

Zeppelin Notebook: An Important Tool for PyFlink Development Environment

This article introduces a PyFlink development environment tool that can help users solve various problems.

Deploy and Run Azkaban on Alibaba Cloud

This article is a tutorial on how to run the open-source project Azkaban on Alibaba Cloud with ApsaraDB (Alibaba Cloud Database).

How Can We Defend against Multiple Intrusion Methods on Multiple Platforms When Lemon-Duck Is Continuously Active?

This article offers some insight into protection against botnets and other Internet threats.

An Overview of Alibaba Cloud's Comprehensive Cloud-Native Data Lake System

This article introduces how to build a cloud-native data lake system based on Alibaba Cloud OSS, Data Lake Formation (DLF), and the various computing engines available on Alibaba Cloud.

How to Use JindoDistCp for Offline Data Migration to a Data Lake

This article discusses the data lake offline data migration process using JindoDistCp and explains how it improves the migration performance in different scenarios.

JindoTable for Data Optimization and Query Acceleration in a Data Lake

This article briefly discusses Alibaba Cloud's JindoTable and explains how it solves data management problems in a data lake.

EB-level Data Lake Based on OSS

This article briefly discusses data lake systems and their features, and describes the process of building data lake storage based on Alibaba Cloud OSS.

Efficient Data Lake Formation Based on JindoFS and OSS

This article explains the process of data lake formation based on Alibaba Cloud OSS and the JindoFS big data cache acceleration service.

The Discovery of a Promising Technology

In this article, Zhang Jianfeng, a veteran of the open-source community, explains how to evaluate whether a technology is worth learning along three key dimensions.

Alluxio Deep Learning Practices - 1: Running PyTorch Framework on HDFS

This article demonstrates how Alluxio simplifies running the PyTorch framework on HDFS using the Kubernetes platform to drastically improve development efficiency.

How to Migrate Data From Hadoop to The Cloud?

This blog takes a deep dive into securely migrating data from Apache Hadoop to the cloud.