Hadoop

Synchronize Data from Hadoop to Alibaba Cloud Elasticsearch Using DataWorks

This guide delineates the process of utilizing the Data Integration service of DataWorks to seamlessly synchronize data from Hadoop to Alibaba Cloud E...

Data Geek May 11, 2024 1,441

Koordinator: Supporting Hybrid Deployment of Kubernetes and YARN

This article introduces Koordinator’s support for hybrid deployment of Kubernetes and YARN and shares Xiaohongshu’s practical experience with it.

Alibaba Cloud Native Community January 25, 2024 4,104

Running Hadoop YARN with K8s by Koordinator

This article introduces Koordinator's support for running Hadoop YARN jobs by utilizing koord-batch resources alongside other Kubernetes pods.

Alibaba Cloud Native Community December 7, 2023 4,597

Building a Hadoop Environment Based on ECS Instances

This article describes how to build a Hadoop pseudo-distributed environment on an Elastic Compute Service (ECS) instance that runs a Linux operating system.

Alibaba Cloud Community December 4, 2023 3,130

Alibaba Cloud Cloud-Native Integrated Data Warehouse – An Interpretation of the New Capabilities of Lakehouse

This article discusses the overall updates to Lakehouse architecture.

Alibaba Cloud MaxCompute September 30, 2022 3,907

Practices of Simulating IDC Spark Read and Write MaxCompute

This article uses EMR (Cloud Hadoop) to simulate a local Hadoop cluster accessing MaxCompute data.

Alibaba Cloud MaxCompute August 15, 2022 2,761

Big Data Q&A - Friday Blog, Week 65

Friday Q&A is back! Let's take a look at some of the many very interesting questions I was asked during Alibaba Cloud training sessions this week!

JDP June 17, 2022 3,142

Packaging Issues in Datastream Development

This article explains which dependencies need to be introduced and which need to be packaged into the job JAR during job development.

Apache Flink Community April 19, 2022 5,976

Zero-Day Attack Analysis and Dissemination Method Disclosure for Hadoop Yarn RPC

This article explains the vulnerability in Hadoop Yarn RPC and possible solutions.

Alibaba Cloud Community November 16, 2021 8,734

The Big Data Platform Behind Alibaba's E-Commerce Systems

This article looks at the big data platform that helped power last year's Double 11.

Alibaba Cloud MaxCompute March 3, 2020 22,755

Zeppelin Notebook: An Important Tool for PyFlink Development Environment

This article introduces a PyFlink development environment tool that can help users solve various problems.

Apache Flink Community September 29, 2021 4,633

Deploy and Run Azkaban on Alibaba Cloud

This article is a tutorial on how to run the open-source project Azkaban on Alibaba Cloud with ApsaraDB (Alibaba Cloud Database).

ApsaraDB September 27, 2021 4,909

How Can We Defend against Multiple Intrusion Methods on Multiple Platforms When Lemon-Duck Is Continuously Active?

This article offers some insight into protection against botnets and other Internet threats.

Alibaba Cloud Community September 16, 2021 5,030

An Overview of Alibaba Cloud's Comprehensive Cloud-Native Data Lake System

This article introduces the establishment of a cloud-native data lake system based on Alibaba Cloud OSS, Data Lake Formation (DLF), and various computing engines present in Alibaba Cloud.

Alibaba EMR June 8, 2021 7,627

How to Use JindoDistCp for Offline Data Migration to a Data Lake

This article discusses offline data migration to a data lake using JindoDistCp and explains how it improves migration performance in different scenarios.

Alibaba EMR May 26, 2021 8,152

JindoTable for Data Optimization and Query Acceleration in a Data Lake

The article briefly discusses Alibaba Cloud's JindoTable and explains how it solves the data management problems in a data lake.

Alibaba EMR May 14, 2021 6,611

EB-level Data Lake Based on OSS

This article briefly discusses data lake systems and their features, and describes the process of building data lake storage based on Alibaba Cloud OSS.

Alibaba EMR May 6, 2021 7,398

Efficient Data Lake Formation Based on JindoFS and OSS

This article explains the process of data lake formation based on Alibaba Cloud OSS and JindoFS big data cache acceleration service.

Alibaba EMR April 30, 2021 5,349

The Discovery of a Promising Technology

In this article, Zhang Jianfeng, a veteran in the open-source community, explains how to evaluate whether a technology is worth learning using three key dimensions.

Apache Flink Community November 6, 2020 3,881

Alluxio Deep Learning Practices - 1: Running PyTorch Framework on HDFS

This article demonstrates how Alluxio simplifies running the PyTorch framework on HDFS using the Kubernetes platform to drastically improve development efficiency.

Alibaba Container Service August 25, 2020 9,436
