HDFS

Apache Paimon: Streaming Lakehouse is Coming

This article is based on the keynote speeches given by LI Jinsong, WU Xiangping, DI Xingxing, and WANG Yunpeng during Flink Forward Asia 2023.

Cloud-Native AI: Fluid + JindoFS Helps Improve the Speed of Model Training for Massive Small Files from Weibo by 18 Times

This article introduces a new architecture based on Fluid (including JindoRuntime), designed and implemented by Weibo's technical teams.

How to Build a Blazing-Fast Data Lake Analytics Engine

This article reveals the key technologies of the data lake analytics engine in detail and uses StarRocks to help users understand the architecture of the system.

What Is HDFS? Its Architecture, Application Scenarios, Advantages and Disadvantages

Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware.

Alibaba Cloud JindoFS Handles Stress Testing Easily with More Than One Billion Files

This article reviews JindoFS stress testing, featuring multiple scenarios and graphs.

DataWorks: A Platform for Developing and Governing a Data Lake

This article briefly discusses Alibaba Cloud's big data platform, DataWorks, and explains how it solves the common challenges of a data lake.

Alluxio Deep Learning Practices - 1: Running PyTorch Framework on HDFS

This article demonstrates how Alluxio simplifies running the PyTorch framework on HDFS in a Kubernetes environment, drastically improving development efficiency.

Big Data Storage and Spark on Kubernetes

This article discusses big data storage and how Alibaba Cloud container services and Spark on Kubernetes can be used to meet several different storage scenarios.