spark

Integration of Paimon and Spark - Part 2: Query Optimization

This article introduces the integration of Paimon and Spark, specifically focusing on query optimization.

Integration of Paimon and Spark - Part I

This article introduces the main features in the new version of Paimon that are supported by the Spark-based computing engine.

miHoYo Big Data Cloud-Native Practices

This article introduces the process of upgrading miHoYo's big data architecture to cloud-native and the benefits of using Spark on K8s.

Running ODPS PySpark using CLI

This article discusses Spark in general, its uses in the big data workflow, and how to configure and run Spark in CLI mode for CI/CD purposes.

The Spark on ACK Practice of Hago

This article introduces Hago's practice of adopting Spark on ACK and its migration process.

How to Run Spark in MaxCompute

This article describes how to configure Spark 2.x dependencies and provides some examples.

Learning about Distributed Systems – Part 16: Solve the Performance Problem of Worker

Part 16 of this series discusses worker performance problems in MapReduce and whether there is room for improvement.

Practices of Simulating IDC Spark Read and Write MaxCompute

This article uses EMR (Cloud Hadoop) to simulate a local Hadoop cluster accessing MaxCompute data.

Big Data Q&A - Friday Blog, Week 65

Friday Q&A is back! Let's take a look at some of the many very interesting questions I was asked during Alibaba Cloud training sessions this week!

The Spark and Delta Lake Engine Enterprise Edition of Databricks Helps Efficiently Access Lake Houses

This article describes how to optimize the performance of the product features provided by the Enterprise Edition to help you efficiently access lake houses.

Zuoyebang's Best Practices for Building Data Lakes Based on Delta Lake

This article aims to solve the performance problems of offline data warehouses (daily and hourly) during production and usage.

Best Practices for Big Data Processing in Spark

This article is an overview of the best practices for big data processing in Spark taken from a lecture.

DLF + DDI Best Practices for One-Stop Data Lake Formation and Analysis

This article aims to give readers a deeper understanding of Alibaba Cloud Data Lake Formation (DLF) and Databricks DataInsight (DDI).

Use Flink Hudi to Build a Streaming Data Lake

This article introduces the optimization and evolution of Flink Hudi's original mini-batch-based incremental computing model through stream computing.

Alibaba Big Data Practices on Cloud-Native – EMR Spark on ACK

This article discusses the practices and challenges of EMR Spark on Alibaba Cloud Kubernetes.

Application of Delta Lake in Soul

This article explains the background of Delta Lake along with practices, problems, and solutions.

Fluid Helps Improve Data Elasticity with Customized Auto Scaling

This article gives step-by-step instructions on auto scaling with Fluid.

Flink + Iceberg: How to Construct a Whole-scenario Real-time Data Warehouse

In this article, the author explains building a real-time data warehouse using Apache Flink and Apache Iceberg.

Apache Iceberg 0.11.0: Features and Deep Integration with Flink

In this article, the author discusses how Apache Flink and Apache Iceberg have opened a new chapter in building a data lake architecture featuring stream-batch unification.

Integrating Apache Hudi and Apache Flink for New Data Lake Solutions

This article explains Apache Hudi and Apache Flink and the benefits of implementation.