spark

Integration of Paimon and Spark - Part 2: Query Optimization

This article introduces the integration of Paimon and Spark, specifically focusing on query optimization.

Alibaba EMR April 25, 2024 2,950

Integration of Paimon and Spark - Part I

This article introduces the main features in the new version of Paimon that are supported by the Spark-based computing engine.

Alibaba EMR April 15, 2024 3,321

miHoYo Big Data Cloud-Native Practices

This article introduces the process of upgrading miHoYo's big data architecture to cloud-native and the benefits of using Spark on K8s.

Alibaba Cloud Native March 5, 2024 2,676

Running ODPS PySpark using CLI

This article discusses Spark in general, its uses in the big data workflow, and how to configure and run Spark in CLI mode for CI/CD purposes.

Alibaba Cloud Indonesia February 19, 2024 2,527

The Spark on ACK Practice of Hago

This article introduces Hago's practice of adopting Spark on ACK and its migration process.

Alibaba Cloud Native January 25, 2024 2,136

How to Run Spark in MaxCompute

This article describes how to configure Spark 2.x dependencies and provides some examples.

Farruh January 12, 2024 2,713

Learning about Distributed Systems – Part 16: Solve the Performance Problem of Worker

Part 16 of this series discusses worker performance problems in MapReduce and whether there is room for improvement.

Alibaba Cloud_Academy June 26, 2023 2,522

Practices of Simulating IDC Spark Read and Write MaxCompute

This article uses EMR (Cloud Hadoop) to simulate a local Hadoop cluster accessing MaxCompute data.

Alibaba Cloud MaxCompute August 15, 2022 1,890

Big Data Q&A - Friday Blog, Week 65

Friday Q&A is back! Let's take a look at some of the many very interesting questions I was asked during Alibaba Cloud training sessions this week!

JDP June 17, 2022 2,597

The Spark and Delta Lake Engine Enterprise Edition of Databricks Helps Efficiently Access Lake Houses

This article describes how to optimize the performance of the product features provided by the Enterprise Edition to help you efficiently access lake houses.

Alibaba EMR May 16, 2022 3,751

Zuoyebang's Best Practices for Building Data Lakes Based on Delta Lake

This article aims to solve the performance problems of offline data warehouses (daily and hourly) during production and usage.

Alibaba EMR May 13, 2022 3,362

Best Practices for Big Data Processing in Spark

This article is an overview of the best practices for big data processing in Spark taken from a lecture.

Alibaba EMR October 12, 2021 3,801

DLF + DDI Best Practices for One-Stop Data Lake Formation and Analysis

This article aims to give readers a deeper understanding of Alibaba Cloud Data Lake Formation (DLF) and Databricks DataInsight (DDI).

Alibaba EMR October 12, 2021 3,537

Use Flink Hudi to Build a Streaming Data Lake

This article introduces the optimization and evolution of Flink Hudi's original mini-batch-based incremental computing model through stream computing.

Apache Flink Community September 26, 2021 5,660

Alibaba Big Data Practices on Cloud-Native – EMR Spark on ACK

This article discusses the practices and challenges of EMR Spark on Alibaba Cloud Kubernetes.

Alibaba EMR August 24, 2021 2,491

Application of Delta Lake in Soul

This article explains the background of Delta Lake along with practices, problems, and solutions.

Alibaba EMR July 19, 2021 2,719

Fluid Helps Improve Data Elasticity with Customized Auto Scaling

This article gives step-by-step instructions for auto scaling with Fluid.

Alibaba Cloud Native Community June 21, 2021 5,248

Flink + Iceberg: How to Construct a Whole-scenario Real-time Data Warehouse

In this article, the author explains building a real-time data warehouse using Apache Flink and Apache Iceberg.

Apache Flink Community June 8, 2021 27,849

Apache Iceberg 0.11.0: Features and Deep Integration with Flink

In this article, the author discusses how Apache Flink and Apache Iceberg have opened a new chapter in building a data lake architecture featuring stream-batch unification.

Apache Flink Community June 8, 2021 6,347

Integrating Apache Hudi and Apache Flink for New Data Lake Solutions

This article explains Apache Hudi and Apache Flink and the benefits of implementation.

Apache Flink Community May 17, 2021 3,935
