This article introduces the integration of Paimon and Spark, specifically focusing on query optimization.
This article introduces the main features in the new version of Paimon that are supported by the Spark-based computing engine.
The article introduces the process of upgrading MiHoYo's big data architecture to cloud-native and the benefits of using Spark on K8s.
In this article, we will discuss Spark in general, its uses in the big data workflow, and how to configure and run Spark in CLI mode for CI/CD purposes.
This article introduces Hago's practice of adopting Spark on ACK and its migration process.
This article describes how to configure Spark 2.x dependencies and provides some examples.
Part 16 of this series discusses performance problems with slave nodes and MapReduce, and whether there is room for improvement.
This article uses EMR (Cloud Hadoop) to simulate a local Hadoop cluster accessing MaxCompute data.
Friday Q&A is back! Let's take a look at some of the most interesting questions I was asked during Alibaba Cloud training sessions this week!
This article describes how to optimize the performance of the product features provided by the Enterprise Edition to help you efficiently access lakehouses.
This article aims to solve the performance problems of offline data warehouses (daily and hourly batches) in production and usage.
This article is an overview of the best practices for big data processing in Spark taken from a lecture.
This article aims to give readers a deeper understanding of Alibaba Cloud Data Lake Formation (DLF) and Databricks DataInsight (DDI).
This article introduces how Flink Hudi's original mini-batch-based incremental computing model was optimized and evolved through stream computing.
This article discusses the practices and challenges of EMR Spark on Alibaba Cloud Kubernetes.
This article explains the background of Delta Lake along with practices, problems, and solutions.
This article gives step-by-step instructions for auto scaling with Fluid.
In this article, the author explains building a real-time data warehouse using Apache Flink and Apache Iceberg.
In this article, the author discusses how Apache Flink and Apache Iceberg have opened a new chapter in building a data lake architecture featuring stream-batch unification.
This article explains Apache Hudi and Apache Flink and the benefits of implementing them together.