×
Spark SQL

Integration of Paimon and Spark - Part I

This article introduces the main features in the new version of Paimon that are supported by the Spark-based computing engine.

Learning about Distributed Systems - Part 20: Rule-Based Optimization (RBO)

Part 20 of this series discusses another important SQL optimization method: rule-based optimization (RBO).

Learning about Distributed Systems – Part 19: Performance-Impacting Operations in SQL

Part 19 of this series discusses SQL performance optimization.

Best Practices for Big Data Processing in Spark

This article is an overview of the best practices for big data processing in Spark taken from a lecture.

In-depth Review of Apache Spark: Spark + AI Summit 2020

Matei Zaharia, founder of the Spark project, gave an in-depth review of Spark at the Spark + AI Summit 2020 in conjunction with its 10-year anniversary.

Rewriting the Execution Plan in the EMR Spark Relational Cache

This article goes through the process of rewriting execution plans in the Spark Relational Cache on EMR.