EMR is an all-in-one enterprise-ready big data platform that provides cluster, job, and data management services based on open-source ecosystems, such as Hadoop, Spark, Kafka, Flink, and Storm.
This article describes how to use the spark-submit command-line interface (CLI) to submit a Spark job after EMR Serverless Spark is connected to ECS.
This article introduces a data processing workflow that integrates Realtime Compute for Apache Flink, EMR Serverless Spark, and Apache Paimon to enable real-time data ingestion.
This article introduces the usability and maintainability of EMR Serverless Spark in stream processing.
This article is compiled from the first session of the EMR StarRocks online open class - EMR Serverless StarRocks3.
This article introduces the integration of Paimon and Spark, specifically focusing on query optimization.
This article introduces the main features in the new version of Paimon that are supported by the Spark-based computing engine.
This article was compiled from a speech by Qingwei Yang at the Alibaba Cloud Data Lake Technology Special Exchange Meeting on July 17, 2022.
This article was compiled from a speech by Xiong Jiashu at the Alibaba Cloud Data Lake Technology Special Exchange Meeting.
This article discusses real-time data warehouse construction and offers examples of using Flink CDC and StarRocks for real-time links and data updates.
This article describes how to use Databricks and MLflow to build a machine learning lifecycle management platform.
This part of the Databricks Data Insight Open Course article series introduces Delta Lake Basics (Open-Source Edition).
This part of the Databricks Data Insight Open Course article series introduces Delta Lake Basics (Commercial Edition).
This article discusses using Delta Lake to build a batch-stream unified data warehouse and putting it into practice.
This part of the Databricks Data Insight Open Course article series discusses the history and current state of Delta Lake.
This article explores Delta Lake and discusses the implementation of two solutions related to traditional data warehouses based on Hive tables.
This article introduces the two latest important features of RSS: support for Adaptive Query Execution (AQE) and throttling.
This article focuses on the technology, performance, and future planning of StarRocks' blazing-fast data lake analytics.
This article describes how to optimize the performance of the product features provided by the Enterprise Edition to help you efficiently access lake houses.
This article aims to solve the performance problems of offline data warehouses (daily and hourly) during production and usage.
This article reveals the key technologies of the data lake analytics engine in detail and uses StarRocks to help users understand the architecture of the system.