flink

Improving speed and stability of checkpointing with generic log-based incremental checkpoints

In this article, we discuss several ways to improve the speed and stability of checkpointing with generic log-based incremental checkpoints.

Adaptive Batch Scheduler: Automatically Decide Parallelism of Flink Batch Jobs

We introduce Apache Flink's adaptive batch scheduler and detail how it can automatically decide the parallelism of Flink batch jobs.

Packaging Issues in Datastream Development

This article explains which dependencies need to be introduced and which need to be packaged into the job JAR during job development.

More Than Computing: A New Era Led by the Warehouse Architecture of Apache Flink

Mowen discusses the future of Apache Flink regarding its core capabilities of stream computing and improving the processing standards of the entire industry.

Application of Alink and Tensorflow on Flink in JD

This article is compiled from the presentation of JD search and recommendation algorithm engineers Zhang Ying and Liu Lu at Flink Forward Asia 2021.

How to Build a Cloud-Native Open-Source Big Data Platform | Best Practices of InMobi

This article shares the best practices of InMobi based on the open-source big data service of Alibaba Cloud.

Flink Remote Shuffle Open-Source: Shuffle Service for Cloud-Native and Unified Batch and Stream Processing

This article introduces the research and development background, design, and use of Flink Remote Shuffle.

Streaming ETL for MySQL and Postgres with Flink CDC

This tutorial explains how to quickly build streaming ETL for MySQL and Postgres with Flink CDC.

The Open-Source Real-Time Data Warehouse Solution Based on EMR OLAP - ClickHouse Transaction Implementation

This article describes an open-source real-time data warehouse solution based on EMR OLAP.

How We Improved Scheduler Performance for Large-Scale Jobs

This article discusses scheduler performance improvements for large-scale jobs in Flink 1.13 and 1.14.

Flink Practices in iQiyi's Advertising Business

This article thoroughly explains how iQiyi (a Chinese online video platform) uses Apache Flink.

Sort-Based Blocking Shuffle Implementation in Flink – Part 2

Part 2 of this 2-part series gives you insight into some core design considerations and implementation details of the sort-based blocking shuffle in Flink.

Sort-Based Blocking Shuffle Implementation in Flink – Part 1

Part 1 of this 2-part series introduces the sort-based blocking shuffle, presents benchmark results, and provides guidelines on how to use this new feature.

A Few Tips on Large-Scale Real-Time Data Warehouse Construction

This article offers helpful tips for large-scale real-time data warehouse construction.

Crowd Selection and Data Service Practices Based on MaxCompute & Hologres

This article describes how to use MaxCompute to add tags to a large number of people and carry out analysis and modeling through Hologres.

The Practice of Real-Time Data Processing Based on MaxCompute

This article explains how to write real-time streaming data based on BinLog, Flink, and Spark Streaming into MaxCompute.

Kwai Builds Real-Time Data Warehouse Scenario-Based Practice on Flink

This article introduces the real-time data warehouse architecture built by Kwai based on Flink and offers solutions to some difficult problems.

Jingdong: Flink SQL Optimization Practice

This article focuses on Jingdong's optimization measures for Flink SQL tasks, covering shuffle, join mode selection, object reuse, and UDF reuse.

Best Practices for Flink on Zeppelin Stream Computing Processing

This article gives an overview of the best practices for Flink on Zeppelin stream computing processing, taken from a recent lecture.

Zeppelin Notebook: An Important Tool for PyFlink Development Environment

This article introduces a PyFlink development environment tool that can help users solve various problems.