This article is part of a series focusing on interview questions for technicians, with a specific emphasis on distributed systems.
This article explains the principles and best practices of distributed locks; a minimal implementation sketch follows the series recap below.
Part 27 of this series discusses distributed systems in terms of throughput and latency.
Part 26 of this series introduces HBase and explains how it supports random and range queries over massive data and how it maintains multiple versions of data.
Part 25 of this series introduces Apache Kylin and its concept of trading space for time.
Part 24 of this series introduces massively parallel processing (MPP) and how it relates to the exploration of system scalability.
Part 23 of this series explains how offline data warehouses based on Hive and real-time data warehouses based on Kafka + Flink make it easy to build distributed data warehouses.
Part 22 of this series discusses whether there is a more flexible way than cost-based optimization (CBO) to optimize SQL query performance.
Part 21 of this series focuses on why CBO exists and how it is implemented.
Part 20 of this series discusses another important SQL optimization method: rule-based optimization (RBO).
Part 19 of this series discusses SQL performance optimization.
Part 18 of this series explains how to improve application development efficiency on distributed systems.
Part 17 of this series introduces several possible Shuffle methods and their adoption in MapReduce and Spark.
Part 16 of this series discusses the performance problems of slaves and of MapReduce, and whether there is room for improvement.
As data grows rapidly and exponentially, servers often run out of space to store it. Luckily, distributed file systems like HDFS are helping us crack the problem of limited storage.
Data inconsistency is a glaring problem, and we have tried every means to solve it. We want high availability along with scalability.
Part 15 of this series shows that distributed systems are not completely distributed, presents typical solutions to centralization problems, and examines the performance problems of masters.
The second article of this series looks at the intriguing history of how academia and industry, and open source and business, get along with each other.
Last time, we talked about WHERE to store massive data; this time, we talk about HOW. Massive data brings massive costs.
Part 10 of this series introduces several implementations of distributed transactions as a second preventive solution to data inconsistency.
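Returning to this article's topic: one common baseline for a distributed lock is Redis, acquiring the lock with a single atomic SET ... NX PX command (so that setting the key and attaching an expiry happen in one step) and releasing it only if the caller still owns it. Below is a minimal sketch assuming the redis-py client and a locally reachable Redis instance; the key name, token scheme, and timeout are illustrative, not a prescribed implementation.

```python
import uuid
import redis

# Minimal Redis-based distributed lock sketch (illustrative, not production-grade).
# Assumes a reachable Redis instance and the redis-py client.
client = redis.Redis(host="localhost", port=6379)

# Release must be atomic: delete the key only if it still holds our token;
# otherwise we might free a lock that has already expired and been re-acquired
# by someone else.
RELEASE_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
else
    return 0
end
"""

def acquire_lock(name: str, ttl_ms: int = 30000) -> str | None:
    """Try to acquire the lock once; return the owner token on success."""
    token = str(uuid.uuid4())  # unique token identifies this owner
    # NX: set only if the key does not exist; PX: auto-expire so a crashed
    # owner cannot hold the lock forever.
    if client.set(name, token, nx=True, px=ttl_ms):
        return token
    return None

def release_lock(name: str, token: str) -> bool:
    """Release the lock only if this caller still owns it."""
    return client.eval(RELEASE_SCRIPT, 1, name, token) == 1

# Usage: guard a critical section shared across processes or machines.
token = acquire_lock("order:42:lock")
if token:
    try:
        pass  # do the work that must not run concurrently
    finally:
        release_lock("order:42:lock", token)
```

The unique token plus the check-then-delete script is what distinguishes a safe release from a naive DEL, and the expiry is what keeps a crashed client from blocking everyone else; the rest of this article builds on these principles and their best practices.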