Part 24 of this series introduces massively parallel processing (MPP) and how it relates to the exploration of system extensibility.
Part 23 of this series explains why offline data warehouses based on Hive and real-time data warehouses based on Kafka + Flink make it easy to distribute data warehouses.
Part 22 of this series discusses whether there is a more flexible method to optimize SQL query performance than CBO.
Part 21 of this series focuses on why there is a CBO and how it is implemented.
Part 20 of this series discusses another important SQL optimization method: rule-based optimization (RBO).
Part 19 of this series discusses SQL performance optimization.
Part 18 of this series explains how to improve application development efficiency on distributed systems.
Part 17 of this series introduces several possible Shuffle methods and their adoption in MapReduce and Spark.
Part 16 of this series discusses the performance problems of slaves in MapReduce and whether there is room for improvement.
As data grows exponentially, cloud servers often run out of space to store it. Luckily, with distributed file systems like HDFS, we are now cracking the problem of insufficient storage.
Inconsistency is such a prominent problem that we have tried every means to solve it. We want high availability along with scalability.
Part 15 of this series shows that distributed systems are not completely distributed and covers typical solutions to centralization problems and the performance problems of masters.
This is the second blog of the distributed systems series. Today we look at the intriguing history of how academia and industry, open-source and business get along with each other.
Last time we talked about WHERE to store massive data, and this time, HOW. Massive data brings massive costs.
Part 10 of this series introduces several implementations of distributed transactions as a second preventive solution to data inconsistency.
Part 9 of this series introduces the replica mechanism for high availability and discusses data consistency.
Part 8 of this series discusses one of the core problems of distributed systems: availability.
Part 7 of this series discusses one of the core problems of distributed systems: scalability.
Today we will take a look at distributed transactions based on Dynamo and BASE, and find out what their advantages and disadvantages are.
Performance and availability are very important, and slow systems or those with low availability are often unacceptable. Weak consistency is also very useful.