Chapter2.md (+1 −1)
@@ -154,7 +154,7 @@ The root problem lies in the certification database mechanism used by Group Repl
## 2.7 Modified Group Replication Outperforms Semisynchronous Replication
-Group Replication has been extensively enhanced while addressing scalability problems in MySQL 8.0.32. To validate these improvements, simultaneous testing of semisynchronous replication and Group Replication with Paxos log persistence was conducted. The deployment setup included two-node configurations for both semisynchronous and Group Replication, hosted on the same machine with independent SSDs and NUMA binding to isolate each node. Specifically, the MySQL primary utilized NUMA nodes 0 to 2, while the MySQL secondary utilized NUMA node 3. All settings, except those directly related to semisynchronous or Group Replication configurations, remained identical.
+Group Replication has been extensively enhanced while addressing scalability problems in MySQL 8.0.32. To validate these improvements, simultaneous testing of semisynchronous replication and Group Replication with Paxos log persistence was conducted. The deployment setup included two-node configurations for both semisynchronous and Group Replication, hosted on the same machine with independent NVMe SSDs and NUMA binding to isolate each node. Specifically, the MySQL primary utilized NUMA nodes 0 to 2, while the MySQL secondary utilized NUMA node 3. All settings, except those directly related to semisynchronous or Group Replication configurations, remained identical.
The following figure shows the throughput comparison of semisynchronous replication and Group Replication with Paxos log persistence under different concurrency levels.
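As an aside, the NUMA binding described in this hunk could be reproduced with `numactl` along the following lines. This is a sketch only: the config-file paths and the exact invocation are assumptions, not taken from the book's test scripts.

```shell
# Hypothetical reproduction of the deployment above: primary on NUMA nodes 0-2,
# secondary on NUMA node 3, each instance on its own NVMe SSD.
# Primary (CPU and memory pinned to nodes 0-2):
numactl --cpunodebind=0,1,2 --membind=0,1,2 mysqld --defaults-file=/etc/primary.cnf &
# Secondary (CPU and memory pinned to node 3):
numactl --cpunodebind=3 --membind=3 mysqld --defaults-file=/etc/secondary.cnf &
```

Binding both CPU and memory to the same nodes avoids cross-node memory traffic, which is the point of isolating the two instances on one machine.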
Chapter4_3.md (+1 −1)
@@ -207,7 +207,7 @@ Dynamic programming simplifies a complex problem by breaking it down into simple
In the context of execution plan optimization, MySQL 8.0 has explored using dynamic programming algorithms to determine the optimal join order. This approach can greatly improve the performance of complex joins, though it remains experimental in its current implementation.
-It is important to note that, due to potentially inaccurate cost estimation, the join order determined by dynamic programming algorithms may not always be the true optimal solution. Dynamic programming algorithms often provide the best plan but can have high computational overhead and may suffer from large costs due to incorrect cost estimation [55]. For a deeper understanding of the complex mechanisms involved, readers can refer to the paper "Dynamic Programming Strikes Back".
+It is important to note that, due to potentially inaccurate cost estimation, the join order determined by dynamic programming algorithms may not always be the true optimal solution. Dynamic programming algorithms often provide the best plan but can have high computational overhead and may suffer from large costs due to incorrect cost estimation [55]. For a deeper understanding of the complex mechanisms involved, readers can refer to the paper "Dynamic Programming Strikes Back" [35].
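The subset-style dynamic programming this hunk refers to can be sketched in a few lines. The cardinalities, selectivities, and cost model below are invented for illustration; this is the classic Selinger-style idea, not MySQL's actual optimizer.

```python
from itertools import combinations

# Toy dynamic programming over join subsets: best plan for a subset is built
# from the best plan of a smaller subset plus one more table.
card = {"A": 1000, "B": 100, "C": 10}                      # invented row counts
sel = {frozenset("AB"): 0.01, frozenset("BC"): 0.1,
       frozenset("AC"): 1.0}                               # invented selectivities

def rows(subset):
    """Estimated result size of joining all tables in `subset`."""
    r = 1.0
    for t in subset:
        r *= card[t]
    for a, b in combinations(subset, 2):
        r *= sel[frozenset((a, b))]
    return r

tables = tuple(card)
best = {frozenset((t,)): (0.0, t) for t in tables}          # subset -> (cost, plan)
for size in range(2, len(tables) + 1):
    for sub in map(frozenset, combinations(tables, size)):
        for t in sub:                                       # t joins in last
            rest = sub - {t}
            # Toy cost: cost of sub-plan + size of its result + size of this join.
            cost = best[rest][0] + rows(rest) + rows(sub)
            if sub not in best or cost < best[sub][0]:
                best[sub] = (cost, f"({best[rest][1]} ⋈ {t})")

print(best[frozenset(tables)][1])   # → ((C ⋈ B) ⋈ A)
```

Even on this toy model the point of the surrounding paragraph is visible: if `sel` is estimated wrongly, the "optimal" plan the DP emits is optimal only for the wrong numbers.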
Chapter4_5.md (+2 −2)
@@ -20,9 +20,9 @@ With the increase in CPU core count, the CPU overhead for MySQL parsing has beco
Profile-guided optimization (PGO) is a compiler technique that improves program performance by using profiling data from test runs of the instrumented program. Rather than relying on programmer-supplied frequency information, PGO leverages profile data to optimize the final generated code, focusing on frequently executed areas of the program. This method reduces reliance on heuristics and can improve performance, provided the profiling data accurately represents typical usage scenarios.
-Extensive practice has shown that MySQL's large codebase is especially well-suited for PGO. However, the effectiveness of PGO can be influenced by I/O storage devices and network latency. On systems with slower I/O devices, like hard drives, I/O becomes the primary bottleneck, limiting PGO's performance gains due to Amdahl's Law. In contrast, on systems with faster I/O devices such as SSDs, PGO can lead to substantial performance improvements. Network latency also affects PGO effectiveness, with higher latency generally reducing the benefits.
+Extensive practice has shown that MySQL's large codebase is especially well-suited for PGO. However, the effectiveness of PGO can be influenced by I/O storage devices and network latency. On systems with slower I/O devices, like hard drives, I/O becomes the primary bottleneck, limiting PGO's performance gains due to Amdahl's Law. In contrast, on systems with faster I/O devices such as NVMe SSDs, PGO can lead to substantial performance improvements. Network latency also affects PGO effectiveness, with higher latency generally reducing the benefits.

-In summary, while MySQL 8.0's PGO capabilities can greatly improve computational performance, the actual improvement depends on the balance between computational and I/O bottlenecks in the server setup. The following figure demonstrates that with SSD hardware configuration and NUMA binding, PGO can significantly improve the performance of MySQL.
+In summary, while MySQL 8.0's PGO capabilities can greatly improve computational performance, the actual improvement depends on the balance between computational and I/O bottlenecks in the server setup. The following figure demonstrates that with NVMe SSD hardware configuration and NUMA binding, PGO can significantly improve the performance of MySQL.
Chapter4_6.md (+8 −8)
@@ -8,13 +8,13 @@ The following figure depicts a scenario where MySQL primary and MySQL secondary

-Figure 4-23. Testing architecture for Group Replication with pure Paxos protocol
+Figure 4-23. Testing architecture for Group Replication with modified Mencius protocol.
The cluster's Paxos algorithm employs a modified Mencius approach, removing batching and pipelining, making it similar to pure Paxos. Tests were conducted at various concurrency levels under a network latency of 10ms, as illustrated in the following figure:
-Figure 4-24. Results of testing Group Replication with pure Paxos protocol
+Figure 4-24. Results of testing Group Replication with modified Mencius protocol.
In a WAN testing scenario, the throughput remains nearly constant across different concurrency levels—50, 100, or 150—because the time MySQL takes to process TPC-C transactions is negligible compared to the network latency of 10ms. This network latency dominates the overall transaction time, making the impact of concurrency changes relatively insignificant.
@@ -26,9 +26,9 @@ This closely matches the test results above, where 0.45 is an empirical factor d

-Figure 4-25. Insights into the pure Paxos protocol from packet capture data.
+Figure 4-25. Insights into the modified Mencius protocol from packet capture data.
-In the figure, the network latency between the two Paxos instances is approximately 10ms, matching the exact network delay. Numerous examples suggest that pure Paxos communication is inherently serial. In scenarios where network latency is the predominant factor, it acts as a single queue bottleneck. Consequently, regardless of concurrency levels, the throughput of pure Paxos is limited by this network latency.
+In the figure, the network latency between the two Paxos instances is approximately 10ms, matching the exact network delay. Numerous examples suggest that Paxos communication is inherently serial. In scenarios where network latency is the predominant factor, it acts as a single queue bottleneck. Consequently, regardless of concurrency levels, the throughput of modified Mencius is limited by this network latency.
### 4.6.2 Multiple Queue Bottlenecks
@@ -100,10 +100,10 @@ To prevent performance degradation, controlling resource usage is crucial. For M
A practical transaction throttling mechanism for MySQL is as follows:
-1.Before entering the transaction system, check if the number of concurrent processing threads exceeds the limit.
-2.If the limit is exceeded, block the user thread until other threads activate this thread.
-3.If the limit is not exceeded, allow the thread to proceed with processing within the transaction system.
-4.Upon transaction completion, activate the first transaction in the waiting queue.
+1. Before entering the transaction system, check if the number of concurrent processing threads exceeds the limit.
+2. If the limit is exceeded, block the user thread until another thread releases a slot and activates it.
+3. If the limit is not exceeded, allow the thread to proceed with processing within the transaction system.
+4. Upon transaction completion, activate the first transaction in the waiting queue.
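The four steps above amount to FIFO admission control, which can be sketched with a counter and a wait queue. This is a minimal illustration of the scheme, not MySQL's implementation; the class and method names are invented.

```python
import threading
from collections import deque

class TransactionThrottle:
    """FIFO admission control sketch of the four-step throttling scheme."""
    def __init__(self, limit):
        self.limit = limit              # max concurrently processing threads
        self.active = 0                 # threads currently inside the txn system
        self.lock = threading.Lock()
        self.waiters = deque()          # blocked user threads, oldest first

    def enter(self):
        with self.lock:
            if self.active < self.limit:      # steps 1 and 3: under the limit
                self.active += 1
                return
            ev = threading.Event()            # step 2: over the limit, block
            self.waiters.append(ev)
        ev.wait()                             # woken by a finishing transaction

    def exit(self):
        with self.lock:
            if self.waiters:                  # step 4: wake the first waiter,
                self.waiters.popleft().set()  # handing it the freed slot
            else:
                self.active -= 1
```

A transaction would call `enter()` before starting and `exit()` on completion; handing the slot directly to the oldest waiter keeps admission strictly FIFO.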
This approach helps maintain performance by controlling concurrency and managing resource usage effectively. The following figure illustrates the relationship between TPC-C throughput and concurrency under transaction throttling conditions, with 1000 warehouses.
Chapter4_7.md (+14 −14)
@@ -14,16 +14,16 @@ The FLP impossibility theorem is valuable in problem-solving as it highlights th
The Mencius algorithm used in Group Replication addresses the FLP impossibility by using a failure detector oracle to bypass the result. Like Paxos, it relies on the failure detector only for liveness. Mencius requires that eventually, all and only faulty servers are suspected by the failure detector. This can be achieved by implementing failure detectors with exponentially increasing timeouts [32].
-To avoid the problems posed by the FLP impossibility, careful design is needed. TCP, for example, addresses this with timeout retransmission and idempotent design, ensuring that even if duplicate messages are received due to transmission errors, they can be safely discarded.
+To avoid problems caused by uncertainty, careful design is needed. TCP, for example, addresses this with timeout retransmission and idempotent design, ensuring that even if duplicate messages are received due to transmission errors, they can be safely discarded.
### 4.7.2 TCP/IP Protocol Stack
The Internet protocol suite, commonly known as TCP/IP, organizes the set of communication protocols used in the Internet and similar computer networks [45]. It provides end-to-end data communication, specifying how data should be packetized, addressed, transmitted, routed, and received. The suite is divided into four abstraction layers, each classifying related protocols based on their networking scope:
-1.**Link Layer**: Handles communication within a single network segment (link).
-2.**Internet Layer**: Manages internetworking between independent networks.
-3.**Transport Layer**: Handles host-to-host communication between endpoints.
-4.**Application Layer**: Enables process-to-process data exchange for applications.
An implementation of these layers for a specific application forms a protocol stack. The TCP/IP protocol stack is one of the most widely used globally, having operated successfully for many years since its design. The following figure illustrates how a client program interacts with a MySQL Server using the TCP/IP protocol stack.
@@ -33,9 +33,9 @@ Figure 4-34. A client program interacts with a MySQL Server using the TCP/IP pro
Due to the layered design of the TCP/IP protocol stack, a client program typically interacts only with the local TCP to access a remote MySQL server. This design is elegant in its simplicity:
-1.**Client-Side TCP**: Handles sending SQL queries end-to-end to the remote MySQL server. It manages retransmission if packets are lost.
-2.**Server-Side TCP**: Receives the SQL queries from the client-side TCP and forwards them to the MySQL server application. After processing, it sends the response back through its TCP stack.
-3.**Routing and Forwarding**: TCP uses the IP layer for routing and forwarding, while the IP layer relies on the data link layer for physical transmission within the same network segment.
+1.**Client-Side TCP**: Handles sending SQL queries end-to-end to the remote MySQL server. It manages retransmission if packets are lost.
+2.**Server-Side TCP**: Receives the SQL queries from the client-side TCP and forwards them to the MySQL server application. After processing, it sends the response back through its TCP stack.
+3.**Routing and Forwarding**: TCP uses the IP layer for routing and forwarding, while the IP layer relies on the data link layer for physical transmission within the same network segment.
Although TCP ensures reliable transmission, it cannot guarantee that messages will always reach their destination due to potential network anomalies. For example, SQL requests might be blocked by a network firewall, preventing them from reaching the MySQL server. In such cases, the client application might not receive a response, leading to uncertainty about whether the request was processed or still in transit.
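The "processed or still in transit?" uncertainty described above can be reproduced in a few lines of socket code. The slow server here stands in for a firewalled or overloaded MySQL instance; the names, the `COMMIT` payload, and the 0.2 s timeout are all illustrative, not part of the MySQL protocol.

```python
import socket
import threading
import time

# The request DOES reach the server, but the reply misses the client's
# deadline, so the client cannot distinguish "processed" from "lost".
def slow_server(listener):
    conn, _ = listener.accept()
    conn.recv(64)        # request arrived and was "processed" ...
    time.sleep(1.0)      # ... but the reply is delayed past the timeout
    conn.close()

listener = socket.socket()
listener.bind(("127.0.0.1", 0))
listener.listen(1)
threading.Thread(target=slow_server, args=(listener,), daemon=True).start()

client = socket.create_connection(listener.getsockname(), timeout=0.2)
client.sendall(b"COMMIT")
try:
    client.recv(64)
    outcome = "reply received"
except socket.timeout:
    outcome = "timeout: processed or still in transit? The client cannot know."
print(outcome)
```

This is exactly why idempotent retries and explicit status queries matter: a timeout alone proves nothing about whether the server acted on the request.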
A flexible understanding of state transitions is crucial for troubleshooting MySQL network problems. For example:
--**CLOSE_WAIT State**: A large number of *CLOSE_WAIT* states on the server indicates that the application did not close connections promptly or failed to initiate the close process, causing connections to linger in this state.
--**SYN_RCVD State**: Numerous *SYN_RCVD* states may suggest a SYN flood attack, where an excessive number of SYN requests overwhelm the server's capacity to handle them effectively.
+-**CLOSE_WAIT State**: A large number of *CLOSE_WAIT* states on the server indicates that the application did not close connections promptly or failed to initiate the close process, causing connections to linger in this state.
+-**SYN_RCVD State**: Numerous *SYN_RCVD* states may suggest a SYN flood attack, where an excessive number of SYN requests overwhelm the server's capacity to handle them effectively.
Understanding these state transitions helps in diagnosing and addressing network-related problems more effectively.
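In practice, those state counts can be tallied from `ss` output (Linux iproute2; `ss` prints the netstat-style states as `CLOSE-WAIT` and `SYN-RECV`). The sample below is canned so the parsing is visible; the addresses are invented, and on a live server you would pipe `ss -tan` directly.

```shell
# Count TCP connections by state to spot CLOSE_WAIT / SYN_RCVD pileups.
# Canned sample stands in for live `ss -tan` output (addresses invented).
sample='State      Recv-Q Send-Q Local-Addr:Port  Peer-Addr:Port
ESTAB      0      0      10.0.0.1:3306    10.0.0.9:52100
CLOSE-WAIT 0      0      10.0.0.1:3306    10.0.0.9:52101
CLOSE-WAIT 0      0      10.0.0.1:3306    10.0.0.9:52102
SYN-RECV   0      0      10.0.0.1:3306    10.0.0.9:52103'

# Live equivalent:  ss -tan | awk 'NR > 1 { c[$1]++ } END { for (s in c) print s, c[s] }'
echo "$sample" | awk 'NR > 1 { c[$1]++ } END { for (s in c) print s, c[s] }'
```

A sudden jump in the `CLOSE-WAIT` count points at the application-side close bug described above, while a flood of `SYN-RECV` entries points at the network.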
@@ -108,7 +108,7 @@ Why does pure Paxos perform poorly in WAN environments? Refer to the packet capt
Figure 4-41. Insights into the pure Paxos protocol from packet capture data.
-From the figure, it is evident that the delay between two Paxos instances is around 10ms, matching the network latency. The low throughput of pure Paxos stems from its serial interaction nature, where network latency primarily determines throughput.
+The figure clearly shows that when both pipelining and batching are disabled, referred to here as pure Paxos, throughput drops significantly to just 2833 tpmC. The low throughput of pure Paxos stems from its serial interaction nature, where network latency primarily determines throughput.
In general, the test conclusions of pipelining and batching are consistent with the conclusions in the following paper [48]:
The figure categorizes network partitions into three types:
-1.**Complete Network Partition (a)**: Two partitions are completely disconnected from each other, widely recognized as a complete network partition.
-2.**Partial Network Partition (b)**: Group 1 and Group 2 are disconnected from each other, but Group 3 can still communicate with both. This is termed a partial network partition.
-3.**Simplex Network Partition (c)**: Communication is possible in one direction but not the other, known as a simplex network partition.
+1.**Complete Network Partition (a)**: Two partitions are completely disconnected from each other, widely recognized as a complete network partition.
+2.**Partial Network Partition (b)**: Group 1 and Group 2 are disconnected from each other, but Group 3 can still communicate with both. This is termed a partial network partition.
+3.**Simplex Network Partition (c)**: Communication is possible in one direction but not the other, known as a simplex network partition.
The most complex type is the partial network partition. Partial partitions isolate a set of nodes from some, but not all, nodes in the cluster, leading to a confusing system state where nodes disagree on whether a server is up or down. These disagreements are poorly understood and tested, even by expert developers [6].