
Commit fb3d05b

Fixed random SlaveRecoveryTest.PingTimeoutDuringRecovery test failure. (#436)
This test would randomly fail with:

```
18:16:59 3: F0501 17:16:59.192818 19175 slave.cpp:1445] Check failed: state == DISCONNECTED || state == RUNNING || state == TERMINATING RECOVERING
```

That is, a ping timeout was handled while the slave was in the `RECOVERING` state.

The cause was that the test restarts the slave with the same PID, so timers started by the previous slave process could fire while the new slave process was running. In this specific case, the previous slave's ping timer fired in the middle of recovery of the second slave instance, which triggered this assertion.

Fixed by cancelling the `pingTimer` in the slave destructor.

Tested by running the test in a loop while running a CPU-intensive workload (`stress-ng --cpu $(nproc)0`) in parallel.
1 parent 8894191 commit fb3d05b


1 file changed (+2, -0 lines)


src/slave/slave.cpp

Lines changed: 2 additions & 0 deletions
```diff
@@ -255,6 +255,8 @@ Slave::~Slave()
   // TODO(benh): Shut down executors? The executor should get an "exited"
   // event and initiate a shut down itself.
 
+  Clock::cancel(pingTimer);
+
   foreachvalue (Framework* framework, frameworks) {
     delete framework;
   }
```
