Expected behavior of InstallFailureSignalHandler when process is PID 1 #1153

Open
@dambrosio

Hello,
Thanks for all your work on the glog library; we use it extensively in our code base!

I have a quick question about the expected behavior of the InstallFailureSignalHandler function when it is used in a binary that runs as the PID 1 process (e.g. the binary is the entrypoint of a Docker container).

I noticed that when one of the signals handled by InstallFailureSignalHandler (kFailureSignals: SIGSEGV, SIGABRT, etc.) is raised, the FailureSignalHandler function dumps the stack trace successfully, but the application then hangs indefinitely.

After some debugging, the cause seems to be the kill(getpid(), signal_number) call in the InvokeDefaultSignalHandler function.

My understanding is that this function "un-registers" the user handler for the given signal by resetting it to SIG_DFL, then re-sends the signal with kill() so that the OS applies its default action.
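
To make that reset-and-re-send pattern concrete, here is a minimal sketch of it outside glog. The names ResetAndReraise and TerminatingSignalOfChild are hypothetical and not part of glog; the child process exists only so the effect can be observed from the parent.

```cpp
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical handler (not glog's code): restore the default disposition,
// then re-send the signal so the kernel applies the default action.
void ResetAndReraise(int signal_number) {
  struct sigaction sig_action = {};
  sigemptyset(&sig_action.sa_mask);
  sig_action.sa_handler = SIG_DFL;
  sigaction(signal_number, &sig_action, nullptr);
  // The signal is blocked while its handler runs, so it stays pending here
  // and is delivered with the default action once the handler returns.
  kill(getpid(), signal_number);
}

// Forks a child that installs the handler and raises `signal_number`.
// Returns the signal that terminated the child, 0 if the child exited
// normally, or -1 on error.
int TerminatingSignalOfChild(int signal_number) {
  pid_t child = fork();
  if (child < 0) return -1;
  if (child == 0) {
    struct sigaction sig_action = {};
    sigemptyset(&sig_action.sa_mask);
    sig_action.sa_handler = ResetAndReraise;
    sigaction(signal_number, &sig_action, nullptr);
    raise(signal_number);
    _exit(0);  // Reached only if the re-sent signal did not terminate us.
  }
  int status = 0;
  if (waitpid(child, &status, 0) < 0) return -1;
  return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

In an ordinary (non-PID 1) process, TerminatingSignalOfChild(SIGTERM) should report SIGTERM, which is the behavior the kill() call relies on. The forked child here is never PID 1, so this sketch only illustrates the normal path, not the hang.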

Digging around, I found that on Linux a PID 1 process gets no default signal handling: a signal whose disposition is SIG_DFL is dropped rather than delivered, so the process won't receive the signal sent by kill():

PID 1 processes in Linux do not have any default signal handlers and as a result will not receive and propagate signals.

This ultimately leaves the process running indefinitely.

I added a patch to signalhandler.cc for our use case to handle this situation:

```diff
diff --git a/src/signalhandler.cc b/src/signalhandler.cc
index b6d6e25..338ce00 100644
--- a/src/signalhandler.cc
+++ b/src/signalhandler.cc
@@ -36,6 +36,7 @@
 #include "symbolize.h"
 #include "glog/logging.h"
 
+#include <cstdlib>
 #include <csignal>
 #include <ctime>
 #ifdef HAVE_UCONTEXT_H
@@ -252,6 +253,13 @@ void InvokeDefaultSignalHandler(int signal_number) {
   sigemptyset(&sig_action.sa_mask);
   sig_action.sa_handler = SIG_DFL;
   sigaction(signal_number, &sig_action, NULL);
+  if (1 == getpid()) {
+    // From Advanced Bash-Scripting Guide
+    // Fatal error signal "n" should return with 128 + n
+    // Where n is the fatal error signal
+    // https://tldp.org/LDP/abs/html/exitcodes.html
+    std::_Exit(128 + signal_number);
+  }
   kill(getpid(), signal_number);
 #elif defined(OS_WINDOWS)
   signal(signal_number, SIG_DFL);
```

I wanted to understand whether this PID 1 behavior is expected with glog, and whether others have had to do something similar.

FYI, we are using v0.5.0 but the InvokeDefaultSignalHandler function is the same in the latest version of glog.
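
For reference, the exit-status convention the patch relies on can be sketched in isolation. The helper names ExitCodeForSignal, IsPidOne, and ExitIfPidOne are hypothetical and only illustrate the idea; they are not part of glog.

```cpp
#include <csignal>
#include <cstdlib>
#include <unistd.h>

// Shell convention: a process terminated by fatal signal n is reported
// with exit status 128 + n.
constexpr int ExitCodeForSignal(int signal_number) {
  return 128 + signal_number;
}

// True when this process is PID 1 in its PID namespace, for example when
// it is the entrypoint of a container.
inline bool IsPidOne() { return getpid() == 1; }

// Hypothetical helper mirroring the patch: as PID 1, exit immediately with
// the conventional status instead of relying on the default signal action.
// Returns normally for any other PID so the caller can re-send the signal.
inline void ExitIfPidOne(int signal_number) {
  if (IsPidOne()) {
    std::_Exit(ExitCodeForSignal(signal_number));
  }
}
```

With this convention, a SIGSEGV in the container entrypoint would surface to the container runtime as exit status 128 + SIGSEGV, which matches what a shell reports for a process killed by that signal.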
