Warning

This is a design page. It was used to design and discuss the initial implementation of the change. However, the state of this document does not necessarily correspond to the current state of the implementation since we do not keep this document up to date with further changes and bug fixes.

Common SIGCHLD handler

Related ticket(s): - https://pagure.io/SSSD/sssd/issue/1004

I took some inspiration from the SIGUSR1 signal handling in data_provider_be.c. The SIGUSR1 signal is apparently used to force providers into offline behavior.

The DP backend lets providers register callbacks for the online/offline events. I thought it would be a good idea to make SIGCHLD handling consistent with what is already in place.

For the online/offline events, these functions are defined:

be_add_online_cb

be_run_online_cb

be_add_offline_cb

be_run_offline_cb

They give providers the option to register additional callbacks to handle these events in their own way. The list of callbacks is stored on the backend context (struct be_ctx).

However, there is one difference between the SIGCHLD and SIGUSR1 scenarios: online/offline callbacks are called serially, and all of them are always called, whereas the SIGCHLD handler has to invoke callbacks for the appropriate PIDs only. This means we can’t use the underlying callback handling functions already in place (be_run_cb and be_run_cb_step).

I propose creating new, similar functions (be_run_sigchld_cb and be_run_sigchld_cb_step). They would work like be_run_cb and be_run_cb_step respectively, with the following differences:

  1. each step would check with waitpid first and invoke the callback only if the child has exited

  2. we would use tevent_immediate events instead of timers (as discussed on IRC with Stephen)
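The per-child check from item 1 can be sketched in plain C, leaving tevent out entirely. This is only an illustration: demo_sigchld_cb_step, demo_store_status and demo_spawn_exit are hypothetical names, and only waitpid() and the sss_child_fn_t callback shape come from this page.

```c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Callback shape proposed later on this page. */
typedef void (*sss_child_fn_t)(int pid, int wait_status, void *pvt);

/* Hypothetical example callback: stores the raw wait status in *pvt. */
void demo_store_status(int pid, int wait_status, void *pvt)
{
    (void)pid;
    *(int *)pvt = wait_status;
}

/* Hypothetical helper: fork a child that exits at once with exit_code. */
pid_t demo_spawn_exit(int exit_code)
{
    pid_t pid = fork();

    if (pid == 0) {
        _exit(exit_code);   /* child: terminate immediately */
    }
    return pid;             /* parent: the child's PID, or -1 on failure */
}

/*
 * Hypothetical single step: poll one child without blocking.
 * Returns 1 if the child has terminated and the callback ran,
 * 0 if the child is still running, -1 if waitpid() failed.
 */
int demo_sigchld_cb_step(pid_t pid, sss_child_fn_t cb, void *pvt)
{
    int status = 0;
    pid_t ret = waitpid(pid, &status, WNOHANG);

    if (ret == 0) {
        return 0;           /* child exists but has not exited yet */
    }
    if (ret != pid) {
        return -1;          /* e.g. ECHILD: not our child or already reaped */
    }

    cb((int)pid, status, pvt);
    return 1;
}
```

In the proposed design each such step would presumably be scheduled as a tevent_immediate event rather than called in a polling loop. Note that once a step has reaped the child, a second call for the same PID fails with -1.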

Advantages of this approach:

  1. consistent with online/offline callbacks for providers

  2. relatively easy to implement

struct sss_child_ctx members:

  • pid_t pid

  • sss_child_fn_t cb

  • void *pvt

  • struct sss_sigchild_ctx *sigchld_ctx

struct sss_sigchild_ctx members:

  • struct tevent_context *ev

  • hash_table_t *children

  • int options

The struct sss_sigchild_ctx object should be initialized at process startup time. Its hash table should be created with sss_hash_create() to maintain talloc compatibility. The table should be keyed by integer (the PID) and should hold struct sss_child_ctx * objects as its values. The options member should be a bitmask allowing WUNTRACED and/or WCONTINUED. The handler will ALWAYS add WNOHANG.
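Written out as C declarations, the two objects described above might look roughly like this. It is a sketch, not the final layout: the opaque declarations are stand-ins so the fragment compiles without the tevent and dhash headers, and demo_waitpid_flags is a hypothetical helper that only illustrates how options and WNOHANG combine.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Opaque stand-ins; the real types come from the tevent and dhash headers. */
struct tevent_context;
typedef struct demo_hash_table hash_table_t;   /* hypothetical tag name */

/* Callback shape proposed on this page. */
typedef void (*sss_child_fn_t)(int pid, int wait_status, void *pvt);

/* One per process, created at startup. */
struct sss_sigchild_ctx {
    struct tevent_context *ev;
    hash_table_t *children;     /* PID -> struct sss_child_ctx * */
    int options;                /* WUNTRACED and/or WCONTINUED */
};

/* One per monitored child. */
struct sss_child_ctx {
    pid_t pid;
    sss_child_fn_t cb;
    void *pvt;
    struct sss_sigchild_ctx *sigchld_ctx;
};

/* Hypothetical helper: the flags the handler would pass to waitpid(). */
int demo_waitpid_flags(const struct sss_sigchild_ctx *ctx)
{
    return WNOHANG | ctx->options;
}
```

The back pointer from struct sss_child_ctx to its struct sss_sigchild_ctx is what would let a destructor find the hash table the entry has to be removed from.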

errno_t sss_child_register(TALLOC_CTX *memctx,
                           struct sss_sigchild_ctx *sigchld_ctx,
                           pid_t pid,
                           sss_child_fn_t cb,
                           void *pvt,
                           struct sss_child_ctx **child_ctx);

This function registers a callback with private data in a hash table contained within sigchld_ctx. It constructs a struct sss_child_ctx * consisting of the pid, cb and pvt. It will also create a destructor for this object which will remove the entry from the hash. This is so that the consumer can choose when to stop monitoring the child (such as when the waitpid() call reports SIGSTOP/SIGCONT or other non-terminating results). It can also be used to programmatically change the callback as needed.

void
sss_child_handler(struct tevent_context *ev,
                  struct tevent_signal *se,
                  int signum,
                  int count,
                  void *siginfo,
                  void *private_data);

This is the master SIGCHLD handler. It would be invoked any time that the process receives a SIGCHLD signal.

When the signal is received, it should call waitpid(-1, &status, WNOHANG | sigchld_ctx->options); repeatedly until waitpid() returns 0 (no further child has changed state) or -1 (no children remain). For each child reported, the pid should be looked up in the hash table and the matching callback should be invoked.

typedef void (*sss_child_fn_t)(int pid, int wait_status, void *pvt);
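The reap loop described for the handler can be sketched without tevent or talloc. In this illustration a small fixed array stands in for the PID-keyed hash table, and every demo_ name is hypothetical; only the waitpid() call pattern and the callback type follow this page.

```c
#include <stddef.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef void (*sss_child_fn_t)(int pid, int wait_status, void *pvt);

#define DEMO_MAX_CHILDREN 8

/* Hypothetical stand-in for one hash table entry. */
struct demo_child {
    pid_t pid;              /* 0 marks a free slot */
    sss_child_fn_t cb;
    void *pvt;
};

/* Hypothetical stand-in for struct sss_sigchild_ctx. */
struct demo_sigchld_ctx {
    struct demo_child children[DEMO_MAX_CHILDREN];
    int options;            /* WUNTRACED and/or WCONTINUED */
};

/* Example callback: stores the raw wait status in *pvt. */
void demo_store_status(int pid, int wait_status, void *pvt)
{
    (void)pid;
    *(int *)pvt = wait_status;
}

/* Fork a child that exits with exit_code and register cb for it.
 * Returns the child's PID, or -1 if the table is full or fork() failed. */
pid_t demo_spawn(struct demo_sigchld_ctx *ctx, int exit_code,
                 sss_child_fn_t cb, void *pvt)
{
    size_t i;
    pid_t pid;

    for (i = 0; i < DEMO_MAX_CHILDREN; i++) {
        if (ctx->children[i].pid == 0) {
            break;
        }
    }
    if (i == DEMO_MAX_CHILDREN) {
        return -1;
    }

    pid = fork();
    if (pid == 0) {
        _exit(exit_code);   /* child: terminate immediately */
    }
    if (pid > 0) {
        ctx->children[i].pid = pid;
        ctx->children[i].cb = cb;
        ctx->children[i].pvt = pvt;
    }
    return pid;
}

/* Body of the handler: reap every pending child without blocking.
 * Returns the number of callbacks that were invoked. */
int demo_reap_children(struct demo_sigchld_ctx *ctx)
{
    int invoked = 0;
    int status = 0;
    pid_t pid;
    size_t i;

    /* WNOHANG is always OR-ed in, so the loop ends at 0 or -1. */
    while ((pid = waitpid(-1, &status, WNOHANG | ctx->options)) > 0) {
        for (i = 0; i < DEMO_MAX_CHILDREN; i++) {
            struct demo_child *c = &ctx->children[i];

            if (c->pid != pid) {
                continue;
            }
            /* Forget the child only once it has really terminated. */
            if (WIFEXITED(status) || WIFSIGNALED(status)) {
                c->pid = 0;
            }
            c->cb((int)pid, status, c->pvt);
            invoked++;
            break;
        }
    }
    return invoked;
}
```

A caller would zero-initialize a struct demo_sigchld_ctx, start children with demo_spawn(), and call demo_reap_children() whenever SIGCHLD has been delivered. Children reported by waitpid() that are not in the table are reaped but otherwise ignored here; whether the real handler should log them is left open.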

A talloc destructor removes a struct sss_child_ctx * from the hash table of the struct sss_sigchild_ctx * that contains it.