[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <20070901082638.3C1124D04CC@magilla.localdomain>
Date: Sat, 1 Sep 2007 01:26:38 -0700 (PDT)
From: Roland McGrath <roland@...hat.com>
To: Oleg Nesterov <oleg@...sign.ru>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org
Subject: Re: PATCH? fix SIGNAL_STOP_DEQUEUED vs SIGCONT race
> Move the setting of SIGNAL_STOP_DEQUEUED from dequeue_signal() to
> get_signal_to_deliver(), and set this flag only if we are really going to stop.
> This also simplifies the code and makes the SIGNAL_STOP_DEQUEUED usage more
> understandable.
It looks like a nice cleanup to me.
> However, this changes the behaviour when the task is ptraced. If the debugger
> doesn't clear ->exit_code, SIGSTOP always succeeds after ptrace_stop(), even
> if SIGCONT was sent in between. I can't decide whether this change is good
> or bad, hopefully Roland can clarify.
Hmm. I think this is bad.
First, considering only the single-threaded case, there are debugger vs
SIGCONT races. Someone does kill(pid,SIGSTOP);kill(pid,SIGCONT); while pid
is debugged. The mandate for end user behavior here is that pid cannot
wind up sitting in job control stop in the end. Say the debugger is
e.g. strace, simply printing every signal and passing it through.
So say it goes:
T K D
merrily running ... blocked in wait4
kill(K, SIGSTOP)
dequeue SIGSTOP
-> ptrace_stop
wait4 -> K,{SIGSTOP}
kill(K, SIGCONT)
PTRACE_CONT,K,SIGSTOP
do_signal_stop(SIGSTOP)
wait4 -> K,{SIGSTOP}
The debugger did the obvious thing: just pass through what it got.
The killer did something POSIX guarantees leaves T running.
T is in job control stop, with a pending SIGCONT (normally impossible).
It's not the end of the world, since the next SIGKILL or SIGCONT will
always wake it up. But it's not right.
There are related issues with multi-threaded job control stop and with
suddenly killing the debugger. I could get into those in detail. But I
think the case above illustrates why we need a stop signal pending
consideration by the debugger to be like a stop signal pending locking for
the is_current_pgrp_orphaned() check when a SIGCONT/SIGKILL comes in between.
It's still probably a worthwhile cleanup to have the logic only in
get_signal_to_deliver, and to fix the problem you cited. It will take only
a little extra code to handle the ptrace case too, i.e.
if (sig_kernel_stop(signr) &&
current->sighand->action[signr-1] == SIG_DFL &&
!(current->signal->flags & SIGNAL_GROUP_EXIT))
current->signal->flags |= SIGNAL_STOP_DEQUEUED;
ptrace_stop(signr, signr, info);
if (sig_kernel_stop(signr) && current->exit_code == signr &&
!(current->signal->flags & SIGNAL_STOP_DEQUEUED) &&
current->sighand->action[signr-1] == SIG_DFL)
current->exit_code = 0;
maybe.
Thanks,
Roland
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists