[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20080924155611.GA5334@tv-sign.ru>
Date: Wed, 24 Sep 2008 19:56:11 +0400
From: Oleg Nesterov <oleg@...sign.ru>
To: Joe Korty <joe.korty@...r.com>
Cc: Roland McGrath <roland@...hat.com>, Jiri Kosina <jkosina@...e.cz>,
Andrew Morton <akpm@...ux-foundation.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [BUG, TEST PATCH] stallout race between SIGCONT and SIGSTOP
On 09/24, Joe Korty wrote:
>
> On Wed, Sep 24, 2008 at 11:05:41AM -0400, Oleg Nesterov wrote:
> > Joe says:
> >> So it looks like the test is in error, not the kernel.
> >
> > and I am happy to agree.
> > I think sigaction/10-1.c should be fixed, please see the patch below.
>
> A year or two ago I sent to Intel some OpenPosixTestSuite fixes, and they
> were accepted. Send it in (to the people listed in the comments at the
> front of the .c file), hopefully they are still at Intel.
OK, thanks, will do.
> > I did the test patch to be sure:
> >
> > --- 26-rc2/kernel/signal.c~ 2008-09-20 20:37:52.000000000 +0400
> > +++ 26-rc2/kernel/signal.c 2008-09-24 18:43:34.000000000 +0400
> > @@ -808,7 +808,7 @@ static int send_signal(int sig, struct s
> > * exactly one non-rt signal, so that we can get more
> > * detailed information about the cause of the signal.
> > */
> > - if (legacy_queue(pending, sig))
> > + if (sig != SIGCHLD && legacy_queue(pending, sig))
> > return 0;
> > /*
> > * fast-pathed signals for kernel-internal things like SIGSTOP
> >
> > and now your test-case doesn't hang.
>
> Very interesting! I am not sure this is Posix conformant,
No, no, the patch is of course wrong, I did it only to check my
understanding.
> as Posix
> seems to say that posting a SIGSTOP or SIGCHLD clears out all pending
> SIGSTOPs or SIGCHLDs,
Hmm. Are you sure?
Anyway, this is not what Linux does. If a non-rt signal is pending, the
next signal with the same number is silently ignored. SIGCHLD too.
> Still it might be workable
Confused. Do you agree the kernel is not buggy?
To clarify, none of SIGCONTs/SIGSTOPs is lost. But the test-case assumes
that it must always receive SIGCHLD + CLD_STOPPED. This is not true because
SIGCHLD is not queueable, and we have another "stream" of SIGCHLDs which
carry CLD_CONTINUED.
For example, the "opposite" code
kill(SIGSTOP);
kill(SIGCONT);
wait_for_CLD_CONTINUED();
was always wrong, but
kill(SIGCONT);
kill(SIGSTOP);
wait_for_CLD_STOPPED();
happened to work before that commit. But please note that it is wrong
anyway. For example, if we have another sub-thread, we can miss
CLD_STOPPED even without the commit which changed the timing.
Oleg.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists