Message-ID: <20080923163530.GA656@tv-sign.ru>
Date: Tue, 23 Sep 2008 20:35:30 +0400
From: Oleg Nesterov <oleg@...sign.ru>
To: Joe Korty <joe.korty@...r.com>
Cc: Roland McGrath <roland@...hat.com>, Jiri Kosina <jkosina@...e.cz>,
Andrew Morton <akpm@...ux-foundation.org>,
linux-kernel@...r.kernel.org
Subject: Re: [BUG, TEST PATCH] stallout race between SIGCONT and SIGSTOP
Sorry! I have to run away right now, and I will be completely offline
tomorrow. I'll return on Thursday.
On 09/23, Joe Korty wrote:
>
> Since 2.6.25-git16, the Open POSIX Test Suite test sigaction/10-1 on
> occasion stalls out. A ^C breaks the test out of the stall.
>
> To see the problem, one must run the test in a loop. The stallout happens
> anywhere from 3 to approximately 60 iterations. To make the test runtime
> more bearable, I've been using a custom version that is 8x faster than
> the original, s/sleep/usleep/g + new sleep constants.
>
> The test in essence does 10 SIGSTOPs and SIGCONTs, interleaved, with a
> short delay between each SIGSTOP and SIGCONT, but none (other than the
> small delay of a printf) between each SIGCONT and SIGSTOP:
>
> 	for (i = 0; i < 10; i++) {
> 		printf("--> Sending SIGSTOP #%d\n", i);
> 		kill(pid, SIGSTOP);
> 		usleep(125000);
> 		printf("--> Sending SIGCONT #%d\n", i);
> 		kill(pid, SIGCONT);
> 		// usleep(125000); /* this is missing from the real 10-1 */
> 	}
>
> When the above commented-out usleep is enabled, the stallout disappears.
> If instead of adding a usleep, the printf's are removed, the test stalls
> out immediately.
Could you clarify? Do you mean that the task hangs in sys_kill()?
Better yet, to avoid possible confusion, could you please send me
the (modified) source code to reproduce the stall?
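For reference, a minimal sketch of the loop described above, wrapped in a
standalone function, might look like the following. It is not the 10-1
source and not Joe's modified version: the helper name stop_cont_loop(),
the idle child that just calls pause(), and the SIGKILL cleanup at the end
are all assumptions made for illustration. The printf calls are left out,
which per the report is the variant that stalls soonest.

```c
#define _DEFAULT_SOURCE
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Hypothetical helper, not taken from the Open POSIX Test Suite.
 * Forks an idle child, then sends it `iterations` SIGSTOP/SIGCONT pairs,
 * sleeping delay_us microseconds after each SIGSTOP but not at all after
 * each SIGCONT. Returns 0 if every signal was sent and the child was
 * reaped after the final SIGKILL, -1 otherwise.
 */
static int stop_cont_loop(int iterations, useconds_t delay_us)
{
	pid_t pid = fork();
	int i, status;

	if (pid < 0)
		return -1;

	if (pid == 0) {
		/* Child: do nothing until killed. */
		for (;;)
			pause();
	}

	for (i = 0; i < iterations; i++) {
		if (kill(pid, SIGSTOP) < 0)
			break;
		usleep(delay_us);
		if (kill(pid, SIGCONT) < 0)
			break;
		/* No delay here: the next SIGSTOP follows immediately. */
	}

	/* SIGKILL terminates the child whether it is stopped or running. */
	kill(pid, SIGKILL);
	if (waitpid(pid, &status, 0) != pid)
		return -1;

	if (i != iterations)
		return -1;
	return (WIFSIGNALED(status) && WTERMSIG(status) == SIGKILL) ? 0 : -1;
}
```

Calling stop_cont_loop(10, 125000) from main() in a shell loop would
approximate the sped-up test run described in the report. Whether this
sketch hits the same stall is untested; the real test may depend on details
(for example, what the parent and child do besides signalling) that are not
visible in the quoted fragment.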
> Therefore the problem has something to do with a SIGSTOP
> being issued 'too soon' after the issuance of a SIGCONT.
>
> Bisection shows that the problem was introduced by
>
> commit e442055193e4584218006e616c9bdce0c5e9ae5c
> Author: Oleg Nesterov <oleg@...sign.ru>
> Date: Wed Apr 30 00:52:44 2008 -0700
>
> This commit adds code that solves serious race problems by deferring the
> actual processing of SIGSTOP and SIGCONT to a later time. I suspect it
> is this deferring that is making SIGCONT sensitive to a SIGSTOP coming
> in too close on its heels.
>
> The following patch, not to be considered seriously,
Yes, the patch is not for production, but thanks a lot! I am sure it will
help to diagnose the problem.
Thanks Joe!
Oleg.