Message-ID: <20080923163530.GA656@tv-sign.ru>
Date:	Tue, 23 Sep 2008 20:35:30 +0400
From:	Oleg Nesterov <oleg@...sign.ru>
To:	Joe Korty <joe.korty@...r.com>
Cc:	Roland McGrath <roland@...hat.com>, Jiri Kosina <jkosina@...e.cz>,
	Andrew Morton <akpm@...ux-foundation.org>,
	linux-kernel@...r.kernel.org
Subject: Re: [BUG, TEST PATCH] stallout race between SIGCONT and SIGSTOP

Sorry! I have to run away right now, and I will be completely offline
tomorrow. I'll return on Thursday.

On 09/23, Joe Korty wrote:
>
> Since 2.6.25-git16, the Open POSIX Test Suite test sigaction/10-1 on
> occasion stalls out.  A ^C breaks the test out of the stall.
>
> To see the problem, one must run the test in a loop.  The stallout happens
> anywhere from 3 to approximately 60 iterations.  To make the test runtime
> more bearable, I've been using a custom version that is 8x faster than
> the original, s/sleep/usleep/g + new sleep constants.
>
> The test in essence does 10 SIGSTOPs and SIGCONTs, interleaved, with a
> short delay between each SIGSTOP and SIGCONT, but none (other than the
> small delay of a printf) between each SIGCONT and SIGSTOP:
>
>     for(i=0; i<10; i++) {
> 	printf("--> Sending SIGSTOP #%d\n", i);
> 	kill (pid, SIGSTOP);
> 	usleep(125000);
> 	printf("--> Sending SIGCONT #%d\n", i);
> 	kill (pid, SIGCONT);
> 	// usleep(125000); /* this is missing from the real 10-1 */
>     }
>
> When the above commented-out usleep is enabled, the stallout disappears.
> If instead of adding a usleep, the printf's are removed, the test stalls
> out immediately.

Could you clarify? Do you mean that the task hangs in sys_kill()?

Better yet, to avoid possible confusion, could you please send me
the (modified) source code to reproduce the stall?

> Therefore the problem has something to do with a SIGSTOP
> being issued 'too soon' after the issuance of a SIGCONT.
>
> Bisection shows that the problem was introduced by
>
>     commit e442055193e4584218006e616c9bdce0c5e9ae5c
>     Author: Oleg Nesterov <oleg@...sign.ru>
>     Date:   Wed Apr 30 00:52:44 2008 -0700
>
> This commit adds code that solves serious race problems by deferring the
> actual processing of SIGSTOP and SIGCONT to a later time.  I suspect it
> is this deferring that is making SIGCONT sensitive to a SIGSTOP coming
> in too close on its heels.
>
> The following patch, not to be considered seriously,

Yes, the patch is not for production, but thanks a lot! I am sure it will
help to diagnose the problem.

Thanks Joe!

Oleg.
