lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20110409134520.GA19651@redhat.com>
Date:	Sat, 9 Apr 2011 15:45:20 +0200
From:	Oleg Nesterov <oleg@...hat.com>
To:	Andrew Morton <akpm@...ux-foundation.org>,
	"Nikita V. Youshchenko" <nyoushchenko@...sta.com>,
	Alexander Kaliadin <akaliadin@...sta.com>,
	oishi.y@....yzk.co.jp
Cc:	linux-kernel@...r.kernel.org
Subject: Re: Likely race between sys_rt_sigtimedwait() and complete_signal()

Can't find the original email, replying to Andrew's fwd.

On 04/07, Andrew Morton wrote:
>
> Within project we are working on, we are facing a "rare" situation when
> setitimer() / sigwait() - based periodic task execution hangs. "Rare"
> means once per several hours for 1000 Hz timer.
>
> For hanged thread, cat /proc/pid/status shows
>
> ...
> State:	S (sleeping)
> ...
> SigPnd:	0000000000000000
> ShdPnd:	0000000000002000
> SigBlk:	0000000000000000
> ...
>
> and SysRq - T shows
>
> [<c015b1b0>] (__schedule+0x2fc/0x37c) from [<c015b7b8>]
> (schedule+0x1c/0x30)
> [<c015b7b8>] (schedule+0x1c/0x30) from [<c015b8c4>]
> (schedule_timeout+0x18/0x1dc)
> [<c015b8c4>] (schedule_timeout+0x18/0x1dc) from [<c004a084>]
> (sys_rt_sigtimedwait+0x1b4/0x288)
> [<c004a084>] (sys_rt_sigtimedwait+0x1b4/0x288) from [<c001cf00>]
> (ret_fast_syscall+0x0/0x28)

Is this thread the group leader?

> All other threads have SIGALRM blocked as they should, looking
> through /proc/X/status proves this.

Do they ever had SIGALRM unlblocked ?

> So for some reason, SIGALRM was successfully delivered by timer, bit was
> set in ShdPnd [I guess at the bottom of __send_signal()], but that still
> resulted somehow in thread going to schedule() and not waking.

Thanks for the detailed report.

There is an old, ancient problem which I constantly forget to fix.
It _can_ perfectly explain the hang, at least in theory. I'll try
to make the patch on Monday.



In short: if a thread T runs with SIGALRM unblocked while another
thread sleeps in sigtimedwait(), and then T blocks SIGALRM, the
signal can be "lost" as above.

Does your application do something like this? If not, then there
is another problem.



> This is on embedded system running vendor 2.6.31-based kernel, moving
> forward is unfortunately impossible because of hardware support issues.

If I make the patch for 2.6.31, any chance you can test it?

> However I guess the race we faced still exists in the current upstream
> kernel,

Yes, this is possible. OTOH, the bug can be anywhere, not necessarily in
signal.c, and it might be already fixed.

Oleg.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ