linux-kernel - Re: ERESTARTSYS escaping from sem


Message-ID: <4AD33A4D.4070006@us.ibm.com>
Date:	Mon, 12 Oct 2009 07:16:45 -0700
From:	Darren Hart <dvhltc@...ibm.com>
To:	Jeremy Leibs <leibs@...lowgarage.com>
CC:	Thomas Gleixner <tglx@...utronix.de>,
	Blaise Gassend <blaise@...lowgarage.com>,
	LKML <linux-kernel@...r.kernel.org>,
	Peter Zijlstra <peterz@...radead.org>
Subject: Re: ERESTARTSYS escaping from sem_wait with RTLinux patch

Jeremy Leibs wrote:
> On Sat, Oct 10, 2009 at 10:59 AM, Thomas Gleixner <tglx@...utronix.de> wrote:
>> Blaise,
>>
>> On Sat, 10 Oct 2009, Blaise Gassend wrote:
>>> 1) Where is the ERESTARTSYS being prevented from getting to user space?
>>>
>>> The only likely place I see for preventing ERESTARTSYS from escaping to
>>> user space is in arch/*/kernel/signal*.c. However, I don't see how the
>>> code there is being called if there is no signal pending. Is that a path
>>> for ERESTARTSYS to escape from the kernel?
>>>
>>> The following comment in kernel/futex.c in futex_wait makes me wonder if
>>> two threads are getting marked as ERESTARTSYS. The first one to leave
>>> the kernel processes the signal and restarts. The second one doesn't
>>> have a signal to handle, so it returns to user space without getting
>>> into signal*.c and wreaks havoc.
>>>
>>>     (...)
>>>         /*
>>>          * We expect signal_pending(current), but another thread may
>>>          * have handled it for us already.
>>>          */
>>>         if (!abs_time)
>>>                 return -ERESTARTSYS;
>>>     (...)
>> If the task is woken by a signal, then the task private flag
>> TIF_SIGPENDING is set, but in case of a process wide signal the signal
>> might have been handled by another thread of the same process before
>> that thread reaches the signal handling code, but then ERESTARTSYS is
>> handled gracefully. So you seem to trigger a code path which does not
>> go through do_signal.
>>
>>> 2) Why would this be happening only with RT kernels?
>> Slightly different timing and locking semantics.
>>
>>> 3) Any suggestions on the best place to patch/workaround this?
>>>
>>> My understanding is that if I were to treat ERESTARTSYS as an EAGAIN,
>>> most applications would be perfectly happy. Would bad things happen if I
>>> replaced the ERESTARTSYS in futex_wait with an EAGAIN?
>> No workarounds please. We really want to know what's wrong.
>>
>> Two things to look at:
>>
>> 1) Does that happen with 2.6.31.2-rt13 as well ?
>>
>> 2) Add a check to the code path where ERESTARTSYS is returned:
>>
>>   if (!signal_pending(current))
>>      printk(KERN_ERR ".....");
>>
> 
> Ok, in 2.6.31.2-rt13, I modified futex.c as:
> -----
>         /*
>          * We expect signal_pending(current), but another thread may
>          * have handled it for us already.
>          */
>         ret = -ERESTARTSYS;
>         if (!abs_time)
>           {
>             if (!signal_pending(current))
>               printk(KERN_ERR ".....");
>             goto out_put_key;
>           }
> -----
> 
> Then when I cause the crash:
> 
> leibs@c1:~$ python threadprocs8.py
> sem_wait: Unknown error 512
> Segmentation fault
> 
> dmesg shows me the corresponding:
> [   82.232999] .....
> [   82.233177] python[2834]: segfault at 48 ip 00000000004b0177 sp
> 00007f9429788ad8 error 4 in python2.6[400000+216000]


OK, so I suspect one of two things.

1) Recent changes to futex.c have somehow created a wakeup race and
    unqueue_me() doesn't detect it was woken with FUTEX_WAKE, then falls
    out through the ERESTARTSYS path.

2) Recent changes have exposed an existing race in unqueue_me().
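
In both cases the visible symptom would come from the exit path Thomas
describes above: a restart code is only translated if the signal code
actually runs. The following is a rough user-space model of that
decision, not kernel code. The names model_syscall_exit, struct
exit_result and MODEL_ERESTARTSYS are made up for illustration; the
value 512 is the kernel-internal ERESTARTSYS from include/linux/errno.h.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Kernel-internal restart code; never meant to be seen by user space. */
#define MODEL_ERESTARTSYS 512

enum exit_action { RETURN_TO_USER, RESTART_SYSCALL };

struct exit_result {
	enum exit_action action;
	long user_visible;	/* only meaningful for RETURN_TO_USER */
};

/*
 * Hypothetical, simplified model of the syscall exit path.
 *
 * ret               value the syscall body returned
 * tif_sigpending    whether TIF_SIGPENDING is set for this task
 * handler_delivered whether this task ends up running a signal handler
 * sa_restart        whether that handler was installed with SA_RESTART
 */
static struct exit_result model_syscall_exit(long ret, bool tif_sigpending,
					     bool handler_delivered,
					     bool sa_restart)
{
	struct exit_result res = { RETURN_TO_USER, ret };

	/* A handler can only be delivered if the signal path ran. */
	assert(!handler_delivered || tif_sigpending);

	/* No pending flag: the signal code is skipped, ret goes out raw. */
	if (!tif_sigpending)
		return res;

	/* Ordinary return values are passed through unchanged. */
	if (ret != -MODEL_ERESTARTSYS)
		return res;

	/* Handler without SA_RESTART: the caller sees EINTR. */
	if (handler_delivered && !sa_restart) {
		res.user_visible = -EINTR;
		return res;
	}

	/* Otherwise the syscall is restarted transparently. */
	res.action = RESTART_SYSCALL;
	res.user_visible = 0;
	return res;
}
```

Calling model_syscall_exit(-MODEL_ERESTARTSYS, false, false, false)
models the case seen here: nothing translates the value, so user space
gets 512 back as errno.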

I'll do some runs on my 8-way systems and see if I can:
o Identify the guilty patch
o Identify the race in question

Thanks for the test case! Now... why is sem_wait() being used in a timer 
call....
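
For anyone who wants to catch this from user space without the Python
test case, a small diagnostic sketch follows. It is not a workaround:
it retries only on EINTR and merely reports the leaked value. The helper
name sem_wait_checked and the macro KERNEL_ERESTARTSYS are made up here,
and 512 is assumed to be the errno behind the "Unknown error 512" above.

```c
#include <assert.h>
#include <errno.h>
#include <semaphore.h>
#include <stdio.h>

/* Not defined by user-space headers; value assumed from the report. */
#define KERNEL_ERESTARTSYS 512

/*
 * Hypothetical helper: wait on a semaphore, retrying on EINTR.
 * Returns 0 on success, -1 on any other error with errno preserved.
 * A leaked ERESTARTSYS is reported on stderr and treated as an error.
 */
static int sem_wait_checked(sem_t *sem)
{
	assert(sem != NULL);

	for (;;) {
		if (sem_wait(sem) == 0)
			return 0;

		if (errno == EINTR)
			continue;	/* normal interruption, try again */

		int saved = errno;	/* fprintf() may clobber errno */

		if (saved == KERNEL_ERESTARTSYS)
			fprintf(stderr,
				"sem_wait leaked ERESTARTSYS (errno %d)\n",
				saved);
		errno = saved;
		return -1;
	}
}
```

Build with -pthread. On a kernel without the bug the helper behaves
like a plain sem_wait() loop.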

-- 
Darren Hart
IBM Linux Technology Center
Real-Time Linux Team