linux-kernel - Re: ERESTARTSYS escaping from sem

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:	Tue, 13 Oct 2009 01:56:44 -0700
From:	Blaise Gassend <blaise@...lowgarage.com>
To:	Darren Hart <dvhltc@...ibm.com>
Cc:	Jeremy Leibs <leibs@...lowgarage.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Peter Zijlstra <peterz@...radead.org>,
	"lkml," <linux-kernel@...r.kernel.org>
Subject: Re: ERESTARTSYS escaping from sem_wait with RTLinux patch

> When 3495 finally get's to run and complete it's futex_wake() call, the task
> still needs to be woken, so we wake it - but now it's enqueued with a different
> futex_q, which now has a valid lock_ptr, so upon wake-up we expect a signal!
> 
> OK, I believe this establishes root cause.  Now to come up with a fix...

Wow, good work Darren! You definitely have more experience than I do at
tracking down these in-kernel races, and I'm glad we have you looking at
this. I'm snarfing up useful techniques from your progress emails.

So if I understand correctly, there is a race between wake_futex and a
timeout (or a signal, but AFAIK when my python example is running
steady-state there are no signals). The problem occurs when wake_futex
gets preempted half-way through, after it has NULLed lock_ptr, but
before it has woken up the waiter. If a timeout/signal wakes the waiter,
and the waiter runs and starts waiting again before the waker resumes,
then the waker will end up waking the waiter a second time, without the
lock_ptr for the second wait being NULLified. This causes the waiter to
mis-interpret what woke it and leads to the fault we observed.

Now I am wondering if the workaround described here
http://markmail.org/message/at2s337yz3wclaif
for what seems like this same problem isn't actually a legitimate fix.
It ends up looking something like this: (lines 3 and 4 are new)

  ret = -ERESTARTSYS;
  if (!abs_time) {
    if (!signal_pending(current))
      set_tsk_thread_flag(current,TIF_SIGPENDING);
    goto out_put_key;
  }

I think this works because the only bad thing that is happening in the
scenario you worked out is that the waiter failed to detect that the
second wake was spurious, and instead interpreted it as a signal.
However, the waiter can easily detect that the wakeup is spurious:
1) a futex wakeup can be detected by unqueue_me returning 0
2) a timeout can be detected by looking at to->task
3) a signal can be detected by calling signal_pending

futex_wait is already doing 1) and 2). If it additionally did 3), which
I think it is an inexpensive check, then it could detect the spurious
wakeup, and restart the wait, either by doing a
set_tsk_thread_flag(current,TIF_SIGPENDING) and relying on the signal
handling to restart the syscall as in the patch above, or by looping
back to somewhere around the futex_wait_setup call.

All the other fix ideas that come to mind end up slowing down the common
case by introducing additional locking in futex_wakeup, and hence seem
less desirable.

At this point I'll admit that I'm out of my depth and see what you
think...

Blaise

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/