lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 9 Dec 2019 10:18:13 +0100
From:   Ingo Molnar <mingo@...nel.org>
To:     Linus Torvalds <torvalds@...ux-foundation.org>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        Juri Lelli <juri.lelli@...hat.com>,
        Vincent Guittot <vincent.guittot@...aro.org>,
        Miklos Szeredi <miklos@...redi.hu>,
        Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Felipe Balbi <balbi@...nel.org>,
        Oleg Nesterov <oleg@...hat.com>
Subject: [RFC PATCH] sched/wait: Make interruptible exclusive waitqueue
 wakeups reliable


* Linus Torvalds <torvalds@...ux-foundation.org> wrote:

> The reason it is buggy is that wait_event_interruptible_exclusive()
> does this (inside the __wait_event() macro that it expands to):
> 
>                 long __int = prepare_to_wait_event(&wq_head, &__wq_entry, state);
> 
>                 if (condition)
>                         break;
>                 if (___wait_is_interruptible(state) && __int) {
>                         __ret = __int;
>                         goto __out;
> 
> and the thing is, if does that "__ret = __int" case and returns
> -ERESTARTSYS, it's possible that the wakeup event has already been
> consumed, because we've added ourselves as an exclusive writer to the
> queue. So it _says_ it was interrupted, not woken up, and the wait got
> cancelled, but because we were an exclusive waiter, we might be the
> _only_ thing that got woken up, and the wakeup basically got forgotten
> - all the other exclusive waiters will remain waiting.

So the place that detects interruption is prepare_to_wait_event():

long prepare_to_wait_event(struct wait_queue_head *wq_head, struct wait_queue_entry *wq_entry, int state)
{
        unsigned long flags;
        long ret = 0;

        spin_lock_irqsave(&wq_head->lock, flags);
        if (signal_pending_state(state, current)) {
                /*
                 * Exclusive waiter must not fail if it was selected by wakeup,
                 * it should "consume" the condition we were waiting for.
                 *
                 * The caller will recheck the condition and return success if
                 * we were already woken up, we can not miss the event because
                 * wakeup locks/unlocks the same wq_head->lock.
                 *
                 * But we need to ensure that set-condition + wakeup after that
                 * can't see us, it should wake up another exclusive waiter if
                 * we fail.
                 */
                list_del_init(&wq_entry->entry);
                ret = -ERESTARTSYS;
        } else {
                if (list_empty(&wq_entry->entry)) {
                        if (wq_entry->flags & WQ_FLAG_EXCLUSIVE)
                                __add_wait_queue_entry_tail(wq_head, wq_entry);
                        else
                                __add_wait_queue(wq_head, wq_entry);
                }
                set_current_state(state);
        }
        spin_unlock_irqrestore(&wq_head->lock, flags);

        return ret;
}

I think we can indeed lose an exclusive event here, despite the comment 
that argues that we shouldn't: if we were already removed from the list 
then list_del_init() does nothing and loses the exclusive event AFAICS.

This logic was introduced in this commit 3 years ago:

  b1ea06a90f52: ("sched/wait: Avoid abort_exclusive_wait() in ___wait_event()")
  eaf9ef52241b: ("sched/wait: Avoid abort_exclusive_wait() in __wait_on_bit_lock()")

Before that commit we simply did this:

 long prepare_to_wait_event(wait_queue_head_t *q, wait_queue_t *wait, int state)
 {
        unsigned long flags;
-
-       if (signal_pending_state(state, current))
-               return -ERESTARTSYS;

Which was safe in the sense that it didn't touch the waitqueue in case of 
interruption. Then it used abort_exclusive_wait(), which got removed in 
eaf9ef52241b, to pass on any leftover wakeups:

-void abort_exclusive_wait(wait_queue_head_t *q, wait_queue_t *wait, void *key)
-{
-       unsigned long flags;
-
-       __set_current_state(TASK_RUNNING);
-       spin_lock_irqsave(&q->lock, flags);
-       if (!list_empty(&wait->task_list))
-               list_del_init(&wait->task_list);
-       else if (waitqueue_active(q))
-               __wake_up_locked_key(q, TASK_NORMAL, key);
-       spin_unlock_irqrestore(&q->lock, flags);
-}
-EXPORT_SYMBOL(abort_exclusive_wait);

Note how the wakeup is passed along to another exclusive waiter (if any) 
via the __wake_up_locked_key() call.

Oleg?

> I dunno. Maybe this is fundamental to the interface? We do not have a 
> lot of users that mix "interruptible" and "exclusive". In fact, I see 
> only two cases that care about the return value, but at least the fuse 
> use does seem to have exactly this problem with potentially lost 
> wakeups because the ERESTARTSYS case ends up eating a wakeup without 
> doing anything about it.

I don't think it's fundamental to the interface.

In fact I'd argue that it's fundamental to the interface to *not* lose 
exclusive events.

> Looks like the other user - the USB gadget HID thing - also has this
> buggy pattern of believing the return value, and losing a wakeup
> event.
> 
> Adding Miklos and Felipe to the cc just because of the fuse and USB
> gadget potential issues, but this is mainly a scheduler maintainer
> question.
> 
> It's possible that I've misread the wait-event code. PeterZ?

I think your analysis is correct, and I think the statistical evidence is 
also overwhelming that the interface is buggy: if we include your new 
usecase then 3 out of 3 attempts got it wrong. :-)

So I'd argue we should fix the interface and allow the 'simple' use of 
reliable interruptible-exclusive waitqueues - i.e. reintroduce the 
abort_exclusive_wait() logic?

Thanks,

	Ingo

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ