[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200723124749.GA7428@redhat.com>
Date: Thu, 23 Jul 2020 14:47:50 +0200
From: Oleg Nesterov <oleg@...hat.com>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Hugh Dickins <hughd@...gle.com>, Michal Hocko <mhocko@...nel.org>,
Linux-MM <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Tim Chen <tim.c.chen@...ux.intel.com>,
Michal Hocko <mhocko@...e.com>
Subject: Re: [RFC PATCH] mm: silence soft lockups from unlock_page
On 07/22, Linus Torvalds wrote:
>
> Comments? Oleg, this should fix the race you talked about too.
Yes.
I still can't convince myself thatI fully understand this patch but I see
nothing really wrong after a quick glance...
> + * We can no longer use 'wait' after we've done the
> + * list_del_init(&wait->entry),
Yes, but see below,
> + * the target may decide it's all done with no
> + * other locking, and 'wait' has been allocated on
> + * the stack of the target.
> */
> - if (test_bit(key->bit_nr, &key->page->flags))
> - return -1;
> + target = wait->private;
> + smp_mb();
>
> - return autoremove_wake_function(wait, mode, sync, key);
> + /*
> + * Ok, we have successfully done what we're waiting for.
> + *
> + * Now unconditionally remove the wait entry, so that the
> + * waiter can use that to see success or not.
> + *
> + * We _really_ should have a "list_del_init_careful()"
> + * to properly pair with an unlocked "list_empty_careful()".
> + */
> + list_del_init(&wait->entry);
> +
> + /*
> + * Theres's another memory barrier in the wakup path, that
> + * makes sure the wakup happens after the above is visible
> + * to the target.
> + */
> + wake_up_state(target, mode);
We can no longer use 'target'. If it was already woken up it can notice
list_empty_careful(), return without taking q->lock, and exit.
Of course, this is purely theoretical... rcu_read_lock() should help
but perhaps we can avoid it somehow?
Say, can't we abuse WQ_FLAG_WOKEN?
wake_page_function:
wait->flags |= WQ_FLAG_WOKEN;
wmb();
autoremove_wake_function(...);
wait_on_page_bit_common:
for (;;) {
set_current_state();
if (wait.flags & WQ_FLAG_WOKEN)
break;
schedule();
}
finish_wait();
rmb();
return wait.flags & WQ_FLAG_WOKEN ? 0 : -EINTR;
Another (cosmetic) problem is that wake_up_state(mode) looks confusing.
It is correct but only because we know that mode == TASK_NORMAL and thus
wake_up_state() can'fail if the target is still blocked.
> + spin_lock_irq(&q->lock);
> + SetPageWaiters(page);
> + if (!trylock_page_bit_common(page, bit_nr, behavior))
> + __add_wait_queue_entry_tail(q, wait);
do we need SetPageWaiters() if trylock() succeeds ?
Oleg.
Powered by blists - more mailing lists