Message-ID: <CAHk-=wj9KWfs799xU5eW0J_hkee52C5kvFFmBV-A+vN7qNWnjA@mail.gmail.com>
Date: Thu, 23 Jul 2020 11:22:44 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Oleg Nesterov <oleg@...hat.com>
Cc: Hugh Dickins <hughd@...gle.com>, Michal Hocko <mhocko@...nel.org>,
Linux-MM <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Tim Chen <tim.c.chen@...ux.intel.com>,
Michal Hocko <mhocko@...e.com>
Subject: Re: [RFC PATCH] mm: silence soft lockups from unlock_page
On Thu, Jul 23, 2020 at 11:01 AM Oleg Nesterov <oleg@...hat.com> wrote:
>
> > + *
> > + * We _really_ should have a "list_del_init_careful()" to
> > + * properly pair with the unlocked "list_empty_careful()"
> > + * in finish_wait().
> > + */
> > + smp_mb();
> > + list_del_init(&wait->entry);
>
> I think smp_wmb() would be enough, but this is minor.
Well, what we _really_ want (and what that comment is about) would be
for that list_del_init_careful() to use a "smp_store_release()" for
the last store, and then "list_empty_careful()" would use a
"smp_load_acquire()" for the corresponding first load.
On x86, that's free. On most other architectures, it's the minimal
ordering requirement.
And no, I don't think "smp_wmb()" is technically enough.
With crazy memory ordering, one of the earlier *reads* (eg loading
"wait->private" when waking things up) could have been delayed until
after the stores that initialize the list - and thus read stack
contents from another process after it's been released and re-used.
Does that happen in reality? No. There are various conditionals in
there which mean that the stores end up being gated on the loads and
cannot actually be re-ordered, but it's the kind of subtlety we
shouldn't be depending on.
So we actually do want to constrain all earlier reads and writes wrt
the final write. Which is exactly what "smp_store_release()" does.
But with our current list_empty_careful(), the smp_mb() is I think
technically sufficient.
> We need a barrier between "wait->flags |= WQ_FLAG_WOKEN" and list_del_init(),
See above: we need more than just that write barrier, although in
_practice_ you're right, and the other barriers actually all already
exist and are part of wake_up_state().
So the smp_mb() is unnecessary, and in fact your smp_wmb() would be
too. But I left it there basically as "documentation".
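Just to spell out what that "documentation" barrier is documenting,
the waker side ordering is roughly this (hand-wavy sketch, not the
literal patch):

	wait->flags |= WQ_FLAG_WOKEN;		/* tell the waiter it's done */
	wake_up_state(wait->private, mode);
	smp_mb();				/* order all of the above.. */
	list_del_init(&wait->entry);		/* ..before the entry can go away */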
> But afaics we need another barrier, rmb(), in wait_on_page_bit_common() for
> the case when wait->private was not blocked; we need to ensure that if
> finish_wait() sees list_empty_careful() == T then we can't miss WQ_FLAG_WOKEN.
Again, this is what a proper list_empty_careful() with a
smp_load_acquire() would have automatically gotten for us.
But yes, I think that without that, and with the explicit barriers, we
need an smp_rmb() after the list_empty_careful().
I really think it should be _in_ list_empty_careful(), though. Or
maybe finish_wait(). Hmm.
Because looking at all the other finish_wait() uses, the fact that the
waitqueue _list_ is properly ordered isn't really a guarantee that the
rest of the stack space is.
In practice, it will be, but I think this lack of serialization is a
potential real bug elsewhere too.
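Ie on the waiter side I'd want something along these lines (sketch of
the finish_wait() pattern, not a real patch):

	if (!list_empty_careful(&wait->entry)) {
		spin_lock_irqsave(&wq_head->lock, flags);
		list_del_init(&wait->entry);
		spin_unlock_irqrestore(&wq_head->lock, flags);
	}
	/*
	 * Without the acquire in list_empty_careful(), this is what
	 * guarantees that seeing the entry unlinked also means seeing
	 * the WQ_FLAG_WOKEN store (and the rest of the waker's writes).
	 */
	smp_rmb();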
(Obviously none of this would show up on x86, where we already *get*
that smp_store_release/smp_load_acquire behavior for the existing
list_del_init()/list_empty_careful(), since all stores are releases,
and all loads are acquires).
So I think that is a separate issue, generic to our finish_wait() uses.
Linus