[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAHk-=wi=vuc6sdu0m9nYd3gb8x5Xgnc6=TH=DTOy7qU96rZ9nw@mail.gmail.com>
Date: Wed, 22 Jul 2020 15:10:44 -0700
From: Linus Torvalds <torvalds@...ux-foundation.org>
To: Hugh Dickins <hughd@...gle.com>
Cc: Michal Hocko <mhocko@...nel.org>, Linux-MM <linux-mm@...ck.org>,
LKML <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Tim Chen <tim.c.chen@...ux.intel.com>,
Michal Hocko <mhocko@...e.com>
Subject: Re: [RFC PATCH] mm: silence soft lockups from unlock_page
On Wed, Jul 22, 2020 at 2:29 PM Hugh Dickins <hughd@...gle.com> wrote:
>
> -#define PAGE_WAIT_TABLE_BITS 8
> +#define PAGE_WAIT_TABLE_BITS 10
Well, that seems harmless even on small machines.
> + bool first_time = true;
> bool thrashing = false;
> bool delayacct = false;
> unsigned long pflags;
> @@ -1134,7 +1135,12 @@ static inline int wait_on_page_bit_commo
> spin_lock_irq(&q->lock);
>
> if (likely(list_empty(&wait->entry))) {
> - __add_wait_queue_entry_tail(q, wait);
> + if (first_time) {
> + __add_wait_queue_entry_tail(q, wait);
> + first_time = false;
> + } else {
> + __add_wait_queue(q, wait);
> + }
> SetPageWaiters(page);
> }
This seems very hacky.
And in fact, looking closer, I'd say that there are more serious problems here.
Look at that WQ_FLAG_EXCLUSIVE thing: non-exclusive waits should
always go at the head (because they're not going to steal the bit,
they just want to know when it got cleared), and exclusive waits
should always go at the tail (because of fairness).
But that's not at all what we do.
Your patch adds even more confusion to this nasty area.
And your third one:
> + if (ret)
> + woken++;
>
> - if (bookmark && (++cnt > WAITQUEUE_WALK_BREAK_CNT) &&
> + if (bookmark && (++cnt > WAITQUEUE_WALK_BREAK_CNT) && woken &&
I've got two reactions to this
(a) we should not need a new "woken" variable, we should just set a
high bit of "cnt" and make WAITQUEUE_WALK_BREAK_CNT contain that high
bit
(Tune "high bit" to whatever you want: it could be either the
_real_ high bit of the variable, or it could be something like "128",
which would mean that you'd break out after 128 non-waking entries).
(b) Ugh, what hackery and magic behavior regardless
I'm really starting to hate that wait_on_page_bit_common() function.
See a few weeks ago how the function looks buggy to begin with
https://lore.kernel.org/lkml/CAHk-=wjJA2Z3kUFb-5s=6+n0qbTs8ELqKFt9B3pH85a8fGD73w@mail.gmail.com/
and that never got resolved either (but probably never happens in practice).
Linus
Powered by blists - more mailing lists