Message-id: <163529540259.8576.9186192891154927096@noble.neil.brown.name>
Date: Wed, 27 Oct 2021 11:43:22 +1100
From: "NeilBrown" <neilb@...e.de>
To: "Mel Gorman" <mgorman@...hsingularity.net>
Cc: "Andrew Morton" <akpm@...ux-foundation.org>,
"Theodore Ts'o" <tytso@....edu>,
"Andreas Dilger" <adilger.kernel@...ger.ca>,
"Darrick J . Wong" <djwong@...nel.org>,
"Matthew Wilcox" <willy@...radead.org>,
"Michal Hocko" <mhocko@...e.com>,
"Dave Chinner" <david@...morbit.com>,
"Rik van Riel" <riel@...riel.com>,
"Vlastimil Babka" <vbabka@...e.cz>,
"Johannes Weiner" <hannes@...xchg.org>,
"Jonathan Corbet" <corbet@....net>,
"Linux-MM" <linux-mm@...ck.org>,
"Linux-fsdevel" <linux-fsdevel@...r.kernel.org>,
"LKML" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v4 0/8] Remove dependency on congestion_wait in mm/
On Sat, 23 Oct 2021, Mel Gorman wrote:
> On Fri, Oct 22, 2021 at 10:26:30PM +1100, NeilBrown wrote:
> > On Fri, 22 Oct 2021, Mel Gorman wrote:
> > > On Fri, Oct 22, 2021 at 12:15:10PM +1100, NeilBrown wrote:
> > >
> > > > In general, I still don't like the use of wake_up_all(), though it won't
> > > > cause incorrect behaviour.
> > > >
> > >
> > > Removing wake_up_all would be tricky.
> >
> > I think there is a misunderstanding. Removing wake_up_all() is as
> > simple as
> > s/wake_up_all/wake_up/
> >
> > If you used prepare_to_wait_exclusive(), then wake_up() would only wake
> > one waiter, while wake_up_all() would wake all of them.
> > As you use prepare_to_wait(), wake_up() will wake all waiters - as will
> > wake_up_all().
> >
>
> Ok, yes, there was a misunderstanding. I thought you were suggesting a
> move to exclusive wakeups. I felt that the wake_up_all was explicit in
> terms of intent and that I really meant for all tasks to wake instead of
> one at a time.
Fair enough. Thanks for changing it :-)
But this prompts me to wonder if exclusive wakeups would be a good idea
- which is a useful springboard to try to understand the code better.
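(To recap the distinction in code - this is just a minimal sketch of
the waitqueue API, not the actual mm/vmscan.c code; 'wqh' and
'timeout' are placeholders:)

	DEFINE_WAIT(wait);

	/* non-exclusive: wake_up() and wake_up_all() both wake this task */
	prepare_to_wait(&wqh, &wait, TASK_UNINTERRUPTIBLE);
	schedule_timeout(timeout);
	finish_wait(&wqh, &wait);

	/* exclusive: wake_up() stops after waking one exclusive waiter */
	prepare_to_wait_exclusive(&wqh, &wait, TASK_UNINTERRUPTIBLE);
	schedule_timeout(timeout);
	finish_wait(&wqh, &wait);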
For VMSCAN_THROTTLE_ISOLATED, exclusive wakeups probably are a good idea.
One pattern for reliable exclusive wakeups is for any thread that
receives a wake-up to then consider sending the next one.
Two places receive VMSCAN_THROTTLE_ISOLATED wakeups, and both then call
too_many_isolated() which - on success - sends another wakeup - before
the caller has had a chance to isolate anything. If, instead, the
wakeup were sent sometime later, after pages were isolated but before
the caller (isolate_migratepages_block() or shrink_inactive_list())
returned, then we would get an orderly progression of threads running
through that code.
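Roughly this shape (a sketch only - the waitqueue name and the
too_many_isolated() arguments are placeholders for whatever
mm/vmscan.c would actually use):

	DEFINE_WAIT(wait);

	for (;;) {
		prepare_to_wait_exclusive(&wqh, &wait, TASK_UNINTERRUPTIBLE);
		if (!too_many_isolated(...))
			break;
		schedule();
	}
	finish_wait(&wqh, &wait);

	/* ... isolate pages ... */

	/* pass the wakeup on to the next exclusive waiter, if any */
	if (waitqueue_active(&wqh))
		wake_up(&wqh);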
For VMSCAN_THROTTLE_WRITEBACK, things are a little less straightforward.
There are two different places that wait for the wakeup, and a wake_up
is sent to all waiters after a time proportional to the number of
waiters. It might make sense to wake one thread per time unit?
That might work well for do_writepages - every SWAP_CLUSTER_MAX writes
triggers one wakeup.
I'm less sure that it would work for shrink_node(). Maybe the
shrink_node() waiters could be non-exclusive, so they get woken as soon
as SWAP_CLUSTER_MAX writes complete, while the do_writepages() waiters
are exclusive and get woken one at a time.
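Something like this on the write-completion side, say (purely
illustrative - 'nr_throttled_written' and 'throttle_wq' are made-up
names, not fields pgdat actually has):

	/* send one wakeup per SWAP_CLUSTER_MAX completed writes */
	if (atomic_add_return(nr_written, &pgdat->nr_throttled_written) >=
	    SWAP_CLUSTER_MAX) {
		atomic_sub(SWAP_CLUSTER_MAX, &pgdat->nr_throttled_written);
		/*
		 * wake_up() wakes every non-exclusive waiter (shrink_node())
		 * but at most one exclusive waiter (do_writepages()).
		 */
		wake_up(&pgdat->throttle_wq);
	}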
For VMSCAN_THROTTLE_NOPROGRESS .... I don't understand.
If one zone isn't making "enough" progress, we throttle before moving on
to the next zone. So we delay processing of the next zone, and only
indirectly delay re-processing of the current congested zone.
Maybe it makes sense, but I don't see it yet. I note that the commit
message says "it's messy". I can't argue with that!
I'll follow up with patches to clarify what I am thinking about the
first two. I'm not proposing the patches, just presenting them as part
of improving my understanding.
Thanks,
NeilBrown