[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.DEB.2.23.453.2007011457250.1942091@chino.kir.corp.google.com>
Date: Wed, 1 Jul 2020 15:01:14 -0700 (PDT)
From: David Rientjes <rientjes@...gle.com>
To: Dave Hansen <dave.hansen@...el.com>
cc: Dave Hansen <dave.hansen@...ux.intel.com>,
linux-kernel@...r.kernel.org, linux-mm@...ck.org,
alex.shi@...ux.alibaba.com, dwagner@...e.de, tobin@...nel.org,
cl@...ux.com, akpm@...ux-foundation.org, ying.huang@...el.com,
dan.j.williams@...el.com, cai@....pw
Subject: Re: [PATCH 3/3] mm/vmscan: replace implicit RECLAIM_ZONE checks with
explicit checks
On Wed, 1 Jul 2020, Dave Hansen wrote:
> On 7/1/20 1:04 PM, Ben Widawsky wrote:
> >> +static inline bool node_reclaim_enabled(void)
> >> +{
> >> + /* Is any node_reclaim_mode bit set? */
> >> + return node_reclaim_mode & (RECLAIM_ZONE|RECLAIM_WRITE|RECLAIM_UNMAP);
> >> +}
> >> +
> >> extern void check_move_unevictable_pages(struct pagevec *pvec);
> >>
> >> extern int kswapd_run(int nid);
> > If a user writes a bit that isn't a RECLAIM_* bit to vm.zone_reclaim_mode
> > today, it acts as though RECLAIM_ZONE is enabled: we try to reclaim in
> > zonelist order before falling back to the next zone in the page allocator.
> > The sysctl doesn't enforce any max value :/ I dont know if there is any
> > such user, but this would break them if there is.
> >
> > Should this simply be return !!node_reclaim_mode?
>
> You're right that there _could_ be a user-visible behavior change here.
> But, if there were a change it would be for a bit which wasn't even
> mentioned in the documentation. Somebody would have had to look at the
> doc mentioning 1,2,4 and written an 8. If they did that, they're asking
> for trouble because we could have defined the '8' bit to do nasty things
> like auto-demote all your memory. :)
>
> I'll mention it in the changelog, but I still think we should check the
> actual, known bits rather than check for 0.
>
> BTW, in the hardware, they almost invariably make unused bits "reserved"
> and do mean things like #GP if someone tries to set them. This is a
> case where the kernel probably should have done the same. It would have
> saved us the trouble of asking these questions now. Maybe we should
> even do that going forward.
>
Maybe enforce it in a sysctl handler so the user catches any errors, which
would be better than silently accepting some policy that doesn't exist?
RECLAIM_UNMAP and/or RECLAIM_WRITE should likely get -EINVAL if attempted
to be set without RECLAIM_ZONE as well: they are no-ops without
RECLAIM_ZONE. This would likely have caught something wrong with commit
648b5cf368e0 ("mm/vmscan: remove unused RECLAIM_OFF/RECLAIM_ZONE") if it
would have already been in place.
I don't feel strongly about this, so feel free to ignore.
Powered by blists - more mailing lists