[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20200626075918.dj6ioaon5iuhtg6k@beryllium.lan>
Date: Fri, 26 Jun 2020 09:59:18 +0200
From: Daniel Wagner <dwagner@...e.de>
To: Dave Hansen <dave.hansen@...ux.intel.com>
Cc: linux-kernel@...r.kernel.org, linux-mm@...ck.org,
ben.widawsky@...el.com, alex.shi@...ux.alibaba.com,
tobin@...nel.org, cl@...ux.com, akpm@...ux-foundation.org,
stable@...nel.org
Subject: Re: [PATCH] mm/vmscan: restore zone_reclaim_mode ABI
Hi Dave
On Thu, Jun 25, 2020 at 05:34:59PM -0700, Dave Hansen wrote:
>
> From: Dave Hansen <dave.hansen@...ux.intel.com>
>
> I went to go add a new RECLAIM_* mode for the zone_reclaim_mode
> sysctl. Like a good kernel developer, I also went to go update the
> documentation. I noticed that the bits in the documentation didn't
> match the bits in the #defines.
Drop the this paragraph from the commit message. It doesn't add
any necessart information.
Please have a look at
https://www.kernel.org/doc/html/latest/process/submitting-patches.html#describe-your-changes
> The VM evidently stopped caring about RECLAIM_ZONE at some point (or
> never cared) and the #define itself was later removed as a cleanup.
> Those things by themselves are fine.
>
> But, the _other_ bit locations also got changed. That's not OK because
> the bit values are documented to mean one specific thing and users
> surely rely on them meaning that one thing and not changing from
> kernel to kernel. The end result is that if someone had a script
> that did:
>
> sysctl vm.zone_reclaim_mode=1
>
> That script went from doing nothing to writing out pages during
> node reclaim after the commit in question. That's not great.
>
> Put the bits back the way they were and add a comment so something
> like this is a bit harder to do again. Update the documentation to
> make it clear that the first bit is ignored.
>
> Signed-off-by: Dave Hansen <dave.hansen@...ux.intel.com>
> Fixes: commit 648b5cf368e0 ("mm/vmscan: remove unused RECLAIM_OFF/RECLAIM_ZONE")
> Acked-by: Ben Widawsky <ben.widawsky@...el.com>
> Cc: Alex Shi <alex.shi@...ux.alibaba.com>
> Cc: Daniel Wagner <dwagner@...e.de>
> Cc: "Tobin C. Harding" <tobin@...nel.org>
> Cc: Christoph Lameter <cl@...ux.com>
> Cc: Andrew Morton <akpm@...ux-foundation.org>
> Cc: stable@...nel.org
> ---
>
> b/Documentation/admin-guide/sysctl/vm.rst | 12 ++++++------
> b/mm/vmscan.c | 9 +++++++--
> 2 files changed, 13 insertions(+), 8 deletions(-)
>
> diff -puN mm/vmscan.c~mm-vmscan-restore-old-zone_reclaim_mode-abi mm/vmscan.c
> --- a/mm/vmscan.c~mm-vmscan-restore-old-zone_reclaim_mode-abi 2020-06-25 17:32:11.559165912 -0700
> +++ b/mm/vmscan.c 2020-06-25 17:32:11.572165912 -0700
> @@ -4090,8 +4090,13 @@ module_init(kswapd_init)
> */
> int node_reclaim_mode __read_mostly;
>
> -#define RECLAIM_WRITE (1<<0) /* Writeout pages during reclaim */
> -#define RECLAIM_UNMAP (1<<1) /* Unmap pages during reclaim */
> +/*
> + * These bit locations are exposed in the vm.zone_reclaim_mode sysctl
> + * ABI. New bits are OK, but existing bits can never change.
> + */
> +#define RECLAIM_RSVD (1<<0) /* (currently ignored/unused) */
> +#define RECLAIM_WRITE (1<<1) /* Writeout pages during reclaim */
> +#define RECLAIM_UNMAP (1<<2) /* Unmap pages during reclaim */
>
> /*
> * Priority for NODE_RECLAIM. This determines the fraction of pages
> diff -puN Documentation/admin-guide/sysctl/vm.rst~mm-vmscan-restore-old-zone_reclaim_mode-abi Documentation/admin-guide/sysctl/vm.rst
> --- a/Documentation/admin-guide/sysctl/vm.rst~mm-vmscan-restore-old-zone_reclaim_mode-abi 2020-06-25 17:32:11.562165912 -0700
> +++ b/Documentation/admin-guide/sysctl/vm.rst 2020-06-25 17:32:11.572165912 -0700
> @@ -938,7 +938,7 @@ in the system.
> This is value OR'ed together of
>
> = ===================================
> -1 Zone reclaim on
> +1 (bit currently ignored)
> 2 Zone reclaim writes dirty pages out
> 4 Zone reclaim swaps pages
> = ===================================
> @@ -948,11 +948,11 @@ that benefit from having their data cach
> left disabled as the caching effect is likely to be more important than
> data locality.
>
> -zone_reclaim may be enabled if it's known that the workload is partitioned
> -such that each partition fits within a NUMA node and that accessing remote
> -memory would cause a measurable performance reduction. The page allocator
> -will then reclaim easily reusable pages (those page cache pages that are
> -currently not used) before allocating off node pages.
> +Consider enabling one or more zone_reclaim mode bits if it's known that the
> +workload is partitioned such that each partition fits within a NUMA node
> +and that accessing remote memory would cause a measurable performance
> +reduction. The page allocator will take additional actions before
> +allocating off node pages.
I think the documentation update should not be part of this patch.
This makes the back porting to stable more difficult.
Thanks,
Daniel
Powered by blists - more mailing lists