Message-ID: <CAHbLzkpbua9BM+sMng3d3Fxo+61HVNEegviTLqtVVtxtjjbsCQ@mail.gmail.com>
Date: Thu, 1 Apr 2021 13:06:27 -0700
From: Yang Shi <shy828301@...il.com>
To: Dave Hansen <dave.hansen@...ux.intel.com>
Cc: Linux MM <linux-mm@...ck.org>,
Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
weixugc@...gle.com, Yang Shi <yang.shi@...ux.alibaba.com>,
David Rientjes <rientjes@...gle.com>,
Huang Ying <ying.huang@...el.com>,
Dan Williams <dan.j.williams@...el.com>,
David Hildenbrand <david@...hat.com>,
Oscar Salvador <osalvador@...e.de>
Subject: Re: [PATCH 10/10] mm/migrate: new zone_reclaim_mode to enable reclaim migration
On Thu, Apr 1, 2021 at 11:35 AM Dave Hansen <dave.hansen@...ux.intel.com> wrote:
>
>
> From: Dave Hansen <dave.hansen@...ux.intel.com>
>
> Some method is obviously needed to enable reclaim-based migration.
>
> Just like traditional autonuma, some workloads will benefit, such
> as workloads with more "static" configurations where hot pages stay
> hot and cold pages stay cold. If pages come and go
> from the hot and cold sets, the benefits of this approach will be
> more limited.
>
> The benefits are truly workload-based and *not* hardware-based.
> We do not believe that there is a viable threshold where certain
> hardware configurations should have this mechanism enabled while
> others do not.
>
> To be conservative, earlier work defaulted to disable reclaim-
> based migration and did not include a mechanism to enable it.
> This proposes extending the existing "zone_reclaim_mode" (now
> really node_reclaim_mode) as a method to enable it.
>
> We are open to any alternative that allows end users to enable
> this mechanism or disable it if workload harm is detected (just
> like traditional autonuma).
>
> Once this is enabled, page demotion may move data to a NUMA node
> that does not fall into the cpuset of the allocating process.
> This could be construed to violate the guarantees of cpusets.
> However, since this is an opt-in mechanism, the assumption is
> that anyone enabling it is content to relax the guarantees.
>
> Signed-off-by: Dave Hansen <dave.hansen@...ux.intel.com>
> Cc: Wei Xu <weixugc@...gle.com>
> Cc: Yang Shi <yang.shi@...ux.alibaba.com>
> Cc: David Rientjes <rientjes@...gle.com>
> Cc: Huang Ying <ying.huang@...el.com>
> Cc: Dan Williams <dan.j.williams@...el.com>
> Cc: David Hildenbrand <david@...hat.com>
> Cc: osalvador <osalvador@...e.de>
>
> Changes since 20200122:
> * Changelog material about relaxing cpuset constraints
>
> Changes since 20210304:
> * Add Documentation/ material about relaxing cpuset constraints
Reviewed-by: Yang Shi <shy828301@...il.com>
> ---
>
> b/Documentation/admin-guide/sysctl/vm.rst | 12 ++++++++++++
> b/include/linux/swap.h | 3 ++-
> b/include/uapi/linux/mempolicy.h | 1 +
> b/mm/vmscan.c | 6 ++++--
> 4 files changed, 19 insertions(+), 3 deletions(-)
>
> diff -puN Documentation/admin-guide/sysctl/vm.rst~RECLAIM_MIGRATE Documentation/admin-guide/sysctl/vm.rst
> --- a/Documentation/admin-guide/sysctl/vm.rst~RECLAIM_MIGRATE 2021-03-31 15:17:40.324000190 -0700
> +++ b/Documentation/admin-guide/sysctl/vm.rst 2021-03-31 15:17:40.349000190 -0700
> @@ -976,6 +976,7 @@ This is value OR'ed together of
> 1 Zone reclaim on
> 2 Zone reclaim writes dirty pages out
> 4 Zone reclaim swaps pages
> +8 Zone reclaim migrates pages
> = ===================================
>
> zone_reclaim_mode is disabled by default. For file servers or workloads
> @@ -1000,3 +1001,14 @@ of other processes running on other node
> Allowing regular swap effectively restricts allocations to the local
> node unless explicitly overridden by memory policies or cpuset
> configurations.
> +
> +Page migration during reclaim is intended for systems with tiered memory
> +configurations. These systems have multiple types of memory with varied
> +performance characteristics, unlike plain NUMA systems where the same
> +kind of memory is found at varied distances. Allowing page migration
> +during reclaim enables these systems to migrate pages from fast tiers to
> +slow tiers when the fast tier is under pressure. This migration is
> +performed before swap. It may move data to a NUMA node that does not
> +fall into the cpuset of the allocating process which might be construed
> +to violate the guarantees of cpusets. This should not be enabled on
> +systems which need strict cpuset location guarantees.
> diff -puN include/linux/swap.h~RECLAIM_MIGRATE include/linux/swap.h
> --- a/include/linux/swap.h~RECLAIM_MIGRATE 2021-03-31 15:17:40.331000190 -0700
> +++ b/include/linux/swap.h 2021-03-31 15:17:40.351000190 -0700
> @@ -382,7 +382,8 @@ extern int sysctl_min_slab_ratio;
> static inline bool node_reclaim_enabled(void)
> {
> /* Is any node_reclaim_mode bit set? */
> - return node_reclaim_mode & (RECLAIM_ZONE|RECLAIM_WRITE|RECLAIM_UNMAP);
> + return node_reclaim_mode & (RECLAIM_ZONE|RECLAIM_WRITE|
> + RECLAIM_UNMAP|RECLAIM_MIGRATE);
> }
>
> extern void check_move_unevictable_pages(struct pagevec *pvec);
> diff -puN include/uapi/linux/mempolicy.h~RECLAIM_MIGRATE include/uapi/linux/mempolicy.h
> --- a/include/uapi/linux/mempolicy.h~RECLAIM_MIGRATE 2021-03-31 15:17:40.337000190 -0700
> +++ b/include/uapi/linux/mempolicy.h 2021-03-31 15:17:40.352000190 -0700
> @@ -71,5 +71,6 @@ enum {
> #define RECLAIM_ZONE (1<<0) /* Run shrink_inactive_list on the zone */
> #define RECLAIM_WRITE (1<<1) /* Writeout pages during reclaim */
> #define RECLAIM_UNMAP (1<<2) /* Unmap pages during reclaim */
> +#define RECLAIM_MIGRATE (1<<3) /* Migrate to other nodes during reclaim */
>
> #endif /* _UAPI_LINUX_MEMPOLICY_H */
> diff -puN mm/vmscan.c~RECLAIM_MIGRATE mm/vmscan.c
> --- a/mm/vmscan.c~RECLAIM_MIGRATE 2021-03-31 15:17:40.339000190 -0700
> +++ b/mm/vmscan.c 2021-03-31 15:17:40.357000190 -0700
> @@ -1074,6 +1074,9 @@ static bool migrate_demote_page_ok(struc
> VM_BUG_ON_PAGE(PageHuge(page), page);
> VM_BUG_ON_PAGE(PageLRU(page), page);
>
> + if (!(node_reclaim_mode & RECLAIM_MIGRATE))
> + return false;
> +
> /* It is pointless to do demotion in memcg reclaim */
> if (cgroup_reclaim(sc))
> return false;
> @@ -1083,8 +1086,7 @@ static bool migrate_demote_page_ok(struc
> if (PageTransHuge(page) && !thp_migration_supported())
> return false;
>
> - // FIXME: actually enable this later in the series
> - return false;
> + return true;
> }
>
> /* Check if a page is dirty or under writeback */
> _
>