Message-ID: <CAHbLzkp_2UD50Vt8f_atxKcz4x8J3GB3YzTqMOd6Src_y2Yg2g@mail.gmail.com>
Date:   Thu, 17 Oct 2019 10:30:00 -0700
From:   Yang Shi <shy828301@...il.com>
To:     Dave Hansen <dave.hansen@...ux.intel.com>
Cc:     Linux Kernel Mailing List <linux-kernel@...r.kernel.org>,
        Linux MM <linux-mm@...ck.org>,
        Dan Williams <dan.j.williams@...el.com>,
        Keith Busch <keith.busch@...el.com>
Subject: Re: [PATCH 3/4] mm/vmscan: Attempt to migrate page in lieu of discard

On Wed, Oct 16, 2019 at 3:14 PM Dave Hansen <dave.hansen@...ux.intel.com> wrote:
>
>
> From: Keith Busch <keith.busch@...el.com>
>
> If a memory node has a preferred migration path to demote cold pages,
> attempt to move those inactive pages to that migration node before
> reclaiming. This will better utilize available memory, provide a faster
> tier than swapping or discarding, and allow such pages to be reused
> immediately without IO to retrieve the data.
>
> Much like swap, this is an opt-in feature that requires the user to define
> where to send pages when reclaiming them. When handling anonymous pages,
> this will be considered before swap if enabled. Should the demotion fail
> for any reason, the page reclaim will proceed as if the demotion feature
> was not enabled.
>
> Some places we would like to see this used:
>
>   1. Persistent memory being used as a slower, cheaper DRAM replacement
>   2. Remote memory-only "expansion" NUMA nodes
>   3. Resolving memory imbalances where one NUMA node is seeing more
>      allocation activity than another.  This helps keep more recent
>      allocations closer to the CPUs on the node doing the allocating.
>
> Signed-off-by: Keith Busch <keith.busch@...el.com>
> Co-developed-by: Dave Hansen <dave.hansen@...ux.intel.com>
> Signed-off-by: Dave Hansen <dave.hansen@...ux.intel.com>
> ---
>
>  b/include/linux/migrate.h        |    6 ++++
>  b/include/trace/events/migrate.h |    3 +-
>  b/mm/debug.c                     |    1
>  b/mm/migrate.c                   |   51 +++++++++++++++++++++++++++++++++++++++
>  b/mm/vmscan.c                    |   27 ++++++++++++++++++++
>  5 files changed, 87 insertions(+), 1 deletion(-)
>
> diff -puN include/linux/migrate.h~0005-mm-vmscan-Attempt-to-migrate-page-in-lieu-of-discard include/linux/migrate.h
> --- a/include/linux/migrate.h~0005-mm-vmscan-Attempt-to-migrate-page-in-lieu-of-discard 2019-10-16 15:06:58.090952593 -0700
> +++ b/include/linux/migrate.h   2019-10-16 15:06:58.103952593 -0700
> @@ -25,6 +25,7 @@ enum migrate_reason {
>         MR_MEMPOLICY_MBIND,
>         MR_NUMA_MISPLACED,
>         MR_CONTIG_RANGE,
> +       MR_DEMOTION,
>         MR_TYPES
>  };
>
> @@ -79,6 +80,7 @@ extern int migrate_huge_page_move_mappin
>  extern int migrate_page_move_mapping(struct address_space *mapping,
>                 struct page *newpage, struct page *page, enum migrate_mode mode,
>                 int extra_count);
> +extern int migrate_demote_mapping(struct page *page);
>  #else
>
>  static inline void putback_movable_pages(struct list_head *l) {}
> @@ -105,6 +107,10 @@ static inline int migrate_huge_page_move
>         return -ENOSYS;
>  }
>
> +static inline int migrate_demote_mapping(struct page *page)
> +{
> +       return -ENOSYS;
> +}
>  #endif /* CONFIG_MIGRATION */
>
>  #ifdef CONFIG_COMPACTION
> diff -puN include/trace/events/migrate.h~0005-mm-vmscan-Attempt-to-migrate-page-in-lieu-of-discard include/trace/events/migrate.h
> --- a/include/trace/events/migrate.h~0005-mm-vmscan-Attempt-to-migrate-page-in-lieu-of-discard  2019-10-16 15:06:58.092952593 -0700
> +++ b/include/trace/events/migrate.h    2019-10-16 15:06:58.103952593 -0700
> @@ -20,7 +20,8 @@
>         EM( MR_SYSCALL,         "syscall_or_cpuset")            \
>         EM( MR_MEMPOLICY_MBIND, "mempolicy_mbind")              \
>         EM( MR_NUMA_MISPLACED,  "numa_misplaced")               \
> -       EMe(MR_CONTIG_RANGE,    "contig_range")
> +       EM( MR_CONTIG_RANGE,    "contig_range")                 \
> +       EMe(MR_DEMOTION,        "demotion")
>
>  /*
>   * First define the enums in the above macros to be exported to userspace
> diff -puN mm/debug.c~0005-mm-vmscan-Attempt-to-migrate-page-in-lieu-of-discard mm/debug.c
> --- a/mm/debug.c~0005-mm-vmscan-Attempt-to-migrate-page-in-lieu-of-discard      2019-10-16 15:06:58.094952593 -0700
> +++ b/mm/debug.c        2019-10-16 15:06:58.103952593 -0700
> @@ -25,6 +25,7 @@ const char *migrate_reason_names[MR_TYPE
>         "mempolicy_mbind",
>         "numa_misplaced",
>         "cma",
> +       "demotion",
>  };
>
>  const struct trace_print_flags pageflag_names[] = {
> diff -puN mm/migrate.c~0005-mm-vmscan-Attempt-to-migrate-page-in-lieu-of-discard mm/migrate.c
> --- a/mm/migrate.c~0005-mm-vmscan-Attempt-to-migrate-page-in-lieu-of-discard    2019-10-16 15:06:58.097952593 -0700
> +++ b/mm/migrate.c      2019-10-16 15:06:58.104952593 -0700
> @@ -1119,6 +1119,57 @@ out:
>         return rc;
>  }
>
> +static struct page *alloc_demote_node_page(struct page *page, unsigned long node)
> +{
> +       /*
> +        * The flags are set to allocate only on the desired node in the
> +        * migration path, and to fail fast if not immediately available. We
> +        * are already doing memory reclaim, we don't want heroic efforts to
> +        * get a page.
> +        */
> +       gfp_t mask = GFP_NOWAIT | __GFP_NOWARN | __GFP_NORETRY |
> +                       __GFP_NOMEMALLOC | __GFP_THISNODE | __GFP_MOVABLE;
> +       struct page *newpage;
> +
> +       if (PageTransHuge(page)) {
> +               mask |= __GFP_COMP;
> +               newpage = alloc_pages_node(node, mask, HPAGE_PMD_ORDER);
> +               if (newpage)
> +                       prep_transhuge_page(newpage);
> +       } else
> +               newpage = alloc_pages_node(node, mask, 0);
> +
> +       return newpage;
> +}
> +
> +/**
> + * migrate_demote_mapping() - Migrate this page and its mappings to its
> + *                            demotion node.
> + * @page: A locked, isolated, non-huge page that should migrate to its current
> + *        node's demotion target, if available. Since this is intended to be
> + *        called during memory reclaim, all flag options are set to fail fast.
> + *
> + * @returns: MIGRATEPAGE_SUCCESS if successful, -errno otherwise.
> + */
> +int migrate_demote_mapping(struct page *page)
> +{
> +       int next_nid = next_migration_node(page_to_nid(page));
> +
> +       VM_BUG_ON_PAGE(!PageLocked(page), page);
> +       VM_BUG_ON_PAGE(PageHuge(page), page);
> +       VM_BUG_ON_PAGE(PageLRU(page), page);
> +
> +       if (next_nid < 0)
> +               return -ENOSYS;
> +       if (PageTransHuge(page) && !thp_migration_supported())
> +               return -ENOMEM;
> +
> +       /* MIGRATE_ASYNC is the most light weight and never blocks.*/
> +       return __unmap_and_move(alloc_demote_node_page, NULL, next_nid,
> +                               page, MIGRATE_ASYNC, MR_DEMOTION);
> +}
> +
> +
>  /*
>   * gcc 4.7 and 4.8 on arm get an ICEs when inlining unmap_and_move().  Work
>   * around it.
> diff -puN mm/vmscan.c~0005-mm-vmscan-Attempt-to-migrate-page-in-lieu-of-discard mm/vmscan.c
> --- a/mm/vmscan.c~0005-mm-vmscan-Attempt-to-migrate-page-in-lieu-of-discard     2019-10-16 15:06:58.099952593 -0700
> +++ b/mm/vmscan.c       2019-10-16 15:06:58.105952593 -0700
> @@ -1262,6 +1262,33 @@ static unsigned long shrink_page_list(st
>                         ; /* try to reclaim the page below */
>                 }
>
> +               if (!PageHuge(page)) {
> +                       int rc = migrate_demote_mapping(page);
> +
> +                       /*
> +                        * -ENOMEM on a THP may indicate either migration is
> +                        * unsupported or there was not enough contiguous
> +                        * space. Split the THP into base pages and retry the
> +                        * head immediately. The tail pages will be considered
> +                        * individually within the current loop's page list.
> +                        */
> +                       if (rc == -ENOMEM && PageTransHuge(page) &&
> +                           !split_huge_page_to_list(page, page_list))
> +                               rc = migrate_demote_mapping(page);

I recall that when Keith first posted this patch, I asked why we don't
just migrate the THP as a whole. migrate_pages() can handle this: if
migrating the THP fails, it falls back to base pages.

Since the most optimistic (fail-fast) gfp flags are used, the allocation
should not get trapped in nested direct reclaim. migrate_pages() should
just return failure for the THP and then fall back to base pages.
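
To make the suggestion concrete, here is a rough userspace sketch of the
ordering I have in mind: try the THP in one piece first, and only split
into base pages when the fail-fast allocation of a huge page does not
succeed. This is a toy model, not kernel code. The names toy_node,
toy_page, toy_alloc() and toy_demote() are all made up for illustration,
and the free-slot counters merely stand in for what the page allocator
would report on the demotion target node.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model only: these types and helpers are hypothetical, not kernel APIs. */
struct toy_node {
	unsigned int free_huge;	/* free huge-page-sized slots on the target node */
	unsigned int free_base;	/* free base pages on the target node */
};

struct toy_page {
	bool is_thp;		/* stands in for PageTransHuge() */
	unsigned int nr_base;	/* base pages covered, 1 for a non-THP page */
};

/*
 * Fail-fast allocation: take a free slot if one exists, otherwise fail
 * at once. There is no reclaim or retry path to recurse into.
 */
static bool toy_alloc(struct toy_node *node, bool huge)
{
	unsigned int *pool = huge ? &node->free_huge : &node->free_base;

	if (*pool == 0)
		return false;
	(*pool)--;
	return true;
}

/* Returns how many base pages' worth of memory were demoted. */
static unsigned int toy_demote(struct toy_page *page, struct toy_node *target)
{
	unsigned int moved = 0;

	assert(page != NULL && target != NULL);

	/* First choice: move the THP in one piece. */
	if (page->is_thp && toy_alloc(target, true))
		return page->nr_base;

	/* Fallback: split, then demote base pages one at a time. */
	page->is_thp = false;
	while (moved < page->nr_base && toy_alloc(target, false))
		moved++;

	return moved;
}
```

In this model a THP only gets split when the target node has no huge
slot free, and whatever base pages cannot be demoted afterwards would
simply continue through normal reclaim.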

> +
> +                       if (rc == MIGRATEPAGE_SUCCESS) {
> +                               unlock_page(page);
> +                               if (likely(put_page_testzero(page)))
> +                                       goto free_it;
> +                               /*
> +                                * Speculative reference will free this page,
> +                                * so leave it off the LRU.
> +                                */
> +                               nr_reclaimed++;
> +                               continue;
> +                       }
> +               }
> +
>                 /*
>                  * Anonymous process memory has backing store?
>                  * Try to allocate it some swap space here.
> _
>
