lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAOUHufb2f_EwHY5LQ59k7Nh7aS1-ZbOKtkoysb8BtxRNRFMypQ@mail.gmail.com>
Date: Fri, 5 Jul 2024 12:35:29 -0600
From: Yu Zhao <yuzhao@...gle.com>
To: James Houghton <jthoughton@...gle.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Paolo Bonzini <pbonzini@...hat.com>, 
	Ankit Agrawal <ankita@...dia.com>, Axel Rasmussen <axelrasmussen@...gle.com>, 
	Catalin Marinas <catalin.marinas@....com>, David Matlack <dmatlack@...gle.com>, 
	David Rientjes <rientjes@...gle.com>, James Morse <james.morse@....com>, 
	Jonathan Corbet <corbet@....net>, Marc Zyngier <maz@...nel.org>, Oliver Upton <oliver.upton@...ux.dev>, 
	Raghavendra Rao Ananta <rananta@...gle.com>, Ryan Roberts <ryan.roberts@....com>, 
	Sean Christopherson <seanjc@...gle.com>, Shaoqin Huang <shahuang@...hat.com>, 
	Suzuki K Poulose <suzuki.poulose@....com>, Wei Xu <weixugc@...gle.com>, 
	Will Deacon <will@...nel.org>, Zenghui Yu <yuzenghui@...wei.com>, kvmarm@...ts.linux.dev, 
	kvm@...r.kernel.org, linux-arm-kernel@...ts.infradead.org, 
	linux-doc@...r.kernel.org, linux-kernel@...r.kernel.org, linux-mm@...ck.org
Subject: Re: [PATCH v5 8/9] mm: multi-gen LRU: Have secondary MMUs participate
 in aging

On Mon, Jun 10, 2024 at 6:22 PM James Houghton <jthoughton@...gle.com> wrote:
>
> Secondary MMUs are currently consulted for access/age information at
> eviction time, but before then, we don't get accurate age information.
> That is, pages that are mostly accessed through a secondary MMU (like
> guest memory, used by KVM) will always just proceed down to the oldest
> generation, and then at eviction time, if KVM reports the page to be
> young, the page will be activated/promoted back to the youngest
> generation.
>
> The added feature bit (0x8), if disabled, will make MGLRU behave as if
> there are no secondary MMUs subscribed to MMU notifiers except at
> eviction time.
>
> Implement aging with the new mmu_notifier_test_clear_young_fast_only()
> notifier. For architectures that do not support this notifier, this
> becomes a no-op. For architectures that do implement it, it should be
> fast enough to make aging worth it.
>
> Suggested-by: Yu Zhao <yuzhao@...gle.com>
> Signed-off-by: James Houghton <jthoughton@...gle.com>
> ---
>
> Notes:
>     should_look_around() can sometimes use two notifiers now instead of one.
>
>     This simply comes from restricting myself from not changing
>     mmu_notifier_clear_young() to return more than just "young or not".
>
>     I could change mmu_notifier_clear_young() (and
>     mmu_notifier_test_young()) to return if it was fast or not. At that
>     point, I could just as well combine all the notifiers into one notifier,
>     like what was in v2 and v3.
>
>  Documentation/admin-guide/mm/multigen_lru.rst |   6 +-
>  include/linux/mmzone.h                        |   6 +-
>  mm/rmap.c                                     |   9 +-
>  mm/vmscan.c                                   | 185 ++++++++++++++----
>  4 files changed, 164 insertions(+), 42 deletions(-)

...

>  static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
>                            struct mm_walk *args)
>  {
> @@ -3357,8 +3416,9 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
>         struct pglist_data *pgdat = lruvec_pgdat(walk->lruvec);
>         DEFINE_MAX_SEQ(walk->lruvec);
>         int old_gen, new_gen = lru_gen_from_seq(max_seq);
> +       struct mm_struct *mm = args->mm;
>
> -       pte = pte_offset_map_nolock(args->mm, pmd, start & PMD_MASK, &ptl);
> +       pte = pte_offset_map_nolock(mm, pmd, start & PMD_MASK, &ptl);
>         if (!pte)
>                 return false;
>         if (!spin_trylock(ptl)) {
> @@ -3376,11 +3436,12 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
>                 total++;
>                 walk->mm_stats[MM_LEAF_TOTAL]++;
>
> -               pfn = get_pte_pfn(ptent, args->vma, addr);
> +               pfn = get_pte_pfn(ptent, args->vma, addr, pgdat);
>                 if (pfn == -1)
>                         continue;
>
> -               if (!pte_young(ptent)) {
> +               if (!pte_young(ptent) &&
> +                   !lru_gen_notifier_test_young(mm, addr)) {
>                         walk->mm_stats[MM_LEAF_OLD]++;
>                         continue;
>                 }
> @@ -3389,8 +3450,9 @@ static bool walk_pte_range(pmd_t *pmd, unsigned long start, unsigned long end,
>                 if (!folio)
>                         continue;
>
> -               if (!ptep_test_and_clear_young(args->vma, addr, pte + i))
> -                       VM_WARN_ON_ONCE(true);
> +               lru_gen_notifier_clear_young(mm, addr, addr + PAGE_SIZE);
> +               if (pte_young(ptent))
> +                       ptep_test_and_clear_young(args->vma, addr, pte + i);
>
>                 young++;
>                 walk->mm_stats[MM_LEAF_YOUNG]++;


There are two ways to structure the test conditions in walk_pte_range():
1. a single pass into the MMU notifier (combine test/clear) which
causes a cache miss from get_pfn_page() if the page is NOT young.
2. two passes into the MMU notifier (separate test/clear) if the page
is young, which does NOT cause a cache miss if the page is NOT young.

v2 can batch up to 64 PTEs, i.e., it only goes into the MMU notifier
twice every 64 PTEs, and therefore the second option is a clear win.

But you are doing twice per PTE. So what's the rationale behind going
with the second option? Was the first option considered?

In addition, what about the non-lockless cases? Would this change make
them worse by grabbing the MMU lock twice per PTE?

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ