lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <adf7603e-418b-4f84-b49a-9e784fa7efb3@lucifer.local>
Date: Thu, 7 Aug 2025 21:11:12 +0100
From: Lorenzo Stoakes <lorenzo.stoakes@...cle.com>
To: David Hildenbrand <david@...hat.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>,
        "Liam R . Howlett" <Liam.Howlett@...cle.com>,
        Vlastimil Babka <vbabka@...e.cz>, Jann Horn <jannh@...gle.com>,
        Pedro Falcato <pfalcato@...e.de>, Barry Song <baohua@...nel.org>,
        Dev Jain <dev.jain@....com>, linux-mm@...ck.org,
        linux-kernel@...r.kernel.org, Ryan Roberts <ryan.roberts@....com>
Subject: Re: [PATCH HOTFIX 6.17] mm/mremap: avoid expensive folio lookup on
 mremap folio pte batch

On Thu, Aug 07, 2025 at 09:41:24PM +0200, David Hildenbrand wrote:
> On 07.08.25 21:20, Lorenzo Stoakes wrote:
> > +cc Ryan for ContPTE stuff.
> >
> > On Thu, Aug 07, 2025 at 09:10:52PM +0200, David Hildenbrand wrote:
> > > Acked-by: David Hildenbrand <david@...hat.com>
> >
> > Thanks!
> >
> > >
> > > Wondering whether we could then just use the patch hint instead of going via
> > > the folio.
> > >
> > > IOW,
> > >
> > > return pte_batch_hint(ptep, pte);
> >
> > Wouldn't that break the A/D stuff? Also this doesn't mean that the PTE won't
> > have some conflicting flags potentially. The check is empirical:
> >
> > static inline unsigned int pte_batch_hint(pte_t *ptep, pte_t pte)
> > {
> > 	if (!pte_valid_cont(pte))
> > 		return 1;
> >
> > 	return CONT_PTES - (((unsigned long)ptep >> 3) & (CONT_PTES - 1));
> > }
> >
> > So it's 'the most number of PTEs that _might_ coalesce'.
>
> No. If the bit is set, all PTEs in the aligned range (e.g., 64 KiB block)
> are coalesced. It's literally the bit telling the hardware that it can
> coalesce in that range because all PTEs are alike.

Sigh. So this is just yet another a horribly named function then. I was pretty
certain somebody explained it to me this way, but it's another reminder to never
trust anything you're told and to check everything...

My understanding of the word 'hint' does not align with what this function
does... perhaps there's some deeper meaning I'm missing...?

>
> The function tells you exactly how many PTEs you can batch from the given
> PTEP in that 64 KiB block.
>
> That's also why folio_pte_batch_flags() just jumps over that.

It would still be the case if it were the maximum it _could_ be if you could
ascertain if it _was_. But of course we don't, indeed.

>
> All you have to do is limit it by the maximum number.
>
> So likely you would have to do here
>
> diff --git a/mm/mremap.c b/mm/mremap.c
> index 677a4d744df9c..58f9cf52eb6bd 100644
> --- a/mm/mremap.c
> +++ b/mm/mremap.c
> @@ -174,16 +174,7 @@ static pte_t move_soft_dirty_pte(pte_t pte)
>  static int mremap_folio_pte_batch(struct vm_area_struct *vma, unsigned long
> addr,
>                 pte_t *ptep, pte_t pte, int max_nr)
>  {
> -       struct folio *folio;
> -
> -       if (max_nr == 1)
> -               return 1;
> -
> -       folio = vm_normal_folio(vma, addr, pte);
> -       if (!folio || !folio_test_large(folio))
> -               return 1;
> -
> -       return folio_pte_batch(folio, ptep, pte, max_nr);
> +       return min_t(unsigned int, max_nr, pte_batch_hint(ptep, pte));
>  }

Right except you're ignoring A/D bits no? Or what's the point of
folio_pte_batch()?

Why don't they matter here? I thought they did?

>
>  static int move_ptes(struct pagetable_move_control *pmc,
>
>
> And make sure that the compiler realizes that max_nr >= 1 and optimized away
> the min_t on !arm64.
>
> >
> > (note that a bit grossly we'll call it _again_ in folio_pte_batch_flags()).
> >
> > I suppose we could not even bother with checking if same folio and _just_
> check> if PTEs have consecutive PFNs, which is not very likely if different
> folio
> > but... could that break something?
> >
> > It seems the 'magic' is in set_ptes() on arm64 where it'll know to do the 'right
> > thing' for a contPTE batch (I may be missing something - please correct me if so
> > Dev/Ryan).
> >
> > So actually do we even really care that much about folio?
>
> I don't think so. Not in this case here where we don't use the folio for
> anything else.

I mean my suggestion is that we don't actually then really need the folio
at all, it's very unlkely we'll get contiguous PFNs. So we could have a
version of folio_pte_batch_flags() that doesn't need the folio...

Anyway strikes me all this should be stuff we look at after the hotfix,
better to get this landed so the regression is resolved.

>
>
> --
> Cheers,
>
> David / dhildenb
>

Cheers, Lorenzo

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ