Message-ID: <CAGsJ_4xy=62Gtb2W4P0Fss4XZiuJvQikgO6MXtg4TB_2FX7haA@mail.gmail.com>
Date: Sat, 12 Apr 2025 15:18:16 +0800
From: Barry Song <21cnbao@...il.com>
To: David Laight <david.laight.linux@...il.com>
Cc: Xavier <xavier_qy@....com>, catalin.marinas@....com, will@...nel.org, 
	akpm@...ux-foundation.org, ryan.roberts@....com, ioworker0@...il.com, 
	linux-arm-kernel@...ts.infradead.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v1] mm/contpte: Optimize loop to reduce redundant operations

On Fri, Apr 11, 2025 at 8:03 PM David Laight
<david.laight.linux@...il.com> wrote:
>
> On Fri, 11 Apr 2025 09:25:39 +1200
> Barry Song <21cnbao@...il.com> wrote:
>
> > On Mon, Apr 7, 2025 at 9:23 PM Xavier <xavier_qy@....com> wrote:
> > >
> > > This commit optimizes the contpte_ptep_get function by adding early
> > >  termination logic. It checks if the dirty and young bits of orig_pte
> > >  are already set and skips redundant bit-setting operations during
> > >  the loop. This reduces unnecessary iterations and improves performance.
> > >
> > > Signed-off-by: Xavier <xavier_qy@....com>
> > > ---
> > >  arch/arm64/mm/contpte.c | 13 +++++++++++--
> > >  1 file changed, 11 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/arm64/mm/contpte.c b/arch/arm64/mm/contpte.c
> > > index bcac4f55f9c1..ca15d8f52d14 100644
> > > --- a/arch/arm64/mm/contpte.c
> > > +++ b/arch/arm64/mm/contpte.c
> > > @@ -163,17 +163,26 @@ pte_t contpte_ptep_get(pte_t *ptep, pte_t orig_pte)
> > >
> > >         pte_t pte;
> > >         int i;
> > > +       bool dirty = false;
> > > +       bool young = false;
> > >
> > >         ptep = contpte_align_down(ptep);
> > >
> > >         for (i = 0; i < CONT_PTES; i++, ptep++) {
> > >                 pte = __ptep_get(ptep);
> > >
> > > -               if (pte_dirty(pte))
> > > +               if (!dirty && pte_dirty(pte)) {
> > > +                       dirty = true;
> > >                         orig_pte = pte_mkdirty(orig_pte);
> > > +               }
> > >
> > > -               if (pte_young(pte))
> > > +               if (!young && pte_young(pte)) {
> > > +                       young = true;
> > >                         orig_pte = pte_mkyoung(orig_pte);
> > > +               }
> > > +
> > > +               if (dirty && young)
> > > +                       break;
> >
> > This kind of optimization is always tricky. Dev previously tried a similar
> > approach to reduce the loop count, but it ended up causing performance
> > degradation:
> > https://lore.kernel.org/linux-mm/20240913091902.1160520-1-dev.jain@arm.com/
> >
> > So we may need actual data to validate this idea.
>
> You might win with 3 loops.
> The first looks for both 'dirty' and 'young'.
> If it finds only one it jumps to a different loop that continues
> the search but only looks for the other flag.
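
A rough userspace sketch of that three-loop idea, just to make the control flow concrete (the `fake_pte` type and `scan_flags()` helper are made up for illustration; this is not the kernel's pte_t API, and the outer loop hands off to a dedicated inner loop as soon as one flag is found):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a PTE: only the two flags we scan for. */
struct fake_pte {
	bool dirty;
	bool young;
};

/*
 * First loop tests both flags; once one is found, fall into a loop
 * that only tests the other, so the common tail does half the work.
 */
static void scan_flags(const struct fake_pte *ptes, size_t n,
		       bool *dirty, bool *young)
{
	size_t i;

	*dirty = *young = false;

	for (i = 0; i < n; i++) {
		if (ptes[i].dirty) {
			*dirty = true;
			/* same entry may also be young, so restart at i */
			for (; i < n; i++) {
				if (ptes[i].young) {
					*young = true;
					return;
				}
			}
			return;
		}
		if (ptes[i].young) {
			*young = true;
			for (; i < n; i++) {
				if (ptes[i].dirty) {
					*dirty = true;
					return;
				}
			}
			return;
		}
	}
}
```

Whether this wins over the plain two-flag loop would still need measurement, for the reasons below.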

However, there's no concrete evidence that this loop is a hot path. It might
save a few instructions when the first PTE is both young and dirty, but in
many cases none of the PTEs are either. Previous profiling indicates that
the actual hot path is the rmap walk that locates the page table entries
during folio_referenced() in the reclamation path.

I suspect the code you're optimizing isn't actually part of a hot path at all.

>
>         David
>
> >
> > >         }
> > >
> > >         return orig_pte;
> > > --
> > > 2.34.1
> > >
> >

Thanks
Barry
