lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAGsJ_4y07BjVQRG01jEVg3Guc8rxSWFhSO7fzSAZ8XD0YWusLQ@mail.gmail.com>
Date: Fri, 12 Apr 2024 11:01:41 +1200
From: Barry Song <21cnbao@...il.com>
To: Ryan Roberts <ryan.roberts@....com>
Cc: akpm@...ux-foundation.org, linux-mm@...ck.org, 
	baolin.wang@...ux.alibaba.com, chrisl@...nel.org, david@...hat.com, 
	hanchuanhua@...o.com, hannes@...xchg.org, hughd@...gle.com, 
	kasong@...cent.com, surenb@...gle.com, v-songbaohua@...o.com, 
	willy@...radead.org, xiang@...nel.org, ying.huang@...el.com, 
	yosryahmed@...gle.com, yuzhao@...gle.com, ziy@...dia.com, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 5/5] mm: add per-order mTHP swpin_refault counter

On Fri, Apr 12, 2024 at 3:53 AM Ryan Roberts <ryan.roberts@....com> wrote:
>
> On 09/04/2024 09:26, Barry Song wrote:
> > From: Barry Song <v-songbaohua@...o.com>
> >
> > Currently, we are handling the scenario where we've hit a
> > large folio in the swapcache, and the reclaiming process
> > for this large folio is still ongoing.
> >
> > Signed-off-by: Barry Song <v-songbaohua@...o.com>
> > ---
> >  include/linux/huge_mm.h | 1 +
> >  mm/huge_memory.c        | 2 ++
> >  mm/memory.c             | 1 +
> >  3 files changed, 4 insertions(+)
> >
> > diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
> > index c8256af83e33..b67294d5814f 100644
> > --- a/include/linux/huge_mm.h
> > +++ b/include/linux/huge_mm.h
> > @@ -269,6 +269,7 @@ enum mthp_stat_item {
> >       MTHP_STAT_ANON_ALLOC_FALLBACK,
> >       MTHP_STAT_ANON_SWPOUT,
> >       MTHP_STAT_ANON_SWPOUT_FALLBACK,
> > +     MTHP_STAT_ANON_SWPIN_REFAULT,
>
> I don't see any equivalent counter for small folios. Is there an analogue?

Indeed, we don't count refaults for small folios, as their refault
mechanism is much
simpler compared to large folios. Implementing this counter can enhance the
system's visibility to users.

Personally, having this counter and observing a non-zero value greatly enhances
my confidence when debugging this refault series. Otherwise, it feels like being
blind to what's happening inside the system :-)

>
> >       __MTHP_STAT_COUNT
> >  };
> >
> > diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> > index d8d2ed80b0bf..fb95345b0bde 100644
> > --- a/mm/huge_memory.c
> > +++ b/mm/huge_memory.c
> > @@ -556,12 +556,14 @@ DEFINE_MTHP_STAT_ATTR(anon_alloc, MTHP_STAT_ANON_ALLOC);
> >  DEFINE_MTHP_STAT_ATTR(anon_alloc_fallback, MTHP_STAT_ANON_ALLOC_FALLBACK);
> >  DEFINE_MTHP_STAT_ATTR(anon_swpout, MTHP_STAT_ANON_SWPOUT);
> >  DEFINE_MTHP_STAT_ATTR(anon_swpout_fallback, MTHP_STAT_ANON_SWPOUT_FALLBACK);
> > +DEFINE_MTHP_STAT_ATTR(anon_swpin_refault, MTHP_STAT_ANON_SWPIN_REFAULT);
> >
> >  static struct attribute *stats_attrs[] = {
> >       &anon_alloc_attr.attr,
> >       &anon_alloc_fallback_attr.attr,
> >       &anon_swpout_attr.attr,
> >       &anon_swpout_fallback_attr.attr,
> > +     &anon_swpin_refault_attr.attr,
> >       NULL,
> >  };
> >
> > diff --git a/mm/memory.c b/mm/memory.c
> > index 9818dc1893c8..acc023795a4d 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -4167,6 +4167,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf)
> >               nr_pages = nr;
> >               entry = folio->swap;
> >               page = &folio->page;
> > +             count_mthp_stat(folio_order(folio), MTHP_STAT_ANON_SWPIN_REFAULT);
>
> I don't think this is the point of no return yet? There's the pte_same() check
> immediately below (although I've suggested that needs to be moved to earlier),
> but also the folio_test_uptodate() check. Perhaps this should go after that?
>

swap_pte_batch() == nr_pages should have passed the test for pte_same.
folio_test_uptodate(folio)) should be also unlikely to be true as we are
not reading from swap devices for refault case.

but i agree we can move all the refault handling after those two "goto
out_nomap".

> >       }
> >
> >  check_pte:
>

Thanks
Barry

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ