Message-ID: <CAGsJ_4z-bCSSQecYq=L4U1QuoQUCtgY1WXbAX=eCEO9rXv8eNQ@mail.gmail.com>
Date: Sun, 11 Aug 2024 17:20:37 +1200
From: Barry Song <21cnbao@...il.com>
To: David Hildenbrand <david@...hat.com>
Cc: akpm@...ux-foundation.org, baolin.wang@...ux.alibaba.com, 
	chrisl@...nel.org, hanchuanhua@...o.com, ioworker0@...il.com, 
	kaleshsingh@...gle.com, kasong@...cent.com, linux-kernel@...r.kernel.org, 
	linux-mm@...ck.org, ryan.roberts@....com, v-songbaohua@...o.com, 
	ziy@...dia.com
Subject: Re: [PATCH RFC 1/2] mm: collect the number of anon large folios

On Fri, Aug 9, 2024 at 7:22 PM David Hildenbrand <david@...hat.com> wrote:
>
> On 09.08.24 09:04, Barry Song wrote:
> >>>> I would appreciate if we leave the rmap out here.
> >>>>
> >>>> Can't we handle that when actually freeing the folio? folio_test_anon()
> >>>> is sticky until freed.
> >>>
> >>> To be clearer: we increment the counter when we set a folio anon, which
> >>> should indeed only happen in folio_add_new_anon_rmap(). We'll have to
> >>> ignore hugetlb here where we do it in hugetlb_add_new_anon_rmap().
> >>>
> >>> Then, when we free an anon folio we decrement the counter. (hugetlb
> >>> should clear the anon flag when an anon folio gets freed back to its
> >>> allocator -- likely that is already done).
> >>>
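For the increment side, I guess you mean something like the sketch below
(untested, just to confirm my understanding; hugetlb would keep using
hugetlb_add_new_anon_rmap() and stay out of this counter):

	/* e.g. at the end of folio_add_new_anon_rmap(), once the folio is anon */
	mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON, 1);

with the matching decrement when the anon folio is freed, as in the
free_pages_prepare() hunks further down.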
> >>
> >> Sorry that I am talking to myself: I'm wondering if we also have to
> >> adjust the counter when splitting a large folio to multiple
> >> smaller-but-still-large folios.
> >
> > Hi David,
> >
> > The conceptual code is shown below. Does this make more
> > sense to you? We add the line "mod_mthp_stat(new_order,
> > MTHP_STAT_NR_ANON, 1 << (order - new_order));"
> >
> > @@ -3270,8 +3272,9 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> >       struct deferred_split *ds_queue = get_deferred_split_queue(folio);
> >       /* reset xarray order to new order after split */
> >       XA_STATE_ORDER(xas, &folio->mapping->i_pages, folio->index, new_order);
> > -     struct anon_vma *anon_vma = NULL;
> > +     bool is_anon = folio_test_anon(folio);
> >       struct address_space *mapping = NULL;
> > +     struct anon_vma *anon_vma = NULL;
> >       int order = folio_order(folio);
> >       int extra_pins, ret;
> >       pgoff_t end;
> > @@ -3283,7 +3286,7 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> >       if (new_order >= folio_order(folio))
> >               return -EINVAL;
> >
> > -     if (folio_test_anon(folio)) {
> > +     if (is_anon) {
> >               /* order-1 is not supported for anonymous THP. */
> >               if (new_order == 1) {
> >                       VM_WARN_ONCE(1, "Cannot split to order-1 folio");
> > @@ -3323,7 +3326,7 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> >       if (folio_test_writeback(folio))
> >               return -EBUSY;
> >
> > -     if (folio_test_anon(folio)) {
> > +     if (is_anon) {
> >               /*
> >                * The caller does not necessarily hold an mmap_lock that would
> >                * prevent the anon_vma disappearing so we first we take a
> > @@ -3437,6 +3440,10 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> >                       }
> >               }
> >
> > +             if (is_anon) {
> > +                     mod_mthp_stat(order, MTHP_STAT_NR_ANON, -1);
> > +                     mod_mthp_stat(new_order, MTHP_STAT_NR_ANON, 1 << (order - new_order));
> > +             }
> >               __split_huge_page(page, list, end, new_order);
> >               ret = 0;
> >       } else {
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 408ef3d25cf5..c869d0601614 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1039,6 +1039,7 @@ __always_inline bool free_pages_prepare(struct page *page,
> >       bool skip_kasan_poison = should_skip_kasan_poison(page);
> >       bool init = want_init_on_free();
> >       bool compound = PageCompound(page);
> > +     bool anon = PageAnon(page);
> >
> >       VM_BUG_ON_PAGE(PageTail(page), page);
> >
> > @@ -1130,6 +1131,9 @@ __always_inline bool free_pages_prepare(struct page *page,
> >
> >       debug_pagealloc_unmap_pages(page, 1 << order);
> >
> > +     if (anon && compound)
> > +             mod_mthp_stat(order, MTHP_STAT_NR_ANON, -1);
> > +
> >       return true;
>
> I'd have placed it here, where we have already passed the "PageMappingFlags" check and
> shouldn't have any added overhead for most !anon pages IIRC (mostly only anon/ksm pages should
> run into that path).
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 408ef3d25cf5..a11b9dd62964 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1079,8 +1079,11 @@ __always_inline bool free_pages_prepare(struct page *page,
>                          (page + i)->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
>                  }
>          }
> -       if (PageMappingFlags(page))
> +       if (PageMappingFlags(page)) {
> +               if (PageAnon(page) && compound)
> +                       mod_mthp_stat(order, MTHP_STAT_NR_ANON, -1);
>                  page->mapping = NULL;
> +       }
>          if (is_check_pages_enabled()) {
>                  if (free_page_is_bad(page))
>                          bad++;
>

This looks better, but we're not concerned about bad pages, correct?
For a bad page we'd decrement the counter by 1 even though the page
isn't actually freed. I assume we don't need to worry about that since
a bad page is already considered a bug, right?

> Conceptually LGTM. We account an anon folio as long as it's anon,
> even when still GUP-pinned after unmapping it or when temporarily
> unmapping+remapping it during migration.

Right. But migration might be a problem: the dst folio doesn't go
through folio_add_new_anon_rmap(); dst->mapping is simply copied from
src. So I suspect we need the change below (otherwise src is freed and
gets the -1, but dst never gets the +1):

diff --git a/mm/migrate.c b/mm/migrate.c
index 7e1267042a56..11ef11e59036 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1102,6 +1102,8 @@ static void migrate_folio_done(struct folio *src,
                mod_node_page_state(folio_pgdat(src), NR_ISOLATED_ANON +
                                    folio_is_file_lru(src), -folio_nr_pages(src));

+       mod_mthp_stat(folio_order(src), MTHP_STAT_NR_ANON, 1);
+
        if (reason != MR_MEMORY_FAILURE)
                /* We release the page in page_handle_poison. */
                folio_put(src);

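For reference, the reason dst never gets counted is that folio_migrate_mapping()
simply carries the anon mapping over from src, roughly (quoting from memory,
so please double-check):

	/* mm/migrate.c, folio_migrate_mapping(), anon folio without mapping */
	newfolio->index = folio->index;
	newfolio->mapping = folio->mapping;

so dst never passes through folio_add_new_anon_rmap() and would otherwise
never get the +1.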

>
> --
> Cheers,
>
> David / dhildenb
>

Thanks
Barry
