Message-ID: <CAGsJ_4w0SzyA50zNcGAgUc36GOEVr3L6gcZntw4ejBogC9b6+Q@mail.gmail.com>
Date: Sun, 11 Aug 2024 18:54:33 +1200
From: Barry Song <21cnbao@...il.com>
To: David Hildenbrand <david@...hat.com>
Cc: akpm@...ux-foundation.org, baolin.wang@...ux.alibaba.com, 
	chrisl@...nel.org, hanchuanhua@...o.com, ioworker0@...il.com, 
	kaleshsingh@...gle.com, kasong@...cent.com, linux-kernel@...r.kernel.org, 
	linux-mm@...ck.org, ryan.roberts@....com, v-songbaohua@...o.com, 
	ziy@...dia.com
Subject: Re: [PATCH RFC 1/2] mm: collect the number of anon large folios

On Sun, Aug 11, 2024 at 5:20 PM Barry Song <21cnbao@...il.com> wrote:
>
> On Fri, Aug 9, 2024 at 7:22 PM David Hildenbrand <david@...hat.com> wrote:
> >
> > On 09.08.24 09:04, Barry Song wrote:
> > >>>> I would appreciate it if we leave the rmap out here.
> > >>>>
> > >>>> Can't we handle that when actually freeing the folio? folio_test_anon()
> > >>>> is sticky until freed.
> > >>>
> > >>> To be clearer: we increment the counter when we set a folio anon, which
> > >>> should indeed only happen in folio_add_new_anon_rmap(). We'll have to
> > >>> ignore hugetlb here where we do it in hugetlb_add_new_anon_rmap().
> > >>>
> > >>> Then, when we free an anon folio we decrement the counter. (hugetlb
> > >>> should clear the anon flag when an anon folio gets freed back to its
> > >>> allocator -- likely that is already done).
> > >>>
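
(For reference, the increment David describes would conceptually sit in
folio_add_new_anon_rmap() -- roughly like the sketch below. The placement
and the exact call are my assumption here, not yet part of this RFC:)

	/* conceptually, in mm/rmap.c -- placement is my assumption */
	void folio_add_new_anon_rmap(struct folio *folio, struct vm_area_struct *vma,
				     unsigned long address, rmap_t flags)
	{
		/* ... existing checks ... */
		__folio_set_anon(folio, vma, address, flags & RMAP_EXCLUSIVE);
		/* the folio just became anon: account it by its order */
		mod_mthp_stat(folio_order(folio), MTHP_STAT_NR_ANON, 1);
		/* ... existing mapcount/rmap updates ... */
	}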
> > >>
> > >> Sorry that I am talking to myself: I'm wondering if we also have to
> > >> adjust the counter when splitting a large folio into multiple
> > >> smaller-but-still-large folios.
> > >
> > > Hi David,
> > >
> > > The conceptual code is shown below. Does this make more
> > > sense to you? Note the line "mod_mthp_stat(new_order,
> > > MTHP_STAT_NR_ANON, 1 << (order - new_order));" that handles
> > > the split you mentioned.
> > >
> > > @@ -3270,8 +3272,9 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> > >       struct deferred_split *ds_queue = get_deferred_split_queue(folio);
> > >       /* reset xarray order to new order after split */
> > >       XA_STATE_ORDER(xas, &folio->mapping->i_pages, folio->index, new_order);
> > > -     struct anon_vma *anon_vma = NULL;
> > > +     bool is_anon = folio_test_anon(folio);
> > >       struct address_space *mapping = NULL;
> > > +     struct anon_vma *anon_vma = NULL;
> > >       int order = folio_order(folio);
> > >       int extra_pins, ret;
> > >       pgoff_t end;
> > > @@ -3283,7 +3286,7 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> > >       if (new_order >= folio_order(folio))
> > >               return -EINVAL;
> > >
> > > -     if (folio_test_anon(folio)) {
> > > +     if (is_anon) {
> > >               /* order-1 is not supported for anonymous THP. */
> > >               if (new_order == 1) {
> > >                       VM_WARN_ONCE(1, "Cannot split to order-1 folio");
> > > @@ -3323,7 +3326,7 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> > >       if (folio_test_writeback(folio))
> > >               return -EBUSY;
> > >
> > > -     if (folio_test_anon(folio)) {
> > > +     if (is_anon) {
> > >               /*
> > >                * The caller does not necessarily hold an mmap_lock that would
> > >                * prevent the anon_vma disappearing so we first we take a
> > > @@ -3437,6 +3440,10 @@ int split_huge_page_to_list_to_order(struct page *page, struct list_head *list,
> > >                       }
> > >               }
> > >
> > > +             if (is_anon) {
> > > +                     mod_mthp_stat(order, MTHP_STAT_NR_ANON, -1);
> > > +                     mod_mthp_stat(new_order, MTHP_STAT_NR_ANON, 1 << (order - new_order));
> > > +             }
> > >               __split_huge_page(page, list, end, new_order);
> > >               ret = 0;
> > >       } else {
> > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > index 408ef3d25cf5..c869d0601614 100644
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -1039,6 +1039,7 @@ __always_inline bool free_pages_prepare(struct page *page,
> > >       bool skip_kasan_poison = should_skip_kasan_poison(page);
> > >       bool init = want_init_on_free();
> > >       bool compound = PageCompound(page);
> > > +     bool anon = PageAnon(page);
> > >
> > >       VM_BUG_ON_PAGE(PageTail(page), page);
> > >
> > > @@ -1130,6 +1131,9 @@ __always_inline bool free_pages_prepare(struct page *page,
> > >
> > >       debug_pagealloc_unmap_pages(page, 1 << order);
> > >
> > > +     if (anon && compound)
> > > +             mod_mthp_stat(order, MTHP_STAT_NR_ANON, -1);
> > > +
> > >       return true;
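
(Side note: the arithmetic of the two mod_mthp_stat() calls in the split
hunk above, modelled as a tiny standalone program -- the order values are
made-up examples, not from the patch:)

	#include <stdio.h>

	int main(void)
	{
		long nr_anon[16] = { 0 };
		int order = 9, new_order = 4;	/* e.g. 2MiB folio -> 64KiB folios */

		nr_anon[order] = 1;		/* one large anon folio exists */

		/* the two calls from the diff: */
		nr_anon[order] -= 1;
		nr_anon[new_order] += 1L << (order - new_order);

		/* prints "order 9: 0, order 4: 32" */
		printf("order %d: %ld, order %d: %ld\n",
		       order, nr_anon[order], new_order, nr_anon[new_order]);
		return 0;
	}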
> >
> > I'd have placed it here, where we are already past the "PageMappingFlags" check and
> > shouldn't add any overhead for most !anon pages IIRC (mostly only anon/ksm pages
> > should run into that path).
> >
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 408ef3d25cf5..a11b9dd62964 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -1079,8 +1079,11 @@ __always_inline bool free_pages_prepare(struct page *page,
> >                          (page + i)->flags &= ~PAGE_FLAGS_CHECK_AT_PREP;
> >                  }
> >          }
> > -       if (PageMappingFlags(page))
> > +       if (PageMappingFlags(page)) {
> > +               if (PageAnon(page) && compound)
> > +                       mod_mthp_stat(order, MTHP_STAT_NR_ANON, -1);
> >                  page->mapping = NULL;
> > +       }
> >          if (is_check_pages_enabled()) {
> >                  if (free_page_is_bad(page))
> >                          bad++;
> >
>
> This looks better, but we're not concerned about bad pages, correct?
> For a bad page we decrement the count by 1 even though it isn't actually
> freed. We don't need to worry about that case since it's already
> considered a bug, right?
>
> > Conceptually LGTM. We account an anon folio as long as it's anon,
> > even when still GUP-pinned after unmapping it or when temporarily
> > unmapping+remapping it during migration.
>
> Right, but migration might be a problem: the dst folio never goes
> through folio_add_new_anon_rmap(), and dst->mapping is copied from
> src. So I suspect we need the change below (otherwise src is freed
> and gets the -1, but dst never gets the +1):
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 7e1267042a56..11ef11e59036 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1102,6 +1102,8 @@ static void migrate_folio_done(struct folio *src,
>                 mod_node_page_state(folio_pgdat(src), NR_ISOLATED_ANON +
>                                     folio_is_file_lru(src),
>                                     -folio_nr_pages(src));
>
> +       mod_mthp_stat(folio_order(src), MTHP_STAT_NR_ANON, 1);
> +
>         if (reason != MR_MEMORY_FAILURE)
>                 /* We release the page in page_handle_poison. */
>                 folio_put(src);
>

Sorry, wrong place: migrate_folio_done() doesn't necessarily mean the
migration is done; it can also be the "Folio was freed from under us"
case:

        if (folio_ref_count(src) == 1) {
                /* Folio was freed from under us. So we are done. */
                folio_clear_active(src);
                folio_clear_unevictable(src);
                /* free_pages_prepare() will clear PG_isolated. */
                list_del(&src->lru);
                migrate_folio_done(src, reason);
                return MIGRATEPAGE_SUCCESS;
        }


The correct place should be:

@@ -1329,6 +1326,10 @@ static int migrate_folio_move(free_folio_t put_new_folio, unsigned long private,
        if (anon_vma)
                put_anon_vma(anon_vma);
        folio_unlock(src);
+
+       if (folio_test_anon(src))
+               mod_mthp_stat(folio_order(src), MTHP_STAT_NR_ANON, 1);
+
        migrate_folio_done(src, reason);

        return rc;

Without this modification in the migration code, my tests fail: anon_num
can become negative.
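
(To make the failure concrete, here is a tiny userspace model of the
counter across one migration -- illustrative only, the names are mine:)

	#include <assert.h>
	#include <stdio.h>

	int main(void)
	{
		int fixed = 1;		/* set to 0 to model the unpatched kernel */
		long nr_anon = 1;	/* folio_add_new_anon_rmap() on the src folio */

		/* migrate_folio_move() succeeds: */
		if (fixed)
			nr_anon += 1;	/* the proposed +1, accounting the dst folio */
		nr_anon -= 1;		/* src is freed: free_pages_prepare() does -1 */

		nr_anon -= 1;		/* much later, dst is freed too */

		printf("nr_anon = %ld\n", nr_anon);	/* 0 if fixed, -1 if not */
		assert(nr_anon >= 0);
		return 0;
	}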

>
> >
> > --
> > Cheers,
> >
> > David / dhildenb
> >

Thanks
Barry
