lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAKEwX=MqV5CThRxTXs3DKqGNw04w2j=4hmE+Wi7x4Gu_ykATmw@mail.gmail.com>
Date:   Tue, 3 Oct 2023 16:26:10 -0700
From:   Nhat Pham <nphamcs@...il.com>
To:     Mike Kravetz <mike.kravetz@...cle.com>
Cc:     Johannes Weiner <hannes@...xchg.org>, akpm@...ux-foundation.org,
        riel@...riel.com, mhocko@...nel.org, roman.gushchin@...ux.dev,
        shakeelb@...gle.com, muchun.song@...ux.dev, tj@...nel.org,
        lizefan.x@...edance.com, shuah@...nel.org, yosryahmed@...gle.com,
        fvdl@...gle.com, linux-mm@...ck.org, kernel-team@...a.com,
        linux-kernel@...r.kernel.org, cgroups@...r.kernel.org
Subject: Re: [PATCH v3 2/3] hugetlb: memcg: account hugetlb-backed memory in
 memory controller

On Tue, Oct 3, 2023 at 3:42 PM Mike Kravetz <mike.kravetz@...cle.com> wrote:
>
> On 10/03/23 15:09, Nhat Pham wrote:
> > On Tue, Oct 3, 2023 at 11:39 AM Johannes Weiner <hannes@...xchg.org> wrote:
> > > On Tue, Oct 03, 2023 at 11:01:24AM -0700, Nhat Pham wrote:
> > > > On Tue, Oct 3, 2023 at 10:13 AM Mike Kravetz <mike.kravetz@...cle.com> wrote:
> > > > > On 10/02/23 17:18, Nhat Pham wrote:
> > > > >
> > > > > IIUC, huge page usage is charged in alloc_hugetlb_folio and uncharged in
> > > > > free_huge_folio.  During migration, huge pages are allocated via
> > > > > alloc_migrate_hugetlb_folio, not alloc_hugetlb_folio.  So, there is no
> > > > > charging for the migration target page and we uncharge the source page.
> > > > > It looks like there will be no charge for the huge page after migration?
> > > > >
> > > >
> > > > Ah I see! This is a bit subtle indeed.
> > > >
> > > > For the hugetlb controller, it looks like they update the cgroup info
> > > > inside move_hugetlb_state(), which calls hugetlb_cgroup_migrate()
> > > > to transfer the hugetlb cgroup info to the destination folio.
> > > >
> > > > Perhaps we can do something analogous here.
> > > >
> > > > > If my analysis above is correct, then we may need to be careful about
> > > > > this accounting.  We may not want both source and target pages to be
> > > > > charged at the same time.
> > > >
> > > > We can create a variant of mem_cgroup_migrate that does not double
> > > > charge, but instead just copy the mem_cgroup information to the new
> > > > folio, and then clear that info from the old folio. That way the memory
> > > > usage counters are untouched.
> > > >
> > > > Somebody with more expertise on migration should fact check me
> > > > of course :)
> > >
> > > The only reason mem_cgroup_migrate() double charges right now is
> > > because it's used by replace_page_cache_folio(). In that context, the
> > > isolation of the old page isn't quite as thorough as with migration,
> > > so it cannot transfer and uncharge directly. This goes back a long
> > > time: 0a31bc97c80c3fa87b32c091d9a930ac19cd0c40
> > >
> > > If you rename the current implementation to mem_cgroup_replace_page()
> > > for that one caller, you can add a mem_cgroup_migrate() variant which
> > > is charge neutral and clears old->memcg_data. This can be used for
> > > regular and hugetlb page migration. Something like this (totally
> > > untested):
> > >
> > > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > > index a4d3282493b6..17ec45bf3653 100644
> > > --- a/mm/memcontrol.c
> > > +++ b/mm/memcontrol.c
> > > @@ -7226,29 +7226,14 @@ void mem_cgroup_migrate(struct folio *old, struct folio *new)
> > >         if (mem_cgroup_disabled())
> > >                 return;
> > >
> > > -       /* Page cache replacement: new folio already charged? */
> > > -       if (folio_memcg(new))
> > > -               return;
> > > -
> > >         memcg = folio_memcg(old);
> > >         VM_WARN_ON_ONCE_FOLIO(!memcg, old);
> > >         if (!memcg)
> > >                 return;
> > >
> > > -       /* Force-charge the new page. The old one will be freed soon */
> > > -       if (!mem_cgroup_is_root(memcg)) {
> > > -               page_counter_charge(&memcg->memory, nr_pages);
> > > -               if (do_memsw_account())
> > > -                       page_counter_charge(&memcg->memsw, nr_pages);
> > > -       }
> > > -
> > > -       css_get(&memcg->css);
> > > +       /* Transfer the charge and the css ref */
> > >         commit_charge(new, memcg);
> > > -
> > > -       local_irq_save(flags);
> > > -       mem_cgroup_charge_statistics(memcg, nr_pages);
> > > -       memcg_check_events(memcg, folio_nid(new));
> > > -       local_irq_restore(flags);
> > > +       old->memcg_data = 0;
> > >  }
> > >
> > >  DEFINE_STATIC_KEY_FALSE(memcg_sockets_enabled_key);
> > >
> >
> > Ah, I like this. Will send a fixlet based on this :)
> > I was scratching my head trying to figure out why we were
> > doing the double charging in the first place. Thanks for the context,
> > Johannes!
>
> Be sure to check for code similar to this in folio_migrate_flags:
>
> void folio_migrate_flags(struct folio *newfolio, struct folio *folio)
> {
> ...
>         if (!folio_test_hugetlb(folio))
>                 mem_cgroup_migrate(folio, newfolio);
> }
>
> There are many places where hugetlb is special cased.

Yeah makes sense. I'm actually gonna take advantage of this,
and remove the test hugetlb check here, so that it will also
migrate the memcg metadata in this case too.  See the new patch
I just sent out.

> --
> Mike Kravetz

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ