Message-ID: <CAN+CAwN34zQdjuOhH0Vm0k6=im9=vVvwH_yCh_z4zvuMzPSjTg@mail.gmail.com>
Date: Sat, 9 Nov 2024 13:58:46 -0500
From: Joshua Hahn <joshua.hahnjy@...il.com>
To: Shakeel Butt <shakeel.butt@...ux.dev>
Cc: hannes@...xchg.org, mhocko@...nel.org, roman.gushchin@...ux.dev, 
	muchun.song@...ux.dev, akpm@...ux-foundation.org, cgroups@...r.kernel.org, 
	linux-mm@...ck.org, linux-kernel@...r.kernel.org, kernel-team@...a.com
Subject: Re: [PATCH 2/3] memcg/hugetlb: Introduce mem_cgroup_charge_hugetlb

Hello Shakeel, thank you for reviewing my patch!

On Fri, Nov 8, 2024 at 5:43 PM Shakeel Butt <shakeel.butt@...ux.dev> wrote:
>
> On Fri, Nov 08, 2024 at 01:29:45PM -0800, Joshua Hahn wrote:
> > This patch introduces mem_cgroup_charge_hugetlb, which combines the
> > logic of mem_cgroup{try,commit}_hugetlb. This reduces the footprint of
> > memcg in hugetlb code, and also consolidates the error path that memcg
> > can take into just one point.
> >
> > Signed-off-by: Joshua Hahn <joshua.hahnjy@...il.com>
> > -     if (!memcg_charge_ret)
> > -             mem_cgroup_commit_charge(folio, memcg);
> > -     lruvec_stat_mod_folio(folio, NR_HUGETLB, pages_per_huge_page(h));
> > -     mem_cgroup_put(memcg);
> > +     ret = mem_cgroup_charge_hugetlb(folio, gfp);
> > +     if (ret == -ENOMEM) {
> > +             spin_unlock_irq(&hugetlb_lock);
>
> spin_unlock_irq??

Thank you for the catch. I completely missed this after I moved
the call to mem_cgroup_charge_hugetlb so that it runs without
hugetlb_lock held: the lock is already released at that point, so the
error path must not unlock it again. I will fix this.
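To make the ordering concrete, here is a small user-space model, assuming (as described above) that the charge now happens after hugetlb_lock has been dropped. model_lock(), model_unlock(), model_charge() and model_alloc() are hypothetical stand-ins, not kernel APIs; the patch itself uses spin_unlock_irq(&hugetlb_lock) and mem_cgroup_charge_hugetlb(). The model only tracks whether the lock is held, so an extra unlock on the -ENOMEM path trips the assertion.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-in for hugetlb_lock: counts how deeply it is held. */
static int lock_depth;

static void model_lock(void)   { lock_depth++; }
static void model_unlock(void) { assert(lock_depth > 0); lock_depth--; }

/* Hypothetical stand-in for mem_cgroup_charge_hugetlb(); fails on demand. */
static int model_charge(bool over_limit) { return over_limit ? -ENOMEM : 0; }

/*
 * Allocation path with the charge moved after the locked section.
 * The lock is dropped before charging, so the -ENOMEM path must not
 * unlock a second time.
 */
static int model_alloc(bool over_limit)
{
	int ret;

	model_lock();
	/* ... take a folio from the pool while holding the lock ... */
	model_unlock();

	ret = model_charge(over_limit);
	if (ret == -ENOMEM) {
		/* No unlock here: the lock is no longer held. */
		/* ... the real code would free the folio here ... */
		return ret;
	}
	return 0;
}
```

Both the success and the failure path leave the lock released exactly once.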

> > +             free_huge_folio(folio);
>
> free_huge_folio() will call lruvec_stat_mod_folio() unconditionally but
> you are only calling it on success. This may underflow the metric.

I was actually thinking about this too, and I was not sure what the
right approach is. In the original draft of this patch, I also called
the stat increment (lruvec_stat_mod_folio()) unconditionally. The idea
was that, even though the stat should not be incremented when the
charge fails, free_huge_folio()'s decrement would eventually correct
it. However, nothing stops the user from reading the stat in that
window, so they may briefly see the value incremented in memory.stat
even though they never obtained the page.
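A minimal user-space model of the two choices discussed so far, assuming a single counter standing in for the hugetlb stat and a model_free() that, like free_huge_folio() per the review above, always decrements it. The names nr_hugetlb, model_free(), alloc_increment_on_success() and alloc_increment_always() are hypothetical. Incrementing only on success while freeing unconditionally drives the counter below zero on a failed charge; incrementing unconditionally stays balanced but leaves a window in which a reader can see the extra page.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the hugetlb stat shown in memory.stat. */
static long nr_hugetlb;

/* Models free_huge_folio(): it decrements the stat unconditionally. */
static void model_free(void) { nr_hugetlb--; }

/* As in the patch: increment only when the charge succeeds. */
static void alloc_increment_on_success(bool charge_ok)
{
	if (charge_ok)
		nr_hugetlb++;
	else
		model_free();	/* decrement without a matching increment */
}

/* As in the original draft: increment before the charge result is known. */
static void alloc_increment_always(bool charge_ok)
{
	nr_hugetlb++;		/* a reader of memory.stat may observe this ... */
	if (!charge_ok)
		model_free();	/* ... until the error path decrements it again */
}
```

In this signed model the first variant ends at -1 after one failed charge; a real counter would show that as an underflow.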

That said, maybe it does make sense to increment unconditionally,
since free_huge_folio() also decrements unconditionally. This race
is unlikely to cause serious problems for most users, although users
who rely on userspace monitors of memory.stat to make memory
management decisions may be affected.

Maybe what makes the most sense is to make both the increment
and the decrement conditional.
Thank you for your feedback; I will iterate on this for the next version!
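One way to picture the "both conditional" option is the sketch below. It assumes a hypothetical per-folio flag (counted) recording whether the stat was incremented, so the free path only undoes what the allocation path did; how the real code would know this is left open here, and struct model_folio, model_alloc() and model_free() are illustrative names only.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical folio model: remembers whether it was counted in the stat. */
struct model_folio {
	bool counted;
};

static long nr_hugetlb;	/* hypothetical stand-in for the hugetlb stat */

/* Free path: decrement only if the allocation path incremented. */
static void model_free(struct model_folio *f)
{
	if (f->counted) {
		nr_hugetlb--;
		f->counted = false;
	}
}

/* Allocation path: count the folio only once the charge has succeeded. */
static int model_alloc(struct model_folio *f, bool charge_ok)
{
	f->counted = false;
	if (!charge_ok) {
		model_free(f);	/* error path: no stat change at all */
		return -ENOMEM;
	}
	nr_hugetlb++;
	f->counted = true;
	return 0;
}
```

With this pairing a failed charge never touches the stat, so there is neither an underflow nor a transient over-count.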

> > +int mem_cgroup_charge_hugetlb(struct folio *folio, gfp_t gfp)
> > +{
> > +     struct mem_cgroup *memcg = get_mem_cgroup_from_current();
> > +     int ret = 0;
> > +
> > +     if (mem_cgroup_disabled() || !memcg_accounts_hugetlb() ||
> > +             !memcg || !cgroup_subsys_on_dfl(memory_cgrp_subsys)) {
> > +             ret = -EOPNOTSUPP;
>
> why EOPNOTSUPP? You need to return 0 here. We do want
> lruvec_stat_mod_folio() to be called.

In this case, I was just preserving the original code's return values.
That is, mem_cgroup_hugetlb_try_charge was intended to return
-EOPNOTSUPP if any of these conditions were met.
If I understand the code correctly, calling lruvec_stat_mod_folio()
in this case would be a no-op, since either memcg does not account
hugetlb folios, there is no memcg, or memcg is disabled.

With all of this said, I think your suggestion makes the most sense
given the new semantics of the function: if there is no memcg, or
memcg does not account hugetlb, then the limit can never be reached!
I will go ahead with returning 0 and calling lruvec_stat_mod_folio()
(which will be a no-op).
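The revised return semantics can be sketched as a pure function over the conditions quoted from the patch. This is a user-space model, not the kernel function: struct model_env and model_charge_hugetlb() are hypothetical, each field stands for one of the checks named in the comment beside it, and reference counting on the memcg is omitted. When memcg does not apply, the function reports success (0) so the caller proceeds to update the stat; only a real limit hit yields -ENOMEM.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical inputs standing in for the checks in the patch. */
struct model_env {
	bool memcg_disabled;	/* mem_cgroup_disabled() */
	bool accounts_hugetlb;	/* memcg_accounts_hugetlb() */
	bool has_memcg;		/* get_mem_cgroup_from_current() != NULL */
	bool on_dfl;		/* cgroup_subsys_on_dfl(memory_cgrp_subsys) */
	bool over_limit;	/* whether charging would exceed the limit */
};

/*
 * If memcg does not apply, there is no limit to hit, so report success
 * and let the caller update the stat as usual.
 */
static int model_charge_hugetlb(const struct model_env *env)
{
	if (env->memcg_disabled || !env->accounts_hugetlb ||
	    !env->has_memcg || !env->on_dfl)
		return 0;
	return env->over_limit ? -ENOMEM : 0;
}
```

Returning 0 rather than -EOPNOTSUPP means callers only need to distinguish "charged or not applicable" from -ENOMEM.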

Thank you for your detailed feedback. I wish I had caught these
errors myself; thank you for taking the time to review my patch.

I hope you have a great rest of your weekend!
Joshua
