Message-ID: <tr7qytjgy5k7hqpd52c2r4vvdae4q3fqoux53hilj6uewidlj3@pljgaa3mzkdd>
Date: Fri, 30 May 2025 15:52:26 +0900
From: Koichiro Den <den@...inux.co.jp>
To: Yuanchu Xie <yuanchu@...gle.com>
Cc: linux-mm@...ck.org, akpm@...ux-foundation.org, yuzhao@...gle.com, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm: vmscan: apply proportional reclaim pressure for
 memcg when MGLRU is enabled

On Wed, May 28, 2025 at 01:53:19PM -0700, Yuanchu Xie wrote:
> On Fri, Apr 4, 2025 at 7:11 AM Koichiro Den <koichiro.den@...onical.com> wrote:
> >
> > The scan implementation for MGLRU was missing proportional reclaim
> > pressure for memcg, which contradicts the description in
> > Documentation/admin-guide/cgroup-v2.rst (memory.{low,min} section).
> Nice, this is a discrepancy between the two reclaim implementations.
> Thanks for addressing this.
> 
> >
> > This issue was revealed by the LTP memcontrol03 [1] test case. The
> > following example output from a local test env with no NUMA shows
> > that prior to this patch, proportional protection was not working:
> >
> > * Without this patch (MGLRU enabled):
> >   $ sudo LTP_SINGLE_FS_TYPE=xfs ./memcontrol03
> >     ...
> >     memcontrol03.c:214: TPASS: Expect: (A/B/C memory.current=25964544) ~= 34603008
> >     memcontrol03.c:216: TPASS: Expect: (A/B/D memory.current=26038272) ~= 17825792
> >     ...
> >
> > * With this patch (MGLRU enabled):
> >   $ sudo LTP_SINGLE_FS_TYPE=xfs ./memcontrol03
> >     ...
> >     memcontrol03.c:214: TPASS: Expect: (A/B/C memory.current=29327360) ~= 34603008
> >     memcontrol03.c:216: TPASS: Expect: (A/B/D memory.current=23748608) ~= 17825792
> >     ...
> >
> > * When MGLRU is disabled:
> >   $ sudo LTP_SINGLE_FS_TYPE=xfs ./memcontrol03
> >     ...
> >     memcontrol03.c:214: TPASS: Expect: (A/B/C memory.current=28819456) ~= 34603008
> >     memcontrol03.c:216: TPASS: Expect: (A/B/D memory.current=24018944) ~= 17825792
> >     ...
> >
> > Note that the test shows TPASS for all cases here due to its lenient
> > criteria. And even with this patch, or when MGLRU is disabled, the
> > results above show slight deviation from the expected values, but this
> > is due to relatively small mem usage compared to the >> DEF_PRIORITY
> > adjustment.
> It's kind of disappointing that the LTP test doesn't fail when reclaim
> pressure scaling doesn't work. Would you be interested in fixing the
> test as well?

Thanks for your comment. It made me realize that there are two relevant upstream commits:
f10b6e9a8e66 ("selftests: memcg: adjust expected reclaim values of protected cgroups")
d2def68ae06a ("selftests: memcg: increase error tolerance of child memory.current check in test_memcg_protection()")

The numbers I wrote are actually quite close to the simulated numbers, and
the test would've passed if it had been kselftest (even without the commit
d2def68ae06a):

  # deviation, but would've passed under upstream criteria
  abs(25964544-29M) / (25964544+29M) ~= 7%
  abs(26038272-21M) / (26038272+21M) ~= 8%

  # close to the expected numbers
  abs(29327360-29M) / (29327360+29M) ~= 1%
  abs(23748608-21M) / (23748608+21M) ~= 3%
  abs(28819456-29M) / (28819456+29M) ~= 2%
  abs(24018944-21M) / (24018944+21M) ~= 3%

So at least the git commit message should be updated. The issue is that
the LTP test still uses naive numbers and lenient criteria, which were
updated when it was ported from the v5.16 kselftest.
I'm now unsure how the LTP memcontrol03 test will be maintained.
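
For reference, the deviation figures above can be reproduced with a small helper. This is only a sketch: it assumes "M" means MiB, that the comparison has the integer form `labs(a - b) <= (a + b) / 100 * err`, and that the tolerance is 10 percent. The names `MB`, `deviation_pct` and `values_close` are illustrative and are not quoted from any specific tree.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumption: "29M" and "21M" in the calculation above are MiB. */
#define MB(x) ((long)(x) << 20)

/* Relative deviation in percent: |a - b| / (a + b) * 100. */
static double deviation_pct(long a, long b)
{
	assert(a + b > 0);	/* guard against division by zero */
	return 100.0 * (double)labs(a - b) / (double)(a + b);
}

/*
 * Nonzero if |a - b| is within err percent of (a + b).
 * Integer arithmetic, so (a + b) / 100 truncates before scaling.
 */
static int values_close(long a, long b, int err)
{
	return labs(a - b) <= (a + b) / 100 * err;
}
```

Under these assumptions, `values_close(25964544, MB(29), 10)` and `values_close(26038272, MB(21), 10)` both hold, which matches the statement that the unpatched numbers would still have passed. The exact percentages depend on rounding, so the `~=` figures above should be read as approximate.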

> 
> >
> > Factor out the proportioning logic to a new function and have MGLRU
> > reuse it.
> >
> > [1] https://github.com/linux-test-project/ltp/blob/master/testcases/kernel/controllers/memcg/memcontrol03.c
> >
> > Signed-off-by: Koichiro Den <koichiro.den@...onical.com>
> > ---
> >  mm/vmscan.c | 148 +++++++++++++++++++++++++++-------------------------
> >  1 file changed, 78 insertions(+), 70 deletions(-)
> >
> > diff --git a/mm/vmscan.c b/mm/vmscan.c
> > index b620d74b0f66..c594d8264938 100644
> > --- a/mm/vmscan.c
> > +++ b/mm/vmscan.c
> > @@ -2467,6 +2467,69 @@ static inline void calculate_pressure_balance(struct scan_control *sc,
> >         *denominator = ap + fp;
> >  }
> >
> > +static unsigned long apply_proportional_protection(struct mem_cgroup *memcg,
> > +               struct scan_control *sc, unsigned long scan)
> > +{
> > +       unsigned long min, low;
> > +
> > +       mem_cgroup_protection(sc->target_mem_cgroup, memcg, &min, &low);
> > +
---(snip)---
> > @@ -5477,7 +5485,7 @@ static int run_eviction(struct lruvec *lruvec, unsigned long seq, struct scan_co
> >                 if (sc->nr_reclaimed >= nr_to_reclaim)
> >                         return 0;
> >
> > -               if (!evict_folios(lruvec, sc, swappiness))
> > +               if (!evict_folios(MAX_LRU_BATCH, lruvec, sc, swappiness))
> >                         return 0;
> Right now this change preserves the current behavior, but given this
> is only invoked from the debugfs interface, it would be reasonable to
> also change this to something like nr_to_reclaim - sc->nr_reclaimed so
> the run_eviction evicts closer to nr_to_reclaim number of pages.
> Closer to what it advertises, but different from the current behavior.
> I have no strong opinion here, so if you're a user of this proactive
> reclaim interface and would prefer to change it, go ahead.

You're right. I'll send v2 with this change as well.
I'll also update the git commit message as I mentioned above.
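
To illustrate the difference between a fixed batch size and the suggested `nr_to_reclaim - sc->nr_reclaimed`, here is a toy user-space model. It is not kernel code and not necessarily what v2 will look like. `toy_evict`, `TOY_MAX_BATCH` and both loop functions are hypothetical, `TOY_MAX_BATCH` is not the kernel's MAX_LRU_BATCH value, and the model assumes every page in a batch is reclaimed, which real eviction does not guarantee.

```c
#include <assert.h>

/* Hypothetical stand-in for a fixed batch size; not the kernel's value. */
#define TOY_MAX_BATCH 64UL

/* Toy model: reclaim a whole batch, bounded by what is left in the pool. */
static unsigned long toy_evict(unsigned long *pool, unsigned long batch)
{
	unsigned long n;

	assert(batch > 0);
	n = batch < *pool ? batch : *pool;
	*pool -= n;
	return n;
}

/* Evict in fixed-size batches until the target is met. */
static unsigned long run_fixed_batch(unsigned long pool,
				     unsigned long nr_to_reclaim)
{
	unsigned long nr_reclaimed = 0;

	while (nr_reclaimed < nr_to_reclaim) {
		unsigned long n = toy_evict(&pool, TOY_MAX_BATCH);

		if (!n)
			break;	/* nothing left to evict */
		nr_reclaimed += n;
	}
	return nr_reclaimed;
}

/* Same loop, but each batch is sized to the remaining shortfall. */
static unsigned long run_remaining_batch(unsigned long pool,
					 unsigned long nr_to_reclaim)
{
	unsigned long nr_reclaimed = 0;

	while (nr_reclaimed < nr_to_reclaim) {
		unsigned long n = toy_evict(&pool, nr_to_reclaim - nr_reclaimed);

		if (!n)
			break;	/* nothing left to evict */
		nr_reclaimed += n;
	}
	return nr_reclaimed;
}
```

With a pool of 1000 pages and a target of 100, the fixed-batch loop in this model stops at 128 pages while the remaining-batch loop stops at exactly 100, which is the "closer to what it advertises" behavior discussed above.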

Thank you for the review.

Koichiro

> 
> >
> >                 cond_resched();
> > --
> > 2.45.2
> >
> >
> 
> Reviewed-by: Yuanchu Xie <yuanchu@...gle.com>
