linux-kernel - Re: [PATCH v2 3/5] mm: memcg: make stats flushing threshold per-memcg

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20231013023329.GG470544@cmpxchg.org>
Date:   Thu, 12 Oct 2023 22:33:29 -0400
From:   Johannes Weiner <hannes@...xchg.org>
To:     Yosry Ahmed <yosryahmed@...gle.com>
Cc:     Shakeel Butt <shakeelb@...gle.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Michal Hocko <mhocko@...nel.org>,
        Roman Gushchin <roman.gushchin@...ux.dev>,
        Muchun Song <muchun.song@...ux.dev>,
        Ivan Babrou <ivan@...udflare.com>, Tejun Heo <tj@...nel.org>,
        Michal Koutný <mkoutny@...e.com>,
        Waiman Long <longman@...hat.com>, kernel-team@...udflare.com,
        Wei Xu <weixugc@...gle.com>, Greg Thelen <gthelen@...gle.com>,
        linux-mm@...ck.org, cgroups@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH v2 3/5] mm: memcg: make stats flushing threshold per-memcg

On Thu, Oct 12, 2023 at 04:28:49PM -0700, Yosry Ahmed wrote:
> [..]
> > > >
> > > > Using next-20231009 and a similar 44 core machine with hyperthreading
> > > > disabled, I ran 22 instances of netperf in parallel and got the
> > > > following numbers from averaging 20 runs:
> > > >
> > > > Base: 33076.5 mbps
> > > > Patched: 31410.1 mbps
> > > >
> > > > That's about 5% diff. I guess the number of iterations helps reduce
> > > > the noise? I am not sure.
> > > >
> > > > Please also keep in mind that in this case all netperf instances are
> > > > in the same cgroup and at a 4-level depth. I imagine in a practical
> > > > setup processes would be a little more spread out, which means less
> > > > common ancestors, so less contended atomic operations.
> > >
> > >
> > > (Resending the reply as I messed up the last one, was not in plain text)
> > >
> > > I was curious, so I ran the same testing in a cgroup 2 levels deep
> > > (i.e /sys/fs/cgroup/a/b), which is a much more common setup in my
> > > experience. Here are the numbers:
> > >
> > > Base: 40198.0 mbps
> > > Patched: 38629.7 mbps
> > >
> > > The regression is reduced to ~3.9%.
> > >
> > > What's more interesting is that going from a level 2 cgroup to a level
> > > 4 cgroup is already a big hit with or without this patch:
> > >
> > > Base: 40198.0 -> 33076.5 mbps (~17.7% regression)
> > > Patched: 38629.7 -> 31410.1 (~18.7% regression)
> > >
> > > So going from level 2 to 4 is already a significant regression for
> > > other reasons (e.g. hierarchical charging). This patch only makes it
> > > marginally worse. This puts the numbers more into perspective imo than
> > > comparing values at level 4. What do you think?
> >
> > I think it's reasonable.
> >
> > Especially comparing to how many cachelines we used to touch on the
> > write side when all flushing happened there. This looks like a good
> > trade-off to me.
> 
> Thanks.
> 
> Still wanting to figure out if this patch is what you suggested in our
> previous discussion [1], to add a
> Suggested-by if appropriate :)
> 
> [1]https://lore.kernel.org/lkml/20230913153758.GB45543@cmpxchg.org/

Haha, sort of. I suggested the cgroup-level flush-batching, but my
proposal was missing the clever upward propagation of the pending stat
updates that you added.

You can add the tag if you're feeling generous, but I wouldn't be mad
if you don't!