linux-kernel - Re: [PATCH v2 3/5] mm: memcg: make stats flushing threshold per-memcg

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAJD7tka5UnHBz=eX1LtynAjJ+O_oredMKBBL3kFNfG7PHjuMCw@mail.gmail.com>
Date:   Mon, 23 Oct 2023 11:25:55 -0700
From:   Yosry Ahmed <yosryahmed@...gle.com>
To:     Feng Tang <feng.tang@...el.com>
Cc:     Shakeel Butt <shakeelb@...gle.com>,
        "Sang, Oliver" <oliver.sang@...el.com>,
        "oe-lkp@...ts.linux.dev" <oe-lkp@...ts.linux.dev>,
        lkp <lkp@...el.com>,
        "cgroups@...r.kernel.org" <cgroups@...r.kernel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "Huang, Ying" <ying.huang@...el.com>,
        "Yin, Fengwei" <fengwei.yin@...el.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Johannes Weiner <hannes@...xchg.org>,
        Michal Hocko <mhocko@...nel.org>,
        Roman Gushchin <roman.gushchin@...ux.dev>,
        Muchun Song <muchun.song@...ux.dev>,
        Ivan Babrou <ivan@...udflare.com>, Tejun Heo <tj@...nel.org>,
        Michal Koutný <mkoutny@...e.com>,
        Waiman Long <longman@...hat.com>,
        "kernel-team@...udflare.com" <kernel-team@...udflare.com>,
        Wei Xu <weixugc@...gle.com>, Greg Thelen <gthelen@...gle.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2 3/5] mm: memcg: make stats flushing threshold per-memcg

On Sun, Oct 22, 2023 at 6:34 PM Feng Tang <feng.tang@...el.com> wrote:
>
> On Sat, Oct 21, 2023 at 01:42:58AM +0800, Yosry Ahmed wrote:
> > On Fri, Oct 20, 2023 at 10:23 AM Shakeel Butt <shakeelb@...gle.com> wrote:
> > >
> > > On Fri, Oct 20, 2023 at 9:18 AM kernel test robot <oliver.sang@...el.com> wrote:
> > > >
> > > >
> > > >
> > > > Hello,
> > > >
> > > > kernel test robot noticed a -25.8% regression of will-it-scale.per_thread_ops on:
> > > >
> > > >
> > > > commit: 51d74c18a9c61e7ee33bc90b522dd7f6e5b80bb5 ("[PATCH v2 3/5] mm: memcg: make stats flushing threshold per-memcg")
> > > > url: https://github.com/intel-lab-lkp/linux/commits/Yosry-Ahmed/mm-memcg-change-flush_next_time-to-flush_last_time/20231010-112257
> > > > base: https://git.kernel.org/cgit/linux/kernel/git/akpm/mm.git mm-everything
> > > > patch link: https://lore.kernel.org/all/20231010032117.1577496-4-yosryahmed@google.com/
> > > > patch subject: [PATCH v2 3/5] mm: memcg: make stats flushing threshold per-memcg
> > > >
> > > > testcase: will-it-scale
> > > > test machine: 104 threads 2 sockets (Skylake) with 192G memory
> > > > parameters:
> > > >
> > > >         nr_task: 100%
> > > >         mode: thread
> > > >         test: fallocate1
> > > >         cpufreq_governor: performance
> > > >
> > > >
> > > > In addition to that, the commit also has significant impact on the following tests:
> > > >
> > > > +------------------+---------------------------------------------------------------+
> > > > | testcase: change | will-it-scale: will-it-scale.per_thread_ops -30.0% regression |
> > > > | test machine     | 104 threads 2 sockets (Skylake) with 192G memory              |
> > > > | test parameters  | cpufreq_governor=performance                                  |
> > > > |                  | mode=thread                                                   |
> > > > |                  | nr_task=50%                                                   |
> > > > |                  | test=fallocate1                                               |
> > > > +------------------+---------------------------------------------------------------+
> > > >
> > >
> > > Yosry, I don't think 25% to 30% regression can be ignored. Unless
> > > there is a quick fix, IMO this series should be skipped for the
> > > upcoming kernel open window.
> >
> > I am currently looking into it. It's reasonable to skip the next merge
> > window if a quick fix isn't found soon.
> >
> > I am surprised by the size of the regression given the following:
> >       1.12 ą  5%      +1.4        2.50 ą  2%
> > perf-profile.self.cycles-pp.__mod_memcg_lruvec_state
> >
> > IIUC we are only spending 1% more time in __mod_memcg_lruvec_state().
>
> Yes, this is kind of confusing. And we have seen similar cases before,
> espcially for micro benchmark like will-it-scale, stressng, netperf
> etc, the change to those functions in hot path was greatly amplified
> in the final benchmark score.
>
> In a netperf case, https://lore.kernel.org/lkml/20220619150456.GB34471@xsang-OptiPlex-9020/
> the affected functions have around 10% change in perf's cpu-cycles,
> and trigger 69% regression. IIRC, micro benchmarks are very sensitive
> to those statistics update, like memcg's and vmstat.
>

Thanks for clarifying. I am still trying to reproduce locally but I am
running into some quirks with tooling. I may have to run a modified
version of the fallocate test manually. Meanwhile, I noticed that the
patch was tested without the fixlet that I posted [1] for it,
understandably. Would it be possible to get some numbers with that
fixlet? It should reduce the total number of contended atomic
operations, so it may help.

[1]https://lore.kernel.org/lkml/CAJD7tkZDarDn_38ntFg5bK2fAmFdSe+Rt6DKOZA7Sgs_kERoVA@mail.gmail.com/

I am also wondering if aligning the stats_updates atomic will help.
Right now it may share a cacheline with some items of the
events_pending array. The latter may be dirtied during a flush and
unnecessarily dirty the former, but the chances are slim to be honest.
If it's easy to test such a diff, that would be nice, but I don't
expect a lot of difference:

diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 7cbc7d94eb65..a35fce653262 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -646,7 +646,7 @@ struct memcg_vmstats {
        unsigned long           events_pending[NR_MEMCG_EVENTS];

        /* Stats updates since the last flush */
-       atomic64_t              stats_updates;
+       atomic64_t              stats_updates ____cacheline_aligned_in_smp;
 };

 /*