lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Date:   Wed, 25 Oct 2023 11:36:13 -0700
From:   Yosry Ahmed <yosryahmed@...gle.com>
To:     Shakeel Butt <shakeelb@...gle.com>
Cc:     Oliver Sang <oliver.sang@...el.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Feng Tang <feng.tang@...el.com>,
        "oe-lkp@...ts.linux.dev" <oe-lkp@...ts.linux.dev>,
        lkp <lkp@...el.com>,
        "cgroups@...r.kernel.org" <cgroups@...r.kernel.org>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "Huang, Ying" <ying.huang@...el.com>,
        "Yin, Fengwei" <fengwei.yin@...el.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Michal Hocko <mhocko@...nel.org>,
        Roman Gushchin <roman.gushchin@...ux.dev>,
        Muchun Song <muchun.song@...ux.dev>,
        Ivan Babrou <ivan@...udflare.com>, Tejun Heo <tj@...nel.org>,
        Michal Koutný <mkoutny@...e.com>,
        Waiman Long <longman@...hat.com>,
        "kernel-team@...udflare.com" <kernel-team@...udflare.com>,
        Wei Xu <weixugc@...gle.com>, Greg Thelen <gthelen@...gle.com>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Domenico Cerasuolo <cerasuolodomenico@...il.com>
Subject: Re: [PATCH v2 3/5] mm: memcg: make stats flushing threshold per-memcg

On Wed, Oct 25, 2023 at 10:06 AM Shakeel Butt <shakeelb@...gle.com> wrote:
>
> On Tue, Oct 24, 2023 at 11:23 PM Yosry Ahmed <yosryahmed@...gle.com> wrote:
> >
> [...]
> >
> > Thanks Oliver for running the numbers. If I understand correctly the
> > will-it-scale.fallocate1 microbenchmark is the only one showing
> > significant regression here, is this correct?
> >
> > In my runs, other more representative microbenchmarks benchmarks like
> > netperf and will-it-scale.page_fault* show minimal regression. I would
> > expect practical workloads to have high concurrency of page faults or
> > networking, but maybe not fallocate/ftruncate.
> >
> > Oliver, in your experience, how often does such a regression in such a
> > microbenchmark translate to a real regression that people care about?
> > (or how often do people dismiss it?)
> >
> > I tried optimizing this further for the fallocate/ftruncate case but
> > without luck. I even tried moving stats_updates into cgroup core
> > (struct cgroup_rstat_cpu) to reuse the existing loop in
> > cgroup_rstat_updated() -- but it somehow made it worse.
> >
> > On the other hand, we do have some machines in production running this
> > series together with a previous optimization for non-hierarchical
> > stats [1] on an older kernel, and we do see significant reduction in
> > cpu time spent on reading the stats. Domenico did a similar experiment
> > with only this series and reported similar results [2].
> >
> > Shakeel, Johannes, (and other memcg folks), I personally think the
> > benefits here outweigh a regression in this particular benchmark, but
> > I am obviously biased. What do you think?
> >
> > [1]https://lore.kernel.org/lkml/20230726153223.821757-2-yosryahmed@google.com/
> > [2]https://lore.kernel.org/lkml/CAFYChMv_kv_KXOMRkrmTN-7MrfgBHMcK3YXv0dPYEL7nK77e2A@mail.gmail.com/
>
> I still am not convinced of the benefits outweighing the regression
> but I would not block this. So, let's do this, skip this open window,
> get the patch series reviewed and hopefully we can work together on
> fixing that regression and we can make an informed decision of
> accepting the regression for this series for the next cycle.

Skipping this open window sounds okay to me.

FWIW, I think with this patch series we can keep the old behavior
(roughly) and hide the changes behind a tunable (config option or
sysfs file). I think the only changes that need to be done to the code
to approximate the previous behavior are:
- Use root when updating the pending stats in memcg_rstat_updated()
instead of the passed memcg.
- Use root in mem_cgroup_flush_stats() instead of the passed memcg.
- Use mutex_trylock() instead of mutex_lock() in mem_cgroup_flush_stats().

So I think it should be doable to hide most changes behind a tunable,
but let's not do this unless necessary.

Powered by blists - more mailing lists