Date:   Fri, 25 Feb 2022 17:20:20 -0800
From:   Andrew Morton <akpm@...ux-foundation.org>
To:     Shakeel Butt <shakeelb@...gle.com>,
        Michal Koutný <mkoutny@...e.com>,
        Johannes Weiner <hannes@...xchg.org>,
        Michal Hocko <mhocko@...nel.org>,
        Roman Gushchin <roman.gushchin@...ux.dev>,
        Ivan Babrou <ivan@...udflare.com>, cgroups@...r.kernel.org,
        linux-mm@...ck.org, linux-kernel@...r.kernel.org,
        Daniel Dao <dqminh@...udflare.com>, stable@...r.kernel.org
Subject: Re: [PATCH] memcg: async flush memcg stats from perf sensitive
 codepaths

On Fri, 25 Feb 2022 16:58:42 -0800 Andrew Morton <akpm@...ux-foundation.org> wrote:

> On Fri, 25 Feb 2022 16:24:12 -0800 Shakeel Butt <shakeelb@...gle.com> wrote:
> 
> > Daniel Dao has reported [1] a regression on workloads that may trigger
> > a lot of refaults (anon and file). The underlying issue is that flushing
> > rstat is expensive. Although rstat flushes are batched per (nr_cpus *
> > MEMCG_BATCH) stat updates, it seems there are workloads which genuinely
> > do more stat updates than the batch value within a short amount of
> > time. Since the rstat flush can happen in performance critical
> > codepaths like page faults, such workloads can suffer greatly.
> > 
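(For reference, a rough sketch of the batching scheme described above.
This is a simplification, not the kernel's exact code: the real logic
lives in mm/memcontrol.c, and MEMCG_BATCH below stands in for the
kernel's MEMCG_CHARGE_BATCH.)

#define MEMCG_BATCH 32	/* stand-in for the kernel's MEMCG_CHARGE_BATCH */

static DEFINE_PER_CPU(unsigned int, stats_updates);
static atomic_t stats_flush_threshold = ATOMIC_INIT(0);

static inline void memcg_rstat_updated(struct mem_cgroup *memcg)
{
	cgroup_rstat_updated(memcg->css.cgroup, smp_processor_id());

	/* Count local updates; publish one unit per MEMCG_BATCH of them. */
	if (!(__this_cpu_inc_return(stats_updates) % MEMCG_BATCH))
		atomic_inc(&stats_flush_threshold);
}

static void mem_cgroup_flush_stats(void)
{
	/*
	 * Each threshold unit represents ~MEMCG_BATCH updates, so this
	 * only flushes after roughly nr_cpus * MEMCG_BATCH stat updates
	 * have accumulated since the last flush.
	 */
	if (atomic_read(&stats_flush_threshold) <= num_online_cpus())
		return;

	cgroup_rstat_flush(root_mem_cgroup->css.cgroup);
	atomic_set(&stats_flush_threshold, 0);
}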
> > The easiest fix for now is to have performance critical codepaths trigger
> > the rstat flush asynchronously. This patch converts the refault codepath
> > to use the async rstat flush. In addition, this patch has preemptively
> > converted mem_cgroup_wb_stats and shrink_node to also use the async
> > rstat flush, as they may see similar performance regressions.
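(Again a hypothetical sketch of the conversion being described, under
the same assumptions as the sketch above; mem_cgroup_flush_stats_async()
and stats_flush_work are illustrative names inferred from the
description, not necessarily the patch's. The async variant does the
same cheap threshold check, but defers the expensive rstat flush to a
workqueue so a faulting task never pays for it inline.)

static void stats_flush_func(struct work_struct *work)
{
	mem_cgroup_flush_stats();
}
static DECLARE_WORK(stats_flush_work, stats_flush_func);

void mem_cgroup_flush_stats_async(void)
{
	if (atomic_read(&stats_flush_threshold) <= num_online_cpus())
		return;

	/* Punt the flush to a worker instead of doing it here. */
	queue_work(system_unbound_wq, &stats_flush_work);
}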
> 
> Gee we do this trick a lot and gee I don't like it :(
> 
> a) if we're doing too much work then we're doing too much work. 
>    Punting that work over to a different CPU or thread doesn't alter
>    that - it in fact adds more work.
> 
> b) there's an assumption here that the flusher is able to keep up
>    with the producer.  What happens if that isn't the case?  Do the
>    deferred items simply pile up until the system goes oom?
> 
>    What happens if there's a producer running on each CPU?  Can the
>    flushers keep up?
> 
>    Pathologically, what happens if the producer is a
>    task_is_realtime() task on a single-CPU system?  Or if there's a
>    task_is_realtime() producer running on every CPU?  The flusher never
>    gets to run and we're dead?

Not some theoretical thing, btw.  See how __read_swap_cache_async()
just got its sins exposed by real-time tasks:
https://lkml.kernel.org/r/20220221111749.1928222-1-cgel.zte@gmail.com
