linux-kernel - Re: [PATCH] mm: writeback: ratelimit stat flush from mem_cgroup_wb_stats


Message-ID: <CALvZod4XUrQMxptBo56Fm6-ETQy_DtVq-g4NKokVvSyGwDOnxg@mail.gmail.com>
Date: Mon, 22 Jan 2024 10:19:10 -0800
From: Shakeel Butt <shakeelb@...gle.com>
To: Michal Koutný <mkoutny@...e.com>
Cc: Andrew Morton <akpm@...ux-foundation.org>, Jens Axboe <axboe@...nel.dk>, 
	Johannes Weiner <hannes@...xchg.org>, Tejun Heo <tj@...nel.org>, Jan Kara <jack@...e.cz>, 
	Roman Gushchin <roman.gushchin@...ux.dev>, Michal Hocko <mhocko@...nel.org>, 
	Muchun Song <muchun.song@...ux.dev>, cgroups@...r.kernel.org, linux-mm@...ck.org, 
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH] mm: writeback: ratelimit stat flush from mem_cgroup_wb_stats

On Mon, Jan 22, 2024 at 7:20 AM Michal Koutný <mkoutny@...e.com> wrote:
>
> Hello.
>
> On Thu, Jan 18, 2024 at 06:42:35PM +0000, Shakeel Butt <shakeelb@...gle.com> wrote:
> > One of our workloads (Postgres 14) has regressed when migrated from 5.10
> > to 6.1 upstream kernel. The regression can be reproduced by sysbench's
> > oltp_write_only benchmark.
> > It seems like the always on rstat flush in
> > mem_cgroup_wb_stats() is causing the regression.
>
> Is the affected benchmark running in a non-root cgroup?
>
> I'm asking whether this would warrant a
>   Fixes: fd25a9e0e23b ("memcg: unify memcg stat flushing")
> that introduced the global flush (in v6.1) but it was removed later in
>   7d7ef0a4686a ("mm: memcg: restore subtree stats flushing")
> (so v6.8 could be possibly unaffected).
>

Yes, the benchmark and the workload were running in non-root cgroups.

Regarding the Fixes tag, please note that the regression was still there
with 7d7ef0a4686a ("mm: memcg: restore subtree stats flushing"). I would
therefore say that our first conversion to the rstat infra, 2d146aa3aa84
("mm: memcontrol: switch to rstat"), most probably has the issue as
well.

So the following Fixes tag makes sense to me:

Fixes: 2d146aa3aa84 ("mm: memcontrol: switch to rstat")