[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <aBsDM6aa4u50xgnj@google.com>
Date: Wed, 7 May 2025 06:52:35 +0000
From: Yosry Ahmed <yosry.ahmed@...ux.dev>
To: Shakeel Butt <shakeel.butt@...ux.dev>
Cc: Tejun Heo <tj@...nel.org>, Andrew Morton <akpm@...ux-foundation.org>,
Alexei Starovoitov <ast@...nel.org>,
Johannes Weiner <hannes@...xchg.org>,
Michal Hocko <mhocko@...nel.org>,
Roman Gushchin <roman.gushchin@...ux.dev>,
Muchun Song <muchun.song@...ux.dev>,
Michal Koutný <mkoutny@...e.com>,
Vlastimil Babka <vbabka@...e.cz>,
Sebastian Andrzej Siewior <bigeasy@...utronix.de>,
JP Kobryn <inwardvessel@...il.com>, bpf@...r.kernel.org,
linux-mm@...ck.org, cgroups@...r.kernel.org,
linux-kernel@...r.kernel.org,
Meta kernel team <kernel-team@...a.com>
Subject: Re: [RFC PATCH 3/3] cgroup: make css_rstat_updated nmi safe
On Tue, May 06, 2025 at 12:30:18PM -0700, Shakeel Butt wrote:
> On Tue, May 06, 2025 at 09:41:04AM +0000, Yosry Ahmed wrote:
> > On Thu, May 01, 2025 at 03:10:20PM -0700, Shakeel Butt wrote:
> > > On Wed, Apr 30, 2025 at 06:14:28AM -0700, Yosry Ahmed wrote:
> > > [...]
> > > > > +
> > > > > + if (!_css_rstat_cpu_trylock(css, cpu, &flags)) {
> > > >
> > > >
> > > > IIUC this trylock will only fail if a BPF program runs in NMI context
> > > > and tries to update cgroup stats, interrupting a context that is already
> > > > holding the lock (i.e. updating or flushing stats).
> > > >
> > >
> > > Correct (though note that flushing side can be on a different CPU).
> > >
> > > > How often does this happen in practice tho? Is it worth the complexity?
> > >
> > > This is about correctness, so even a chance of occurance need the
> > > solution.
> >
> > Right, my question was more about the need to special case NMIs, see
> > below.
> >
> > >
> > > >
> > > > I wonder if it's better if we make css_rstat_updated() inherently
> > > > lockless instead.
> > > >
> > > > What if css_rstat_updated() always just adds to a lockless tree,
> > >
> > > Here I assume you meant lockless list instead of tree.
> >
> > Yeah, in a sense. I meant using lockless lists to implement the rstat
> > tree instead of normal linked lists.
> >
> > >
> > > > and we
> > > > defer constructing the proper tree to the flushing side? This should
> > > > make updates generally faster and avoids locking or disabling interrupts
> > > > in the fast path. We essentially push more work to the flushing side.
> > > >
> > > > We may be able to consolidate some of the code too if all the logic
> > > > manipulating the tree is on the flushing side.
> > > >
> > > > WDYT? Am I missing something here?
> > > >
> > >
> > > Yes this can be done but I don't think we need to tie that to current
> > > series. I think we can start with lockless in the nmi context and then
> > > iteratively make css_rstat_updated() lockless for all contexts.
> >
> > My question is basically whether it would be simpler to actually make it
> > all lockless than special casing NMIs. With this patch we have two
> > different paths and a deferred list that we process at a later point. I
> > think it may be simpler if we just make it all lockless to begin with.
> > Then we would have a single path and no special deferred processing.
> >
> > WDYT?
>
> So, in the update side, always add to the lockless list (if not already)
> and on the flush side, built the udpate tree from the lockless list and
> flush it.
Exactly, yes.
> Hopefully this tree building and flushing can be done in a
> more optimized way. Is this what you are suggesting?
Yes, but this latter part can be a follow up if it's not straight
forward. For now we can just add use a lockless list on the update side
and move updating the tree (i.e updated_next and updated_children) to
the flush side.
Powered by blists - more mailing lists