Message-ID: <CAADnVQLtc+OOQ67AS_1+u-sRmO+bDLWJrrihASXMrDNnvrmNSw@mail.gmail.com>
Date: Mon, 8 Sep 2025 10:11:29 -0700
From: Alexei Starovoitov <alexei.starovoitov@...il.com>
To: Michal Hocko <mhocko@...e.com>
Cc: Peilin Ye <yepeilin@...gle.com>, Shakeel Butt <shakeel.butt@...ux.dev>,
Andrew Morton <akpm@...ux-foundation.org>, Tejun Heo <tj@...nel.org>,
Johannes Weiner <hannes@...xchg.org>, Roman Gushchin <roman.gushchin@...ux.dev>,
Muchun Song <muchun.song@...ux.dev>, Alexei Starovoitov <ast@...nel.org>,
Kumar Kartikeya Dwivedi <memxor@...il.com>, bpf <bpf@...r.kernel.org>, linux-mm <linux-mm@...ck.org>,
"open list:CONTROL GROUP (CGROUP)" <cgroups@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>,
Meta kernel team <kernel-team@...a.com>
Subject: Re: [PATCH] memcg: skip cgroup_file_notify if spinning is not allowed

On Mon, Sep 8, 2025 at 2:08 AM Michal Hocko <mhocko@...e.com> wrote:
>
> On Fri 05-09-25 20:48:46, Peilin Ye wrote:
> > On Fri, Sep 05, 2025 at 01:16:06PM -0700, Shakeel Butt wrote:
> > > Generally memcg charging is allowed from all the contexts including NMI
> > > where even spinning on spinlock can cause locking issues. However one
> > > call chain was missed during the addition of memcg charging from any
> > > context support. That is try_charge_memcg() -> memcg_memory_event() ->
> > > cgroup_file_notify().
> > >
> > > The possible function call tree under cgroup_file_notify() can acquire
> > > many different spin locks in spinning mode. Some of them are
> > > cgroup_file_kn_lock, kernfs_notify_lock, pool_workqueue's lock. So, let's
> > > just skip cgroup_file_notify() from memcg charging if the context does
> > > not allow spinning.
> > >
> > > Signed-off-by: Shakeel Butt <shakeel.butt@...ux.dev>
> >
> > Tested-by: Peilin Ye <yepeilin@...gle.com>
> >
> > The repro described in [1] no longer triggers locking issues after
> > applying this patch and making __bpf_async_init() use __GFP_HIGH
> > instead of GFP_ATOMIC:
> >
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -1275,7 +1275,7 @@ static int __bpf_async_init(struct bpf_async_kern *async, struct bpf_map *map, u
> > }
> >
> > /* allocate hrtimer via map_kmalloc to use memcg accounting */
> > - cb = bpf_map_kmalloc_node(map, size, GFP_ATOMIC, map->numa_node);
> > + cb = bpf_map_kmalloc_node(map, size, __GFP_HIGH, map->numa_node);
>
> Why do you need to consume memory reserves? Shouldn't kmalloc_nolock be
> used instead here?

Yes. That's the plan. We'll convert most of the bpf allocations to
kmalloc_nolock() when it lands.