Message-ID: <CAAVpQUAVrsPxPj9zArvFcD76KYjY_X1gWvd7LnyBcQQQgYZ0cw@mail.gmail.com>
Date: Wed, 27 Aug 2025 17:17:20 -0700
From: Kuniyuki Iwashima <kuniyu@...gle.com>
To: Martin KaFai Lau <martin.lau@...ux.dev>
Cc: Alexei Starovoitov <ast@...nel.org>, Andrii Nakryiko <andrii@...nel.org>,
Daniel Borkmann <daniel@...earbox.net>, John Fastabend <john.fastabend@...il.com>,
Stanislav Fomichev <sdf@...ichev.me>, Johannes Weiner <hannes@...xchg.org>, Michal Hocko <mhocko@...nel.org>,
Roman Gushchin <roman.gushchin@...ux.dev>, Shakeel Butt <shakeel.butt@...ux.dev>,
"David S. Miller" <davem@...emloft.net>, Eric Dumazet <edumazet@...gle.com>,
Jakub Kicinski <kuba@...nel.org>, Paolo Abeni <pabeni@...hat.com>,
Neal Cardwell <ncardwell@...gle.com>, Willem de Bruijn <willemb@...gle.com>,
Mina Almasry <almasrymina@...gle.com>, Kuniyuki Iwashima <kuni1840@...il.com>, bpf@...r.kernel.org,
netdev@...r.kernel.org
Subject: Re: [PATCH v3 bpf-next/net 3/5] bpf: Introduce SK_BPF_MEMCG_FLAGS and SK_BPF_MEMCG_SOCK_ISOLATED.
On Wed, Aug 27, 2025 at 4:48 PM Martin KaFai Lau <martin.lau@...ux.dev> wrote:
>
> On 8/26/25 11:38 AM, Kuniyuki Iwashima wrote:
> > The main targets are BPF_CGROUP_INET_SOCK_CREATE and
> > BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB as demonstrated in the selftest.
> >
> > Note that we do not support modifying the flag once sk->sk_memcg is set
> > especially because UDP charges memory under sk->sk_receive_queue.lock
> > instead of lock_sock().
> >
>
> [ ... ]
>
> > diff --git a/include/net/sock.h b/include/net/sock.h
> > index 63a6a48afb48..d41a2f8f8b30 100644
> > --- a/include/net/sock.h
> > +++ b/include/net/sock.h
> > @@ -2596,10 +2596,39 @@ static inline gfp_t gfp_memcg_charge(void)
> > return in_softirq() ? GFP_ATOMIC : GFP_KERNEL;
> > }
> >
> > +#define SK_BPF_MEMCG_FLAG_MASK (SK_BPF_MEMCG_FLAG_MAX - 1)
> > +#define SK_BPF_MEMCG_PTR_MASK ~SK_BPF_MEMCG_FLAG_MASK
> > +
> > #ifdef CONFIG_MEMCG
> > +static inline void mem_cgroup_sk_set_flags(struct sock *sk, unsigned short flags)
> > +{
> > + unsigned long val = (unsigned long)sk->sk_memcg;
> > +
> > + val |= flags;
> > + sk->sk_memcg = (struct mem_cgroup *)val;
> > +}
> > +
>
> [ ... ]
>
> > +static int sk_bpf_set_get_memcg_flags(struct sock *sk, int *optval, bool getopt)
> > +{
> > + if (!sk_has_account(sk))
> > + return -EOPNOTSUPP;
> > +
> > + if (getopt) {
> > + *optval = mem_cgroup_sk_get_flags(sk);
> > + return 0;
> > + }
> > +
> > + if (sock_owned_by_user_nocheck(sk) && mem_cgroup_from_sk(sk))
>
> I still don't see how this can stop some of the bh bpf prog to set/clear the bit
> after mem_cgroup_sk_alloc() is done. For example, in BPF_SOCK_OPS_RETRANS_CB,
> bh_lock_sock() is held but not owned by user, so the bpf prog can continue to
> change the SK_BPF_MEMCG_FLAGS. What am I missing?
You are totally right. I think I was just confused.
>
> Passing some more args to __bpf_setsockopt() for caller context purpose is quite
> ugly which I think you also mentioned/tried earlier.
Another option I was considering (but didn't try) was extending the
void *ctx from struct sock * to something like the struct below, and
silently casting ctx back to the original one in the dedicated
setsockopt proto:

    struct {
        struct sock *sk;
        /* caller info here */
    };
> May be it can check "if
> (in_softirq() && mem_cgroup_from_sk(sk)) return -EBUSY" but looks quite tricky
> together with the sock_owned_by_user_nocheck().
>
> It seems this sk_memcg bit change is only needed for sock_ops hook and create
> hook? sock_ops already has its only func_proto bpf_sock_ops_setsockopt().
> bpf_sock_ops_setsockopt can directly call sk_bpf_set_get_memcg_flags() but limit
> it to the BPF_SOCK_OPS_PASSIVE_ESTABLISHED_CB. Is it also safe to allow it for
> BPF_SOCK_OPS_TCP_LISTEN_CB? For the create hook, create a new func proto which
> can directly call sk_bpf_set_get_memcg_flags(). wdyt?
I prefer the new func proto to a hacky/complicated approach.
>
> > + return -EBUSY;
> > +
> > + if (*optval <= 0 || *optval >= SK_BPF_MEMCG_FLAG_MAX)
>
> It seems the bit cannot be cleared (the *optval <= 0 check). There could be
> multiple bpf progs running in a cgroup which want to clear this bit. e.g. the
> parent cgroup's prog may want to ensure this bit is not set.
I didn't think about that situation, so the set helper will need to
reset all flags first (in case there are multiple bit flags in the
future):

static inline void mem_cgroup_sk_set_flags(struct sock *sk, unsigned short flags)
{
	unsigned long val = (unsigned long)sk->sk_memcg;

	val &= SK_BPF_MEMCG_PTR_MASK;
	val |= flags;
	sk->sk_memcg = (struct mem_cgroup *)val;
}