Message-ID: <908c6a63-3049-4dd2-859a-215b31e5d1ea@linux.dev>
Date: Fri, 7 Mar 2025 22:17:44 -0800
From: Yonghong Song <yonghong.song@...ux.dev>
To: Martin KaFai Lau <martin.lau@...ux.dev>, Mahe Tardy <mahe.tardy@...il.com>
Cc: daniel@...earbox.net, john.fastabend@...il.com, ast@...nel.org,
andrii@...nel.org, jolsa@...nel.org, bpf@...r.kernel.org,
Network Development <netdev@...r.kernel.org>
Subject: Re: [PATCH bpf-next 1/2] bpf: add get_netns_cookie helper to tracing
programs
On 3/7/25 3:06 PM, Martin KaFai Lau wrote:
> On 3/6/25 9:03 AM, Mahe Tardy wrote:
>>>>> The immediate question is whether sock_net(sk) must be non-NULL
>>>>> for tracing.
>>>> We discussed this offline with Daniel Borkmann and we think that it
>>>> might not be the question. The get_netns_cookie(NULL) call allows
>>>> us to
>>>> compare against get_netns_cookie(sock) to see whether the sock's netns
>>>> is equal to the init netns and thus dispatch different logic.
>>> bpf_get_netns_cookie(NULL) should be fine.
>>>
>>> I meant to ask if sock_net(sk) may return NULL for a non-NULL sk.
>>> Please check.
>> Oh sorry for the confusion, I investigated with my humble kernel
>> knowledge: essentially sock_net(sk) reads sk->sk_net.net, retrieving
>> the struct net that represents the network namespace; the helper then
>> dereferences that pointer to extract the cookie (which is the concern).
>> The sk_net intermediary (in reality __sk_common.skc_net) exists because
>> network namespace support can be compiled in or out via CONFIG_NET_NS.
>> It is a possible_net_t, which may or may not contain the struct net
>> pointer, which is why write_pnet/read_pnet are used: they either store
>> and return the pointer, or no-op and return the global init netns.
>>
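The mechanism described above can be sketched as a small userspace model (a hedged sketch: names are modeled on the kernel's possible_net_t/read_pnet/write_pnet, the init_net cookie value of 1 is an assumption, and this is not the kernel code):

```c
#include <stddef.h>

/* Userspace model: with CONFIG_NET_NS the struct carries a struct net
 * pointer; without it, the pointer is elided and reads always yield
 * the global init netns. */
#define CONFIG_NET_NS 1

struct net { unsigned long long net_cookie; };

static struct net init_net = { .net_cookie = 1 };  /* assumed value */

typedef struct {
#ifdef CONFIG_NET_NS
	struct net *net;
#endif
} possible_net_t;

static void write_pnet(possible_net_t *pnet, struct net *net)
{
#ifdef CONFIG_NET_NS
	pnet->net = net;                /* store the namespace pointer */
#else
	(void)pnet; (void)net;          /* no-op: single global namespace */
#endif
}

static struct net *read_pnet(const possible_net_t *pnet)
{
#ifdef CONFIG_NET_NS
	return pnet->net;               /* may be NULL before write_pnet() */
#else
	(void)pnet;
	return &init_net;               /* always the global init netns */
#endif
}
```

With CONFIG_NET_NS undefined, read_pnet() can never return NULL; the question in the thread is whether the pointer can be observed as NULL in the CONFIG_NET_NS=y case.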
>> Now, adding this helper to tracing progs allows it to be called at any
>> function entry or exit, but unlike with kprobes, it is not possible to
>> hook an arbitrary point in the middle of a function where the net ns
>> would still be NULL in the sock struct. With that in mind, I failed to
>> crash the kernel while tracing functions (some candidates were
>> inlined). I mostly grepped for sock_net_set, but I lack the knowledge to
>
> Thanks for checking.
>
> I took a quick look at the callers of sock_net_set. I suspect
> "fentry/sk_prot_alloc" and "lsm/sk_alloc" could see a NULL sock_net(sk)?
>
>> guarantee that this could not happen now or in the future. Maybe it
>> would be safer to add a check and return 0 in that case, if that's ok?
>> I'm not sure, since the helper returns an 8-byte opaque number, which
>> includes 0 as a valid value.
>
> I assume net_cookie 0 is invalid, but returning it on NULL would leak
> the implementation details of what counts as a valid cookie through a
> uapi helper:
>
> * u64 bpf_get_netns_cookie(void *ctx)
> * ...
> * Return
> * A 8-byte long opaque number
>
> Note that the tracing program can already read most fields of the sk,
> including sk->sk_net.net->net_cookie. Therefore, what this patch aims
> to achieve is already supported in tracing, and reading the field
> directly also saves a helper call.
>
> The only thing that may be missing in your use case is determining the
> init_net. I don't think reading a global kernel variable is supported
> yet, and I'm not sure init_net is guaranteed to have net_cookie 1.
> Otherwise, we could consider adding a kfunc to return &init_net, which
> could be compared with sk->sk_net.net. Having a pointer to &init_net
> might be more useful for other tracing use cases in general.
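The kfunc idea can be modeled in userspace as follows (a hedged sketch: bpf_get_init_net() is a hypothetical name for the proposed kfunc, and the struct definitions are simplified stand-ins). The point is that namespace membership becomes a pointer comparison, with no cookie decoding at all:

```c
struct net { unsigned long long net_cookie; };
struct sock { struct net *skc_net; };

static struct net init_net  = { .net_cookie = 1 };
static struct net other_net = { .net_cookie = 42 };

/* Stand-in for the proposed kfunc returning &init_net. */
static struct net *bpf_get_init_net(void)
{
	return &init_net;
}

/* 1 if the socket lives in the init netns, 0 otherwise. */
static int sk_in_init_netns(const struct sock *sk)
{
	return sk->skc_net == bpf_get_init_net();
}
```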
There is a workaround for this tracing use case:

1. Declare a global variable in the bpf program, e.g.
     struct net *init_net;
2. After skel_open and before skel_load, find the init_net address (from
   /proc/kallsyms) and assign it to skel->bss->init_net.
3. In the prog, do
     struct net *netns = bpf_rdonly_cast(init_net, bpf_core_type_id_kernel(struct net));
     bpf_printk("%llu\n", netns->net_cookie);
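Step 2 above could be sketched like this (a hedged sketch: the function name and buffer sizes are illustrative; a real loader would read /proc/kallsyms itself and store the result in skel->bss->init_net between skel_open and skel_load):

```c
#include <stdio.h>
#include <string.h>

/* Scan text in /proc/kallsyms format ("<addr> <type> <name>\n" per line)
 * for a symbol and return its address, or 0 if it is absent. */
static unsigned long long find_symbol_addr(const char *kallsyms_text,
					   const char *symbol)
{
	const char *p = kallsyms_text;
	char line[256];

	while (*p) {
		size_t len = strcspn(p, "\n");
		size_t n = len < sizeof(line) ? len : sizeof(line) - 1;

		memcpy(line, p, n);
		line[n] = '\0';
		p += len;
		if (*p == '\n')
			p++;

		unsigned long long addr;
		char type;
		char name[128];

		if (sscanf(line, "%llx %c %127s", &addr, &type, name) == 3 &&
		    strcmp(name, symbol) == 0)
			return addr;
	}
	return 0;
}
```

Note that reading real symbol addresses from /proc/kallsyms typically requires privileges (kptr_restrict), which a BPF loader usually has anyway.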
There is an effort to add kernel global variables to BTF, see
https://lore.kernel.org/bpf/20250207012045.2129841-1-stephen.s.brennan@oracle.com/
The recommended approach there is to put these global variables in a module
to avoid consuming too much kernel memory unconditionally.