[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20230215104837.gm3ohqzownrtky5k@apollo>
Date: Wed, 15 Feb 2023 11:48:37 +0100
From: Kumar Kartikeya Dwivedi <memxor@...il.com>
To: Alexei Starovoitov <alexei.starovoitov@...il.com>
Cc: davem@...emloft.net, daniel@...earbox.net, andrii@...nel.org,
martin.lau@...nel.org, void@...ifault.com, davemarchevsky@...a.com,
tj@...nel.org, netdev@...r.kernel.org, bpf@...r.kernel.org,
kernel-team@...com
Subject: Re: [PATCH bpf-next 2/4] bpf: Introduce kptr_rcu.
On Wed, Feb 15, 2023 at 07:58:10AM CET, Alexei Starovoitov wrote:
> From: Alexei Starovoitov <ast@...nel.org>
>
> The life time of certain kernel structures like 'struct cgroup' is protected by RCU.
> Hence it's safe to dereference them directly from kptr-s in bpf maps.
> The resulting pointer is PTR_TRUSTED and can be passed to kfuncs that expect KF_TRUSTED_ARGS.
But I thought PTR_TRUSTED was meant to ensure that the refcount is always > 0?
E.g. in [0] you said that kernel code should ensure refcount is held while
passing trusted pointer as tracepoint args. It's also clear from what functions
that operate on PTR_TRUSTED are doing. bpf_cgroup_acquire is doing css_get and
not css_tryget. Similarly bpf_cgroup_ancestor also calls cgroup_get which does
css_get instead of css_tryget.
[0]: https://lore.kernel.org/bpf/CAADnVQJfj9mrFZ+mBfwh8Xba333B6EyHRMdb6DE4s6te_5_V_A@mail.gmail.com
And if we want to do RCU + css_tryget, we already have that in the form of
kptr_get.
I think we've had a similar discussion about this in
https://lore.kernel.org/bpf/20220216214405.tn7thpnvqkxuvwhd@ast-mbp.dhcp.thefacebook.com,
where you advised against directly assuming pointers to RCU protected objects as
trusted where refcount could drop to 0. So then we went the kptr_get route,
because explicit RCU sections weren't available back then to load + inc_not_zero
directly (for sleepable programs).
> Derefrence of other kptr-s returns PTR_UNTRUSTED.
>
> For example:
> struct map_value {
> struct cgroup __kptr_rcu *cgrp;
> };
>
> SEC("tp_btf/cgroup_mkdir")
> int BPF_PROG(test_cgrp_get_ancestors, struct cgroup *cgrp_arg, const char *path)
> {
> struct cgroup *cg, *cg2;
>
> cg = bpf_cgroup_acquire(cgrp_arg); // cg is PTR_TRUSTED and ref_obj_id > 0
> bpf_kptr_xchg(&v->cgrp, cg);
>
> cg2 = v->cgrp; // cg2 is PTR_TRUSTED | MEM_RCU. This is new feature introduced by this patch.
>
> bpf_cgroup_ancestor(cg2, level); // safe to do. cg2 will not disappear
^^ But it's percpu_ref
can drop to zero, right?
> }
>
> Signed-off-by: Alexei Starovoitov <ast@...nel.org>
> ---
> [...]
Powered by blists - more mailing lists