[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAP01T77QYNc6BJP+OtVV3YQGgS06ZeWTUBdq3sp2FhKmeoo6-A@mail.gmail.com>
Date: Tue, 11 Oct 2022 08:10:26 +0530
From: Kumar Kartikeya Dwivedi <memxor@...il.com>
To: David Vernet <void@...ifault.com>
Cc: ast@...nel.org, daniel@...earbox.net, andrii@...nel.org,
martin.lau@...ux.dev, kernel-team@...com, bpf@...r.kernel.org,
linux-kernel@...r.kernel.org, yhs@...com, song@...nel.org,
john.fastabend@...il.com, kpsingh@...nel.org, sdf@...gle.com,
haoluo@...gle.com, jolsa@...nel.org, tj@...nel.org
Subject: Re: [PATCH v2] selftests/bpf: Update map_kptr examples to reflect
real use-cases
On Tue, 4 Oct 2022 at 20:59, David Vernet <void@...ifault.com> wrote:
>
> On Mon, Oct 03, 2022 at 06:22:55PM +0200, Kumar Kartikeya Dwivedi wrote:
>
> [...]
>
> > > > > noinline void bpf_kfunc_call_test_release(struct prog_test_ref_kfunc *p)
> > > > > {
> > > > > if (!p)
> > > > > return;
> > > > >
> > > > > - refcount_dec(&p->cnt);
> > > > > + WARN_ON_ONCE(atomic_read(&p->destroyed));
> > > > > + if (refcount_dec_and_test(&p->cnt))
> > > > > + call_rcu(&p->rcu, delayed_destroy_test_ref_struct);
> > > >
> > > > I wonder whether this is ever called, I haven't really given this
> > > > patch a shot, but I don't see how the refcount can ever drop back to
> > > > zero. It's initialized as 1, and then only acquired after that, so
> > > > pairing all releases should still preserve refcount as 1.
> > >
> > > Yeah, the call_rcu() path is never hit. If we wanted to update the test
> > > so that this codepath was actually exercised, I think we'd need to add
> > > another kfunc that returned a reference to a dynamically allocated
> > > object rather than using the global, static one. I'm happy to do that if
> > > we think it's useful. The downside to adding more of these test kfuncs
> > > is that they actually do add a small bit of runtime overhead to the
> > > kernel because they're unconditionally registered in the __init function
> > > for test_run.c.
> > >
> >
> > But that only happens once, right? It still happens, so I don't see
> > what changes.
>
> The idea here would be to return a dynamically allocated object with an
> initial refcount of 1 that's owned by the _BPF program_, rather than
> what we have today where the global struct has an initial refcount
> that's owned by the main kernel and is never expected to go to 0. For
> all success (i.e. non-fail) testcases that are able to dynamically
> allocate this object, the refcount should go to 0 for each of them and
> the object will be destroyed after the current RCU grace period. Please
> let me know if I've misunderstood your point.
>
> > > > Also, even if you made it work, wouldn't you have the warning once you
> > > > run more selftests using prog_test_run, if you just set the destroyed
> > > > bit on each test run?
> > >
> > > If we want to update the test to have the refcount drop to 0, we would
> > > probably have to instead use dynamically allocated objects. At that
> > > point, we'd probably just crash instead of seeing a warning if we
> > > accidentally let a caller invoke acquire or release after the object had
> > > been destroyed. Maybe the better thing to do here is to just warn
> > > unconditionally in the destructor rather than setting a flag? What we
> > > really want to ensure is that the final refcount that's "owned" by the
> > > main kernel is never dropped.
> >
> > I think the refcount_t API already warns if underflow happens.
>
> Right, a warning would probably show up if we did a release that caused
> an underflow, but it would not for an acquire after the refcount dropped
> to 0.
>
It should, see REFCOUNT_ADD_UAF in include/linux/refcount.h.
> > To be clear, I don't mind if you want to improve this, it's certainly
> > a mess right now. Tests can't even run in parallel easily because it's
> > global. Testing like an actually allocated object might be a good way
> > to simulate a real scenario. And I totally agree that having a real
> > example is useful to people who want to add support for more kptrs.
>
> Ok, let me update the tests to do two things then:
>
> 1. Add a new test kfunc called bpf_kfunc_call_test_alloc() which returns
> a dynamically allocated instance of a prog_test_ref_kfunc *. This is
> similar in intention to bpf_xdp_ct_alloc().
> 2. Update bpf_kfunc_call_test_acquire() and
> bpf_kfunc_call_test_release() to take a trusted pointer to that
> allocated prog_test_ref_kfunc *.
This should work, but you would have to go through a lot of tests,
sadly I assumed it is global in a lot of places to make testing easier
(e.g. observing count after releasing by doing p->next, which gave me
a PTR_TO_BTF_ID that is preserved after release).
Some other way would have to be found to do the same thing.
> 3. Keep the other changes which e.g. use RCU in
> bpf_kfunc_call_test_kptr_get() to synchronize on getting the kptr.
> Once the __rcu kptr variant is landed we can get rid of
> bpf_kfunc_call_test_kptr_get() and make bpf_kfunc_call_test_acquire()
> require an __rcu pointer.
>
In the case of RCU I don't plan on passing the rcu pointer to acquire
functions, but rather kptr_get stops taking pointer to pointer. I.e.
in your point 3, kptr_get does what you want _acquire to do. It takes
an rcu protected pointer to an object and safely increments its count.
> Continuing on point (3) above, we should _probably_ also have an example
> for an object that isn't RCU-protected? I imagine that we don't want to
> get rid of KF_KPTR_GET entirely once we have __rcu pointers because some
> kptr_get() implementations may synchronize in other ways, such as with
> spinlocks or whatever. Let's leave this until after __rcu lands though.
>
I think it's unlikely kptr_get can work without it, spinlocks may be
required internally (e.g. to inspect the object key/generation in
SLAB_TYPESAFE_BY_RCU case without races) but that goes on top of RCU
protection. But yes, it depends, maybe it will work for some special
cases. Though I don't think it's worth adding an example for the
uncommon case.
So you will still need kptr_get helpers, some of them simply do
refcount_inc_not_zero, others may do a little more.
Anyway, let's discuss more when the set is posted.
> Does this all sound good?
>
Yes.
Powered by blists - more mailing lists