[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87fts231fp.fsf@toke.dk>
Date: Mon, 04 Mar 2019 20:05:30 +0100
From: Toke Høiland-Jørgensen <toke@...hat.com>
To: Jakub Kicinski <jakub.kicinski@...ronome.com>
Cc: David Miller <davem@...emloft.net>, netdev@...r.kernel.org,
Jesper Dangaard Brouer <brouer@...hat.com>,
Daniel Borkmann <daniel@...earbox.net>,
Alexei Starovoitov <ast@...nel.org>
Subject: Re: [PATCH net-next v3 2/3] xdp: Always use a devmap for XDP_REDIRECT to a device
Jakub Kicinski <jakub.kicinski@...ronome.com> writes:
> On Mon, 04 Mar 2019 12:58:51 +0100, Toke Høiland-Jørgensen wrote:
>> >> diff --git a/include/net/netns/xdp.h b/include/net/netns/xdp.h
>> >> index e5734261ba0a..4935dfe1cf43 100644
>> >> --- a/include/net/netns/xdp.h
>> >> +++ b/include/net/netns/xdp.h
>> >> @@ -4,10 +4,21 @@
>> >>
>> >> #include <linux/rculist.h>
>> >> #include <linux/mutex.h>
>> >> +#include <linux/atomic.h>
>> >> +
>> >> +struct bpf_dtab;
>> >> +
>> >> +struct bpf_dtab_container {
>> >> + struct bpf_dtab __rcu *dtab;
>> >> + atomic_t refcnt;
>> >
>> > refcount_t ? I'm not sure what the rules are for non-performance
>> > critical stuff are..
>>
>> I based the use of atomic_t on what bpf/syscall.c is doing. The problem
>> with using refcount_t is that it's a bit of a weird refcount because
>> it's not strictly tied to the object lifetime (as the objects are only
>> allocated if *both* the globals and the per-namespace refcnt are >0).
>>
>> >> +static void __dev_map_release_default_map(struct bpf_dtab_container *cont)
>> >> +{
>> >> + struct bpf_dtab *dtab = NULL;
>> >> +
>> >> + lockdep_assert_held(&dev_map_lock);
>> >> +
>> >> + dtab = rcu_dereference(cont->dtab);
>> >> + if (dtab) {
>> >
>> > How can it be NULL if it's refcounted? hm..
>>
>> See above; it's not actually the object that is refcounted, it's a count
>> of the number of active XDP programs in the namespace, which still needs
>> to be kept so we can allocate the devmaps when a REDIRECT program is
>> loaded.
>
> I think I got it now. I wonder if anything could be done to improve
> the readability of the dual-refcnting :S Maybe use the "needed"/"used"
> terms throughout?
Yeah, good idea, I'll try something along those lines.
>> >> + list_del_rcu(&dtab->list);
>> >> + rcu_assign_pointer(cont->dtab, NULL);
>> >> + bpf_clear_redirect_map(&dtab->map);
>> >> + call_rcu(&dtab->rcu, __dev_map_free);
>> >> + }
>> >> +}
>> >> +
>> >> +void dev_map_put_default_map(struct net *net)
>> >> +{
>> >> + if (atomic_dec_and_test(&net->xdp.default_map.refcnt)) {
>> >> + spin_lock(&dev_map_lock);
>> >> + __dev_map_release_default_map(&net->xdp.default_map);
>> >> + spin_unlock(&dev_map_lock);
>> >> + }
>> >> +}
>> >> +
>> >> +static int __init_default_map(struct bpf_dtab_container *cont)
>> >> +{
>> >> + struct net *net = bpf_default_map_to_net(cont);
>> >> + struct bpf_dtab *dtab, *old_dtab;
>> >> + int size = DEV_MAP_DEFAULT_SIZE;
>> >> + struct net_device *netdev;
>> >> + union bpf_attr attr = {};
>> >> + u32 idx;
>> >> + int err;
>> >> +
>> >> + lockdep_assert_held(&dev_map_lock);
>> >> +
>> >> + if (!atomic_read(&global_redirect_use))
>> >> + return 0;
>> >> +
>> >> + for_each_netdev(net, netdev)
>> >> + if (netdev->ifindex >= size)
>> >> + size <<= 1;
>> >> +
>> >> + old_dtab = rcu_dereference(cont->dtab);
>> >> + if (old_dtab && old_dtab->map.max_entries == size)
>> >> + return 0;
>> >> +
>> >> + dtab = kzalloc(sizeof(*dtab), GFP_USER);
>> >
>> > You are calling this with a spin lock held :(
>>
>> Yeah, not just this, there are more allocations in dev_map_init_map().
>> Hmm, I guess I need to think about the concurrency bits some more; it is
>> made a bit tricky by the refcounts being outside the objects.
>>
>> Basically what I'm trying to protect against is the case where the
>> global refcount goes to zero and then immediately back to one (or vice
>> versa), which results in a deallocation immediately followed by an
>> allocation. I was worried about the allocation and deallocation racing
>> with each other (because there's a window after the refcnt is checked
>> before the (de)allocation begins). It's not quite clear to me whether
>> RCU is enough to protect against this... Thoughts?
>
> Hm. I think you'll still need a lock (mutex?) on the alloc path, but
> the free path should be fine as long as you load the map pointer before
> looking at the refcnt (atomic op ensuring the barrier there).
Yeah, for the per-namespace refcnt it's pretty straight forward, the
trouble is the global count that needs to iterate over all namespaces;
probably need to put that all behind a (non-spin)lock, right?
>> > Could you please add tests for the replacement combinations?
>>
>> Sure. But, erm, where? selftests? :)
>
> Yes :) selftests/bpf/ Look around at the scripts, AFAIK we don't have
> much in the way of standard infra for system interaction tests like
> this :S
Right. I'll shamelessly copy from the tests already in there and see if
I can't get something decent working :)
-Toke
Powered by blists - more mailing lists