[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190304094452.375ba6e7@cakuba.netronome.com>
Date: Mon, 4 Mar 2019 09:44:52 -0800
From: Jakub Kicinski <jakub.kicinski@...ronome.com>
To: Toke Høiland-Jørgensen <toke@...hat.com>
Cc: David Miller <davem@...emloft.net>, netdev@...r.kernel.org,
Jesper Dangaard Brouer <brouer@...hat.com>,
Daniel Borkmann <daniel@...earbox.net>,
Alexei Starovoitov <ast@...nel.org>
Subject: Re: [PATCH net-next v3 2/3] xdp: Always use a devmap for
XDP_REDIRECT to a device
On Mon, 04 Mar 2019 12:58:51 +0100, Toke Høiland-Jørgensen wrote:
> >> diff --git a/include/net/netns/xdp.h b/include/net/netns/xdp.h
> >> index e5734261ba0a..4935dfe1cf43 100644
> >> --- a/include/net/netns/xdp.h
> >> +++ b/include/net/netns/xdp.h
> >> @@ -4,10 +4,21 @@
> >>
> >> #include <linux/rculist.h>
> >> #include <linux/mutex.h>
> >> +#include <linux/atomic.h>
> >> +
> >> +struct bpf_dtab;
> >> +
> >> +struct bpf_dtab_container {
> >> + struct bpf_dtab __rcu *dtab;
> >> + atomic_t refcnt;
> >
> > refcount_t ? I'm not sure what the rules are for non-performance
> > critical stuff are..
>
> I based the use of atomic_t on what bpf/syscall.c is doing. The problem
> with using refcount_t is that it's a bit of a weird refcount because
> it's not strictly tied to the object lifetime (as the objects are only
> allocated if *both* the globals and the per-namespace refcnt are >0).
>
> >> +static void __dev_map_release_default_map(struct bpf_dtab_container *cont)
> >> +{
> >> + struct bpf_dtab *dtab = NULL;
> >> +
> >> + lockdep_assert_held(&dev_map_lock);
> >> +
> >> + dtab = rcu_dereference(cont->dtab);
> >> + if (dtab) {
> >
> > How can it be NULL if it's refcounted? hm..
>
> See above; it's not actually the object that is refcounted, it's a count
> of the number of active XDP programs in the namespace, which still needs
> to be kept so we can allocate the devmaps when a REDIRECT program is
> loaded.
I think I got it now. I wonder if anything could be done to improve
the readability of the dual-refcnting :S Maybe use the "needed"/"used"
terms throughout?
> >> + list_del_rcu(&dtab->list);
> >> + rcu_assign_pointer(cont->dtab, NULL);
> >> + bpf_clear_redirect_map(&dtab->map);
> >> + call_rcu(&dtab->rcu, __dev_map_free);
> >> + }
> >> +}
> >> +
> >> +void dev_map_put_default_map(struct net *net)
> >> +{
> >> + if (atomic_dec_and_test(&net->xdp.default_map.refcnt)) {
> >> + spin_lock(&dev_map_lock);
> >> + __dev_map_release_default_map(&net->xdp.default_map);
> >> + spin_unlock(&dev_map_lock);
> >> + }
> >> +}
> >> +
> >> +static int __init_default_map(struct bpf_dtab_container *cont)
> >> +{
> >> + struct net *net = bpf_default_map_to_net(cont);
> >> + struct bpf_dtab *dtab, *old_dtab;
> >> + int size = DEV_MAP_DEFAULT_SIZE;
> >> + struct net_device *netdev;
> >> + union bpf_attr attr = {};
> >> + u32 idx;
> >> + int err;
> >> +
> >> + lockdep_assert_held(&dev_map_lock);
> >> +
> >> + if (!atomic_read(&global_redirect_use))
> >> + return 0;
> >> +
> >> + for_each_netdev(net, netdev)
> >> + if (netdev->ifindex >= size)
> >> + size <<= 1;
> >> +
> >> + old_dtab = rcu_dereference(cont->dtab);
> >> + if (old_dtab && old_dtab->map.max_entries == size)
> >> + return 0;
> >> +
> >> + dtab = kzalloc(sizeof(*dtab), GFP_USER);
> >
> > You are calling this with a spin lock held :(
>
> Yeah, not just this, there are more allocations in dev_map_init_map().
> Hmm, I guess I need to think about the concurrency bits some more; it is
> made a bit tricky by the refcounts being outside the objects.
>
> Basically what I'm trying to protect against is the case where the
> global refcount goes to zero and then immediately back to one (or vice
> versa), which results in a deallocation immediately followed by an
> allocation. I was worried about the allocation and deallocation racing
> with each other (because there's a window after the refcnt is checked
> before the (de)allocation begins). It's not quite clear to me whether
> RCU is enough to protect against this... Thoughts?
Hm. I think you'll still need a lock (mutex?) on the alloc path, but
the free path should be fine as long as you load the map pointer before
looking at the refcnt (atomic op ensuring the barrier there).
> > Could you please add tests for the replacement combinations?
>
> Sure. But, erm, where? selftests? :)
Yes :) selftests/bpf/ Look around at the scripts, AFAIK we don't have
much in the way of standard infra for system interaction tests like
this :S
Powered by blists - more mailing lists