lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 4 Mar 2019 09:44:52 -0800
From:   Jakub Kicinski <jakub.kicinski@...ronome.com>
To:     Toke Høiland-Jørgensen <toke@...hat.com>
Cc:     David Miller <davem@...emloft.net>, netdev@...r.kernel.org,
        Jesper Dangaard Brouer <brouer@...hat.com>,
        Daniel Borkmann <daniel@...earbox.net>,
        Alexei Starovoitov <ast@...nel.org>
Subject: Re: [PATCH net-next v3 2/3] xdp: Always use a devmap for
 XDP_REDIRECT to a device

On Mon, 04 Mar 2019 12:58:51 +0100, Toke Høiland-Jørgensen wrote:
> >> diff --git a/include/net/netns/xdp.h b/include/net/netns/xdp.h
> >> index e5734261ba0a..4935dfe1cf43 100644
> >> --- a/include/net/netns/xdp.h
> >> +++ b/include/net/netns/xdp.h
> >> @@ -4,10 +4,21 @@
> >>  
> >>  #include <linux/rculist.h>
> >>  #include <linux/mutex.h>
> >> +#include <linux/atomic.h>
> >> +
> >> +struct bpf_dtab;
> >> +
> >> +struct bpf_dtab_container {
> >> +	struct bpf_dtab __rcu *dtab;
> >> +	atomic_t refcnt;  
> >
> > refcount_t ?  I'm not sure what the rules are for non-performance
> > critical stuff are..  
> 
> I based the use of atomic_t on what bpf/syscall.c is doing. The problem
> with using refcount_t is that it's a bit of a weird refcount because
> it's not strictly tied to the object lifetime (as the objects are only
> allocated if *both* the globals and the per-namespace refcnt are >0).
> 
> >> +static void __dev_map_release_default_map(struct bpf_dtab_container *cont)
> >> +{
> >> +	struct bpf_dtab *dtab = NULL;
> >> +
> >> +	lockdep_assert_held(&dev_map_lock);
> >> +
> >> +	dtab = rcu_dereference(cont->dtab);
> >> +	if (dtab) {  
> >
> > How can it be NULL if it's refcounted? hm..  
> 
> See above; it's not actually the object that is refcounted, it's a count
> of the number of active XDP programs in the namespace, which still needs
> to be kept so we can allocate the devmaps when a REDIRECT program is
> loaded.

I think I got it now.  I wonder if anything could be done to improve
the readability of the dual-refcnting :S  Maybe use the "needed"/"used"
terms throughout?

> >> +		list_del_rcu(&dtab->list);
> >> +		rcu_assign_pointer(cont->dtab, NULL);
> >> +		bpf_clear_redirect_map(&dtab->map);
> >> +		call_rcu(&dtab->rcu, __dev_map_free);
> >> +	}
> >> +}
> >> +
> >> +void dev_map_put_default_map(struct net *net)
> >> +{
> >> +	if (atomic_dec_and_test(&net->xdp.default_map.refcnt)) {
> >> +		spin_lock(&dev_map_lock);
> >> +		__dev_map_release_default_map(&net->xdp.default_map);
> >> +		spin_unlock(&dev_map_lock);
> >> +	}
> >> +}
> >> +
> >> +static int __init_default_map(struct bpf_dtab_container *cont)
> >> +{
> >> +	struct net *net = bpf_default_map_to_net(cont);
> >> +	struct bpf_dtab *dtab, *old_dtab;
> >> +	int size = DEV_MAP_DEFAULT_SIZE;
> >> +	struct net_device *netdev;
> >> +	union bpf_attr attr = {};
> >> +	u32 idx;
> >> +	int err;
> >> +
> >> +	lockdep_assert_held(&dev_map_lock);
> >> +
> >> +	if (!atomic_read(&global_redirect_use))
> >> +		return 0;
> >> +
> >> +	for_each_netdev(net, netdev)
> >> +		if (netdev->ifindex >= size)
> >> +			size <<= 1;
> >> +
> >> +	old_dtab = rcu_dereference(cont->dtab);
> >> +	if (old_dtab && old_dtab->map.max_entries == size)
> >> +		return 0;
> >> +
> >> +	dtab = kzalloc(sizeof(*dtab), GFP_USER);  
> >
> > You are calling this with a spin lock held :(  
> 
> Yeah, not just this, there are more allocations in dev_map_init_map().
> Hmm, I guess I need to think about the concurrency bits some more; it is
> made a bit tricky by the refcounts being outside the objects.
> 
> Basically what I'm trying to protect against is the case where the
> global refcount goes to zero and then immediately back to one (or vice
> versa), which results in a deallocation immediately followed by an
> allocation. I was worried about the allocation and deallocation racing
> with each other (because there's a window after the refcnt is checked
> before the (de)allocation begins). It's not quite clear to me whether
> RCU is enough to protect against this... Thoughts?

Hm.  I think you'll still need a lock (mutex?) on the alloc path, but
the free path should be fine as long as you load the map pointer before
looking at the refcnt (atomic op ensuring the barrier there).

> > Could you please add tests for the replacement combinations?  
> 
> Sure. But, erm, where? selftests? :)

Yes :)  selftests/bpf/  Look around at the scripts, AFAIK we don't have
much in the way of standard infra for system interaction tests like
this :S

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ