lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <87fts231fp.fsf@toke.dk>
Date:   Mon, 04 Mar 2019 20:05:30 +0100
From:   Toke Høiland-Jørgensen <toke@...hat.com>
To:     Jakub Kicinski <jakub.kicinski@...ronome.com>
Cc:     David Miller <davem@...emloft.net>, netdev@...r.kernel.org,
        Jesper Dangaard Brouer <brouer@...hat.com>,
        Daniel Borkmann <daniel@...earbox.net>,
        Alexei Starovoitov <ast@...nel.org>
Subject: Re: [PATCH net-next v3 2/3] xdp: Always use a devmap for XDP_REDIRECT to a device

Jakub Kicinski <jakub.kicinski@...ronome.com> writes:

> On Mon, 04 Mar 2019 12:58:51 +0100, Toke Høiland-Jørgensen wrote:
>> >> diff --git a/include/net/netns/xdp.h b/include/net/netns/xdp.h
>> >> index e5734261ba0a..4935dfe1cf43 100644
>> >> --- a/include/net/netns/xdp.h
>> >> +++ b/include/net/netns/xdp.h
>> >> @@ -4,10 +4,21 @@
>> >>  
>> >>  #include <linux/rculist.h>
>> >>  #include <linux/mutex.h>
>> >> +#include <linux/atomic.h>
>> >> +
>> >> +struct bpf_dtab;
>> >> +
>> >> +struct bpf_dtab_container {
>> >> +	struct bpf_dtab __rcu *dtab;
>> >> +	atomic_t refcnt;  
>> >
>> > refcount_t ?  I'm not sure what the rules are for non-performance
>> > critical stuff are..  
>> 
>> I based the use of atomic_t on what bpf/syscall.c is doing. The problem
>> with using refcount_t is that it's a bit of a weird refcount because
>> it's not strictly tied to the object lifetime (as the objects are only
>> allocated if *both* the globals and the per-namespace refcnt are >0).
>> 
>> >> +static void __dev_map_release_default_map(struct bpf_dtab_container *cont)
>> >> +{
>> >> +	struct bpf_dtab *dtab = NULL;
>> >> +
>> >> +	lockdep_assert_held(&dev_map_lock);
>> >> +
>> >> +	dtab = rcu_dereference(cont->dtab);
>> >> +	if (dtab) {  
>> >
>> > How can it be NULL if it's refcounted? hm..  
>> 
>> See above; it's not actually the object that is refcounted, it's a count
>> of the number of active XDP programs in the namespace, which still needs
>> to be kept so we can allocate the devmaps when a REDIRECT program is
>> loaded.
>
> I think I got it now. I wonder if anything could be done to improve
> the readability of the dual-refcnting :S Maybe use the "needed"/"used"
> terms throughout?

Yeah, good idea, I'll try something along those lines.

>> >> +		list_del_rcu(&dtab->list);
>> >> +		rcu_assign_pointer(cont->dtab, NULL);
>> >> +		bpf_clear_redirect_map(&dtab->map);
>> >> +		call_rcu(&dtab->rcu, __dev_map_free);
>> >> +	}
>> >> +}
>> >> +
>> >> +void dev_map_put_default_map(struct net *net)
>> >> +{
>> >> +	if (atomic_dec_and_test(&net->xdp.default_map.refcnt)) {
>> >> +		spin_lock(&dev_map_lock);
>> >> +		__dev_map_release_default_map(&net->xdp.default_map);
>> >> +		spin_unlock(&dev_map_lock);
>> >> +	}
>> >> +}
>> >> +
>> >> +static int __init_default_map(struct bpf_dtab_container *cont)
>> >> +{
>> >> +	struct net *net = bpf_default_map_to_net(cont);
>> >> +	struct bpf_dtab *dtab, *old_dtab;
>> >> +	int size = DEV_MAP_DEFAULT_SIZE;
>> >> +	struct net_device *netdev;
>> >> +	union bpf_attr attr = {};
>> >> +	u32 idx;
>> >> +	int err;
>> >> +
>> >> +	lockdep_assert_held(&dev_map_lock);
>> >> +
>> >> +	if (!atomic_read(&global_redirect_use))
>> >> +		return 0;
>> >> +
>> >> +	for_each_netdev(net, netdev)
>> >> +		if (netdev->ifindex >= size)
>> >> +			size <<= 1;
>> >> +
>> >> +	old_dtab = rcu_dereference(cont->dtab);
>> >> +	if (old_dtab && old_dtab->map.max_entries == size)
>> >> +		return 0;
>> >> +
>> >> +	dtab = kzalloc(sizeof(*dtab), GFP_USER);  
>> >
>> > You are calling this with a spin lock held :(  
>> 
>> Yeah, not just this, there are more allocations in dev_map_init_map().
>> Hmm, I guess I need to think about the concurrency bits some more; it is
>> made a bit tricky by the refcounts being outside the objects.
>> 
>> Basically what I'm trying to protect against is the case where the
>> global refcount goes to zero and then immediately back to one (or vice
>> versa), which results in a deallocation immediately followed by an
>> allocation. I was worried about the allocation and deallocation racing
>> with each other (because there's a window after the refcnt is checked
>> before the (de)allocation begins). It's not quite clear to me whether
>> RCU is enough to protect against this... Thoughts?
>
> Hm.  I think you'll still need a lock (mutex?) on the alloc path, but
> the free path should be fine as long as you load the map pointer before
> looking at the refcnt (atomic op ensuring the barrier there).

Yeah, for the per-namespace refcnt it's pretty straight forward, the
trouble is the global count that needs to iterate over all namespaces;
probably need to put that all behind a (non-spin)lock, right?

>> > Could you please add tests for the replacement combinations?  
>> 
>> Sure. But, erm, where? selftests? :)
>
> Yes :)  selftests/bpf/  Look around at the scripts, AFAIK we don't have
> much in the way of standard infra for system interaction tests like
> this :S

Right. I'll shamelessly copy from the tests already in there and see if
I can't get something decent working :)

-Toke

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ