netdev - Re: [PATCH v5 bpf-next 2/9] veth: Add driver XDP


Date:   Fri, 27 Jul 2018 13:55:22 +0900
From:   Toshiaki Makita <makita.toshiaki@....ntt.co.jp>
To:     John Fastabend <john.fastabend@...il.com>
Cc:     Toshiaki Makita <toshiaki.makita1@...il.com>,
        netdev@...r.kernel.org, Alexei Starovoitov <ast@...nel.org>,
        Daniel Borkmann <daniel@...earbox.net>,
        Jesper Dangaard Brouer <brouer@...hat.com>,
        Jakub Kicinski <jakub.kicinski@...ronome.com>
Subject: Re: [PATCH v5 bpf-next 2/9] veth: Add driver XDP

Hi John,

On 2018/07/27 12:02, John Fastabend wrote:
> On 07/26/2018 07:40 AM, Toshiaki Makita wrote:
>> From: Toshiaki Makita <makita.toshiaki@....ntt.co.jp>
>>
>> This is the basic implementation of veth driver XDP.
>>
>> Incoming packets are sent from the peer veth device in the form of skb,
>> so this is generally doing the same thing as generic XDP.
>>
>> This itself is not so useful, but a starting point to implement other
>> useful veth XDP features like TX and REDIRECT.
>>
>> This introduces NAPI when XDP is enabled, because XDP now relies
>> heavily on NAPI context. Use ptr_ring to emulate NIC ring. Tx function
>> enqueues packets to the ring and peer NAPI handler drains the ring.
>>
>> Currently only one ring is allocated for each veth device, so it does
>> not scale on multiqueue env. This can be resolved by allocating rings
>> on the per-queue basis later.
>>
>> Note that NAPI is not used but netif_rx is used when XDP is not loaded,
>> so this does not change the default behaviour.
>>
>> v3:
>> - Fix race on closing the device.
>> - Add extack messages in ndo_bpf.
>>
>> v2:
>> - Squashed with the patch adding NAPI.
>> - Implement adjust_tail.
>> - Don't acquire consumer lock because it is guarded by NAPI.
>> - Make poll_controller noop since it is unnecessary.
>> - Register rxq_info on enabling XDP rather than on opening the device.
>>
>> Signed-off-by: Toshiaki Makita <makita.toshiaki@....ntt.co.jp>
>> ---
> 
> 
> [...]
> 
> One nit and one question.
> 
>> +
>> +static struct sk_buff *veth_xdp_rcv_skb(struct veth_priv *priv,
>> +					struct sk_buff *skb)
>> +{
>> +	u32 pktlen, headroom, act, metalen;
>> +	void *orig_data, *orig_data_end;
>> +	int size, mac_len, delta, off;
>> +	struct bpf_prog *xdp_prog;
>> +	struct xdp_buff xdp;
>> +
>> +	rcu_read_lock();
>> +	xdp_prog = rcu_dereference(priv->xdp_prog);
>> +	if (unlikely(!xdp_prog)) {
>> +		rcu_read_unlock();
>> +		goto out;
>> +	}
>> +
>> +	mac_len = skb->data - skb_mac_header(skb);
>> +	pktlen = skb->len + mac_len;
>> +	size = SKB_DATA_ALIGN(VETH_XDP_HEADROOM + pktlen) +
>> +	       SKB_DATA_ALIGN(sizeof(struct skb_shared_info));
>> +	if (size > PAGE_SIZE)
>> +		goto drop;
> 
> I'm not sure why it matters if size > PAGE_SIZE here. Why not
> just consume it and use the correct page order in alloc_page if
> it's not linear?

Indeed. We can allow such skbs here at least if we don't need
reallocation (which is highly unlikely though).

But I'm not sure we should allocate multiple pages in atomic context.
Such higher-order allocations tend to fail randomly, which is IMO more
frustrating. We currently rule out that situation by limiting max_mtu
and dropping features, which looks more robust to me.
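
To make the trade-off concrete, here is a rough userspace sketch of the
size check quoted above and of the page order a larger buffer would need.
It is only an illustration, not code from the patch: every sketch_* name
is hypothetical, and the constants (4096-byte pages, 64-byte cache lines,
256 bytes of XDP headroom, a 320-byte struct skb_shared_info) are
assumptions for a typical x86_64 build.

```c
/* Assumed values for a typical x86_64 build; not taken from the patch. */
#define SKETCH_PAGE_SIZE    4096u /* stands in for PAGE_SIZE */
#define SKETCH_CACHE_BYTES  64u   /* stands in for SMP_CACHE_BYTES */
#define SKETCH_XDP_HEADROOM 256u  /* stands in for VETH_XDP_HEADROOM */
#define SKETCH_SHINFO_SIZE  320u  /* stands in for sizeof(struct skb_shared_info) */

/* Round up to a cache-line multiple, in the spirit of SKB_DATA_ALIGN(). */
static unsigned int sketch_data_align(unsigned int x)
{
	return (x + SKETCH_CACHE_BYTES - 1) & ~(SKETCH_CACHE_BYTES - 1);
}

/* Buffer size needed for a packet of pktlen bytes (MAC header included). */
static unsigned int sketch_buf_size(unsigned int pktlen)
{
	return sketch_data_align(SKETCH_XDP_HEADROOM + pktlen) +
	       sketch_data_align(SKETCH_SHINFO_SIZE);
}

/* Smallest order such that (page size << order) covers size bytes. */
static unsigned int sketch_page_order(unsigned int size)
{
	unsigned int order = 0;

	while ((SKETCH_PAGE_SIZE << order) < size)
		order++;
	return order;
}

/* Mirrors the "size > PAGE_SIZE" drop condition: 1 if one page is enough. */
static int sketch_fits_one_page(unsigned int pktlen)
{
	return sketch_buf_size(pktlen) <= SKETCH_PAGE_SIZE;
}
```

Under these assumptions a 1514-byte frame needs 2112 bytes and fits in a
single page, while a 9014-byte jumbo frame needs 9600 bytes, i.e. an
order-2 allocation, which is the kind of multi-page request in atomic
context I would like to avoid.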

>> +
>> +	headroom = skb_headroom(skb) - mac_len;
>> +	if (skb_shared(skb) || skb_head_is_locked(skb) ||
>> +	    skb_is_nonlinear(skb) || headroom < XDP_PACKET_HEADROOM) {
>> +		struct sk_buff *nskb;
>> +		void *head, *start;
>> +		struct page *page;
>> +		int head_off;
>> +
>> +		page = alloc_page(GFP_ATOMIC);
> 
> Should also have __GFP_NOWARN here; this can be triggered by
> external events, so we don't want a DDoS to flood the system logs.

Sure, thanks!

-- 
Toshiaki Makita