Date: Fri, 1 Mar 2024 11:45:52 +0000
From: wangyunjian <wangyunjian@...wei.com>
To: Paolo Abeni <pabeni@...hat.com>, "mst@...hat.com" <mst@...hat.com>,
	"willemdebruijn.kernel@...il.com" <willemdebruijn.kernel@...il.com>,
	"jasowang@...hat.com" <jasowang@...hat.com>, "kuba@...nel.org"
	<kuba@...nel.org>, "bjorn@...nel.org" <bjorn@...nel.org>,
	"magnus.karlsson@...el.com" <magnus.karlsson@...el.com>,
	"maciej.fijalkowski@...el.com" <maciej.fijalkowski@...el.com>,
	"jonathan.lemon@...il.com" <jonathan.lemon@...il.com>, "davem@...emloft.net"
	<davem@...emloft.net>
CC: "bpf@...r.kernel.org" <bpf@...r.kernel.org>, "netdev@...r.kernel.org"
	<netdev@...r.kernel.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>, "kvm@...r.kernel.org" <kvm@...r.kernel.org>,
	"virtualization@...ts.linux.dev" <virtualization@...ts.linux.dev>, xudingke
	<xudingke@...wei.com>, "liwei (DT)" <liwei395@...wei.com>
Subject: RE: [PATCH net-next v2 3/3] tun: AF_XDP Tx zero-copy support

> -----Original Message-----
> From: Paolo Abeni [mailto:pabeni@...hat.com]
> Sent: Thursday, February 29, 2024 7:13 PM
> To: wangyunjian <wangyunjian@...wei.com>; mst@...hat.com;
> willemdebruijn.kernel@...il.com; jasowang@...hat.com; kuba@...nel.org;
> bjorn@...nel.org; magnus.karlsson@...el.com; maciej.fijalkowski@...el.com;
> jonathan.lemon@...il.com; davem@...emloft.net
> Cc: bpf@...r.kernel.org; netdev@...r.kernel.org;
> linux-kernel@...r.kernel.org; kvm@...r.kernel.org;
> virtualization@...ts.linux.dev; xudingke <xudingke@...wei.com>; liwei (DT)
> <liwei395@...wei.com>
> Subject: Re: [PATCH net-next v2 3/3] tun: AF_XDP Tx zero-copy support
> 
> On Wed, 2024-02-28 at 19:05 +0800, Yunjian Wang wrote:
> > @@ -2661,6 +2776,54 @@ static int tun_ptr_peek_len(void *ptr)
> >  	}
> >  }
> >
> > +static void tun_peek_xsk(struct tun_file *tfile)
> > +{
> > +	struct xsk_buff_pool *pool;
> > +	u32 i, batch, budget;
> > +	void *frame;
> > +
> > +	if (!ptr_ring_empty(&tfile->tx_ring))
> > +		return;
> > +
> > +	spin_lock(&tfile->pool_lock);
> > +	pool = tfile->xsk_pool;
> > +	if (!pool) {
> > +		spin_unlock(&tfile->pool_lock);
> > +		return;
> > +	}
> > +
> > +	if (tfile->nb_descs) {
> > +		xsk_tx_completed(pool, tfile->nb_descs);
> > +		if (xsk_uses_need_wakeup(pool))
> > +			xsk_set_tx_need_wakeup(pool);
> > +	}
> > +
> > +	spin_lock(&tfile->tx_ring.producer_lock);
> > +	budget = min_t(u32, tfile->tx_ring.size, TUN_XDP_BATCH);
> > +
> > +	batch = xsk_tx_peek_release_desc_batch(pool, budget);
> > +	if (!batch) {
> 
> This branch looks like an unneeded "optimization". The generic loop below
> should have the same effect with no measurable perf delta - and smaller code.
> Just remove this.
> 
> > +		tfile->nb_descs = 0;
> > +		spin_unlock(&tfile->tx_ring.producer_lock);
> > +		spin_unlock(&tfile->pool_lock);
> > +		return;
> > +	}
> > +
> > +	tfile->nb_descs = batch;
> > +	for (i = 0; i < batch; i++) {
> > +		/* Encode the XDP DESC flag into lowest bit for consumer to differ
> > +		 * XDP desc from XDP buffer and sk_buff.
> > +		 */
> > +		frame = tun_xdp_desc_to_ptr(&pool->tx_descs[i]);
> > +		/* The budget must be less than or equal to tx_ring.size,
> > +		 * so enqueuing will not fail.
> > +		 */
> > +		__ptr_ring_produce(&tfile->tx_ring, frame);
> > +	}
> > +	spin_unlock(&tfile->tx_ring.producer_lock);
> > +	spin_unlock(&tfile->pool_lock);
> 
> More related to the general design: it looks wrong. What if
> get_rx_bufs() will fail (ENOBUF) after successful peeking? With no more
> incoming packets, later peek will return 0 and it looks like that the
> half-processed packets will stay in the ring forever???
> 
> I think the 'ring produce' part should be moved into tun_do_read().

Currently, vhost-net obtains a batch of descriptors/sk_buffs from the
ptr_ring, enqueues the whole batch to the virtqueue's queue, and then
consumes the descriptors/sk_buffs from that queue in sequence. As a result,
TUN does not know whether the batch of descriptors has been used up, and
thus does not know when to return the batch to the xsk pool.

So I think it is reasonable that, when vhost-net finds the ptr_ring empty,
it calls peek_len to return the previous batch of descriptors and fetch new
xsk descriptors.
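
To make the ordering I have in mind concrete, here is a small user-space
model. It is only a sketch, not the patch code: struct model, model_peek()
and model_consume() are hypothetical names, plain ints stand in for
descriptors, and RING_SIZE and BATCH stand in for tx_ring.size and
TUN_XDP_BATCH. The one rule it shows is that an empty ring is the only
signal that the previous batch has been used up, so completion of the old
batch and the refill both happen in the peek step.

```c
#include <assert.h>
#include <stddef.h>

#define RING_SIZE 8	/* stands in for tx_ring.size (assumed value) */
#define BATCH     4	/* stands in for TUN_XDP_BATCH (assumed value) */

struct model {
	int ring[RING_SIZE];	/* stands in for the ptr_ring */
	size_t head, tail;	/* consumer/producer index; empty if equal */
	int pending;		/* descriptors waiting on the xsk Tx ring */
	int next_id;		/* id given to the next descriptor */
	int nb_descs;		/* size of the batch handed out by last peek */
	int completed;		/* descriptors returned to the pool so far */
};

static int ring_empty(const struct model *m)
{
	return m->head == m->tail;
}

/* Peek step: does nothing unless the consumer has drained the ring. */
static void model_peek(struct model *m)
{
	int i, batch;

	if (!ring_empty(m))
		return;

	/* Ring is empty, so the previous batch is done: return it. */
	m->completed += m->nb_descs;

	batch = m->pending < BATCH ? m->pending : BATCH;
	/* The budget never exceeds the ring size, so enqueuing cannot fail. */
	assert(batch <= RING_SIZE);

	m->nb_descs = batch;
	for (i = 0; i < batch; i++) {
		m->ring[m->tail % RING_SIZE] = m->next_id++;
		m->tail++;
	}
	m->pending -= batch;
}

/* Consumer step: returns the next descriptor id, or -1 if the ring is empty. */
static int model_consume(struct model *m)
{
	if (ring_empty(m))
		return -1;
	return m->ring[m->head++ % RING_SIZE];
}
```

With 6 pending descriptors, the first model_peek() hands out 4 and
completes nothing; a second call while the ring is non-empty is a no-op;
once all 4 are consumed, the next model_peek() completes those 4 and hands
out the remaining 2. The model treats "consumed from the ring" as "used
up", which is exactly the assumption under discussion here.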

Thanks
> 
> Cheers,
> 
> Paolo
