lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <9631475a8ba94c1682696d219c632538@AcuMS.aculab.com>
Date: Mon, 30 Oct 2023 15:29:45 +0000
From: David Laight <David.Laight@...LAB.COM>
To: 'Shinas Rasheed' <srasheed@...vell.com>, "netdev@...r.kernel.org"
	<netdev@...r.kernel.org>, "linux-kernel@...r.kernel.org"
	<linux-kernel@...r.kernel.org>
CC: Haseeb Gani <hgani@...vell.com>, Vimlesh Kumar <vimleshk@...vell.com>,
	"egallen@...hat.com" <egallen@...hat.com>, "mschmidt@...hat.com"
	<mschmidt@...hat.com>, "pabeni@...hat.com" <pabeni@...hat.com>,
	"horms@...nel.org" <horms@...nel.org>, "kuba@...nel.org" <kuba@...nel.org>,
	"davem@...emloft.net" <davem@...emloft.net>, "wizhao@...hat.com"
	<wizhao@...hat.com>, "konguyen@...hat.com" <konguyen@...hat.com>,
	Veerasenareddy Burru <vburru@...vell.com>, Sathesh B Edara
	<sedara@...vell.com>, Eric Dumazet <edumazet@...gle.com>
Subject: RE: [PATCH net-next v2 3/4] octeon_ep: implement xmit_more in
 transmit

From: Shinas Rasheed <srasheed@...vell.com>
> Sent: 30 October 2023 14:15
> 
> Hi,
> 
> I understand the window is closed, but just replying to a pending comment on the thread.
> 
> > -----Original Message-----
> > From: David Laight <David.Laight@...LAB.COM>
> > ----------------------------------------------------------------------
> > From: Shinas Rasheed
> > > Sent: 24 October 2023 15:51
> > >
> > > Add xmit_more handling in tx datapath for octeon_ep pf.
> > >
> > ...
> > > -
> > > -	/* Ring Doorbell to notify the NIC there is a new packet */
> > > -	writel(1, iq->doorbell_reg);
> > > -	iq->stats.instr_posted++;
> > > +	/* Ring Doorbell to notify the NIC of new packets */
> > > +	writel(iq->fill_cnt, iq->doorbell_reg);
> > > +	iq->stats.instr_posted += iq->fill_cnt;
> > > +	iq->fill_cnt = 0;
> > >  	return NETDEV_TX_OK;
> >
> > Does that really need the count?
> > A 'doorbell' register usually just tells the MAC engine
> > to go and look at the transmit ring.
> > It then continues to process transmits until it fails
> > to find a packet.
> > So if the transmit is active you don't need to set the bit.
> > (Although that is actually rather hard to detect.)
> 
> The way the octeon hardware works is that it expects number of newly updated packets
> to be written to the doorbell register,which effectively increments the doorbell
> count which shall be decremented by hardware as it reads these packets. So in essence,
> the doorbell count also indicates outstanding packets to be read by hardware.

Unusual - I wouldn't call that a doorbell register.

> > The 'xmit_more' flag is useful if (the equivalent of) writing
> > the doorbell register is expensive since it can be delayed
> > to a later frame and only done once - adding a slight latency
> > to the earlier transmits if the mac engine was idle.
> >
> > I'm not sure how much (if any) performance gain you actually
> > get from avoiding the writel().
> > Single PCIe writes are 'posted' and pretty much completely
> > asynchronous.
> 
> Can you elaborate what you are suggesting here to do? The driver is trying
> to make use of the 'xmit_more' hint from the network stack, as any network
> driver might opt to do.

There are some drivers where waking up the MAC engine is expensive.
If you need to do a PCIe read then they are expensive.
There might also be drivers that need to send a USB message.
I don't actually know which one it was added for.

> I think avoiding continuous PCIe posts for each packet shall still be wasteful
> as the hardware can bulk read from the queue if we give it a batch of packets.

If you do writes for every packet then the hardware can get on with
sending the first packet and might be able to do bulk reads
for the next packet(s) when that finishes.

The extra code you are adding could easily (waving hands)
be more expensive than the posted PCIe write.
(Especially if you have to add an atomic operation.)

Unless, of course, you have to wait for it to send that batch
of packets before you can give it any more.
Which would be rather entirely broken and would really require
you do the write in the end-of-transit path.

> > The other problem I've seen is that netdev_xmit_more() is
> > the state of the queue when the transmit was started, not
> > the current state.
> > If a packet is added while the earlier transmit setup code
> > is running (setting up the descriptors etc) the it isn't set.
> > So the fast path doesn't get taken.
> 
> By the next packet the kernel sends, the xmit_more should be set
> as far I understand, right? (as the xmit_more bool is set if skb->next
> is present, if the transmit path follows dev_hard_start_xmit).

The loop is something like:
	while (get_packet()) {
		per_cpu->xmit_more = !queue_empty();
		if (transmit_packet() != TX_OK)
			break;
	}
So if a packet is added while all the transmit setup code is running
it isn't detected.
I managed to repeatedly get that to loop when xmit_more wasn't set
and in a driver where the 'doorbell' write wasn't entirely trivial.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ