netdev - RE: [PATCH net 1/3] igc: Clean the TX buffer and TX descriptor ring

Message-ID: <SJ1PR11MB6180BBD70342998B2C639472B8759@SJ1PR11MB6180.namprd11.prod.outlook.com>
Date: Fri, 12 May 2023 08:51:23 +0000
From: "Zulkifli, Muhammad Husaini" <muhammad.husaini.zulkifli@...el.com>
To: Jakub Kicinski <kuba@...nel.org>, "Nguyen, Anthony L"
	<anthony.l.nguyen@...el.com>
CC: "davem@...emloft.net" <davem@...emloft.net>, "pabeni@...hat.com"
	<pabeni@...hat.com>, "edumazet@...gle.com" <edumazet@...gle.com>,
	"netdev@...r.kernel.org" <netdev@...r.kernel.org>, "Neftin, Sasha"
	<sasha.neftin@...el.com>, Naama Meir <naamax.meir@...ux.intel.com>
Subject: RE: [PATCH net 1/3] igc: Clean the TX buffer and TX descriptor ring

Hi Jakub,

> On Tue,  9 May 2023 10:09:33 -0700 Tony Nguyen wrote:
> > There could be a race condition during link down where interrupt being
> > generated and igc_clean_tx_irq() been called to perform the TX
> > completion. Properly clear the TX buffer and TX descriptor ring to
> > avoid those case.
> 
> > +	/* Zero out the buffer ring */
> > +	memset(tx_ring->tx_buffer_info, 0,
> > +	       sizeof(*tx_ring->tx_buffer_info) * tx_ring->count);
> > +
> > +	/* Zero out the descriptor ring */
> > +	memset(tx_ring->desc, 0, tx_ring->size);
> 
> Just from the diff and the commit description this does not seem obviously
> correct. Race condition means the two functions can run at the same time,
> and memset() is not atomic.

We are observing kernel panics when the link goes up or down while a lot of
(UDP) packets are being transmitted. On my side, it was easy to reproduce.
Based on the call trace, it is possible that igc_clean_tx_irq() was called to
complete the TX during link up/down. With this fix, I no longer observe the issue.
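
To illustrate what the two memset() calls are meant to guarantee, here is a
minimal userspace model, not driver code. The model_* names, the struct
layouts, the index rewind in model_reset() and the rule that the completion
walk stops at an entry with nothing to watch are all assumptions made for the
sketch. It is single-threaded, so it only shows the state a later completion
pass would see after the reset; it does not model the two functions running
at the same time.

```c
#include <stddef.h>
#include <string.h>

/* Userspace model only; names echo the patch, layouts are hypothetical. */
#define RING_COUNT 8

struct model_desc {
	unsigned long long addr;
	unsigned int cmd_type_len;
	unsigned int status;              /* non-zero: pretend hardware is done */
};

struct model_tx_buffer {
	struct model_desc *next_to_watch; /* NULL: nothing pending on this entry */
	void *skb;
};

struct model_tx_ring {
	struct model_tx_buffer tx_buffer_info[RING_COUNT];
	struct model_desc desc[RING_COUNT];
	unsigned int count;
	unsigned int next_to_use;
	unsigned int next_to_clean;
};

/* Queue n fake packets; each entry watches its own descriptor. */
static void model_queue(struct model_tx_ring *ring, unsigned int n)
{
	for (unsigned int i = 0; i < n; i++) {
		unsigned int idx = ring->next_to_use;

		ring->tx_buffer_info[idx].next_to_watch = &ring->desc[idx];
		ring->tx_buffer_info[idx].skb = &ring->desc[idx]; /* placeholder */
		ring->desc[idx].status = 1;
		ring->next_to_use = (idx + 1) % ring->count;
	}
}

/* Zero both rings as in the quoted hunk, then rewind the indices. */
static void model_reset(struct model_tx_ring *ring)
{
	memset(ring->tx_buffer_info, 0,
	       sizeof(*ring->tx_buffer_info) * ring->count);
	memset(ring->desc, 0, sizeof(ring->desc));
	ring->next_to_use = 0;
	ring->next_to_clean = 0;
}

/* Completion walk: stop at the first entry with nothing left to watch. */
static unsigned int model_clean(struct model_tx_ring *ring)
{
	unsigned int cleaned = 0;
	unsigned int budget = ring->count;

	while (budget--) {
		struct model_tx_buffer *buf =
			&ring->tx_buffer_info[ring->next_to_clean];

		if (!buf->next_to_watch || !buf->next_to_watch->status)
			break;
		buf->next_to_watch = NULL;
		buf->skb = NULL;
		ring->next_to_clean = (ring->next_to_clean + 1) % ring->count;
		cleaned++;
	}
	return cleaned;
}
```

In this model, calling model_queue(), then model_reset(), then model_clean()
cleans nothing, because the walk finds no stale next_to_watch pointer to
follow. Without the reset, the same walk would still act on the old entries.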

A similar issue was reported before here:
https://lore.kernel.org/all/SJ1PR11MB6180CDB866753CFBC2F9AF75B8959@SJ1PR11MB6180.namprd11.prod.outlook.com/

> --
> pw-bot: cr