netdev - RE: [PATCH net 1/3] igc: Clean the TX buffer and TX descriptor ring

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [day] [month] [year] [list]

Message-ID: <SJ1PR11MB6180F748B6C5AEF5999B3A55B8789@SJ1PR11MB6180.namprd11.prod.outlook.com>
Date: Mon, 15 May 2023 03:30:49 +0000
From: "Zulkifli, Muhammad Husaini" <muhammad.husaini.zulkifli@...el.com>
To: "Fijalkowski, Maciej" <maciej.fijalkowski@...el.com>
CC: Jakub Kicinski <kuba@...nel.org>, "Nguyen, Anthony L"
	<anthony.l.nguyen@...el.com>, "davem@...emloft.net" <davem@...emloft.net>,
	"pabeni@...hat.com" <pabeni@...hat.com>, "edumazet@...gle.com"
	<edumazet@...gle.com>, "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
	"Neftin, Sasha" <sasha.neftin@...el.com>, Naama Meir
	<naamax.meir@...ux.intel.com>
Subject: RE: [PATCH net 1/3] igc: Clean the TX buffer and TX descriptor ring

Hi Maciej,

> On Fri, May 12, 2023 at 08:51:23AM +0000, Zulkifli, Muhammad Husaini
> wrote:
> > Hi Jakub,
> >
> > > On Tue,  9 May 2023 10:09:33 -0700 Tony Nguyen wrote:
> > > > There could be a race condition during link down where interrupt
> > > > being generated and igc_clean_tx_irq() been called to perform the
> > > > TX completion. Properly clear the TX buffer and TX descriptor ring
> > > > to avoid those case.
> > >
> > > > +	/* Zero out the buffer ring */
> > > > +	memset(tx_ring->tx_buffer_info, 0,
> > > > +	       sizeof(*tx_ring->tx_buffer_info) * tx_ring->count);
> > > > +
> > > > +	/* Zero out the descriptor ring */
> > > > +	memset(tx_ring->desc, 0, tx_ring->size);
> > >
> > > Just from the diff and the commit description this does not seem
> > > obviously correct. Race condition means the two functions can run at
> > > the same time, and memset() is not atomic.
> >
> > While a link is going up or down and a lot of packets(UDP) are being
> > sent transmitted, we are observing some kernel panic issues. On my side, it
> was easily to reproduce.
> > It's possible that igc_clean_tx_irq() was called to complete the TX
> > during link up/down based on how the call trace looks. With this fix, I not
> observed the issue anymore.
> 
> then include the splat you were getting in the commit msg as well as steps to
> repro.
> 
> from a brief look it looks like ndo_stop() path does not disable Tx rings before
> cleaning them? This is being done when configuring xsk_pool on a given Tx ring,
> though.

Yes you are right. We shall disable tx queue ring as well. 
I will submit the V2 to replace the igc_clean_tx_ring() to igc_disable_tx_ring() in 
igc_free_tx_resources() during ndo_stop callback while maintaining the memset 
to clear all the ring des/buffer during igc_clean_tx_ring().

I have tested this on my TGL setup with multiple loops iterations and no more
Kernel panic observed. Will update this in commit message as well 

Thanks, 
Husaini

> 
> >
> > Almost similar issue reported before in here:
> >
> https://lore.kernel.org/all/SJ1PR11MB6180CDB866753CFBC2F9AF75B8959@S
> J1
> > PR11MB6180.namprd11.prod.outlook.com/
> >
> > > --
> > > pw-bot: cr
> >