[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ZF5bkhhs+Ue+DZfG@boxer>
Date: Fri, 12 May 2023 17:30:26 +0200
From: Maciej Fijalkowski <maciej.fijalkowski@...el.com>
To: "Zulkifli, Muhammad Husaini" <muhammad.husaini.zulkifli@...el.com>
CC: Jakub Kicinski <kuba@...nel.org>, "Nguyen, Anthony L"
<anthony.l.nguyen@...el.com>, "davem@...emloft.net" <davem@...emloft.net>,
"pabeni@...hat.com" <pabeni@...hat.com>, "edumazet@...gle.com"
<edumazet@...gle.com>, "netdev@...r.kernel.org" <netdev@...r.kernel.org>,
"Neftin, Sasha" <sasha.neftin@...el.com>, Naama Meir
<naamax.meir@...ux.intel.com>
Subject: Re: [PATCH net 1/3] igc: Clean the TX buffer and TX descriptor ring
On Fri, May 12, 2023 at 08:51:23AM +0000, Zulkifli, Muhammad Husaini wrote:
> Hi Jakub,
>
> > On Tue, 9 May 2023 10:09:33 -0700 Tony Nguyen wrote:
> > > There could be a race condition during link down where interrupt being
> > > generated and igc_clean_tx_irq() been called to perform the TX
> > > completion. Properly clear the TX buffer and TX descriptor ring to
> > > avoid those case.
> >
> > > + /* Zero out the buffer ring */
> > > + memset(tx_ring->tx_buffer_info, 0,
> > > + sizeof(*tx_ring->tx_buffer_info) * tx_ring->count);
> > > +
> > > + /* Zero out the descriptor ring */
> > > + memset(tx_ring->desc, 0, tx_ring->size);
> >
> > Just from the diff and the commit description this does not seem obviously
> > correct. Race condition means the two functions can run at the same time,
> > and memset() is not atomic.
>
> While a link is going up or down and a lot of packets(UDP) are being sent transmitted,
> we are observing some kernel panic issues. On my side, it was easily to reproduce.
> It's possible that igc_clean_tx_irq() was called to complete the TX during link up/down
> based on how the call trace looks. With this fix, I not observed the issue anymore.
then include the splat you were getting in the commit msg as well as steps
to repro.
from a brief look it looks like ndo_stop() path does not disable Tx rings
before cleaning them? This is being done when configuring xsk_pool on a
given Tx ring, though.
>
> Almost similar issue reported before in here:
> https://lore.kernel.org/all/SJ1PR11MB6180CDB866753CFBC2F9AF75B8959@SJ1PR11MB6180.namprd11.prod.outlook.com/
>
> > --
> > pw-bot: cr
>
Powered by blists - more mailing lists