lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [day] [month] [year] [list]
Message-ID: <CAKKbWA6zRee9Rzee-ebLnEAvwLqnmsPswGaUo_ineyzw-b=EgQ@mail.gmail.com>
Date: Sun, 3 Nov 2024 21:00:54 +0200
From: Avi Fishman <avifishman70@...il.com>
To: cathycai0714@...il.com, Alexandre Torgue <alexandre.torgue@...s.st.com>, 
	Jose Abreu <joabreu@...opsys.com>, Giuseppe Cavallaro <peppe.cavallaro@...com>
Cc: cathy.cai@...soc.com, cixi.geng1@...soc.com, 
	David Miller <davem@...emloft.net>, edumazet@...gle.com, kuba@...nel.org, 
	Linux ARM <linux-arm-kernel@...ts.infradead.org>, 
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>, linux-stm32@...md-mailman.stormreply.com, 
	mcoquelin.stm32@...il.com, Network Development <netdev@...r.kernel.org>, pabeni@...hat.com, 
	romain.gantois@...tlin.com, wade.shu@...soc.com, xuewen.yan94@...il.com, 
	zhiguo.niu@...soc.com, Alexandre Torgue <alexandre.torgue@...com>, 
	Murali <murali.somarouthu@...l.com>, Tomer Maimon <tmaimon77@...il.com>, 
	"Silva, L Antonio" <Luis.A.Silva@...l.com>, Arias Pablo <Pablo_Arias@...l.com>, 
	Somarouthu Murali <Murali_Somarouthu@...l.com>, uri.trichter@...oton.com
Subject: Re: [RFC PATCH] net: stmmac: Fix the problem about interrupt storm

Hi all,

We recently encountered the same interrupt storm and the root cause
was the same as here.
The suggested patch solved 99% of the issues, but indeed as written
below on rare cases the issue happens between the dev_open() and
clear_bit(STMMAC_DOWN) calls.
I also agree that stmmac_interrupt() unconditionally ignores
interrupts when the driver is in STMMAC_DOWN state is dangerous.

The issue happened for us in linux 5.10 but I see that this behaviour
wasn't changed also in newer versions.
maybe we should disable the device interrupts before dev_close(), and
enable it after dev_open().

>
> Hi Romain,
>
> On Sun, Mar 31, 2024 at 4:35 PM Romain Gantois
> <romain.gantois@...tlin.com> wrote:
> >
> > Hello Cathy,
> >
> > On Wed, 27 Mar 2024, Cathy Cai wrote:
> >
> > > Tx queue time out then reset adapter. When reset the adapter, stmmac driver
> > > sets the state to STMMAC_DOWN and calls dev_close() function. If an interrupt
> > > is triggered at this instant after setting state to STMMAC_DOWN, before the
> > > dev_close() call.
> > >
> > ...
> > > -     set_bit(STMMAC_DOWN, &priv->state);
> > >       dev_close(priv->dev);
> > > +     set_bit(STMMAC_DOWN, &priv->state);
> > >       dev_open(priv->dev, NULL);
> > >       clear_bit(STMMAC_DOWN, &priv->state);
> > >       clear_bit(STMMAC_RESETING, &priv->state);
> >
> > If this IRQ issue can happen whenever STMMAC_DOWN is set while the net device is
> > open, then it could also happen between the dev_open() and
> > clear_bit(STMMAC_DOWN) calls right? So you'd have to clear STMMAC_DOWN before
> > calling dev_open() but then I don't see the usefulness of setting STMMAC_DOWN
> > and clearing it immediately. Maybe closing and opening the net device should be
> > enough?
Indeed we encounter an issue between the dev_open() and clear_bit(STMMAC_DOWN)..
> >
>  Yes. It could also happen between the dev_open() and
> clear_bit(STMMAC_DOWN) calls.
> Although we did not reproduce this scenario, it should have happened
> if we had increased
> the number of test samples. In addition, I found that other people had
> similar problems before.
> The link is:
> https://lore.kernel.org/all/20210208140820.10410-11-Sergey.Semin@baikalelectronics.ru/
>
> >
> > Moreover, it seems strange to me that stmmac_interrupt() unconditionnally
> > ignores interrupts when the driver is in STMMAC_DOWN state. This seems like
> > dangerous behaviour, since it could cause IRQ storm issues whenever something
> > in the driver sets this state. I'm not too familiar with the interrupt handling
> > in this driver, but maybe stmmac_interrupt() could clear interrupts
> > unconditionnally in the STMMAC_DOWN state?
> >
> Clear interrupts unconditionally in the STMMAC_DOWN state directly
> certainly won't cause this problem.
> This may be too rough, maybe this design has other considerations.
>
But then after the dev_open() you might miss interrupt, no?
> >
> > Best Regards,
> >
> > --
> > Romain Gantois, Bootlin
> > Embedded Linux and Kernel engineering
> > https://bootlin.com
>
>  Best Regards,
> Cathy



-- 
Regards,
Avi

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ