lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <alpine.LFD.2.00.1003250201370.3147@localhost.localdomain>
Date:	Thu, 25 Mar 2010 02:46:42 +0100 (CET)
From:	Thomas Gleixner <tglx@...utronix.de>
To:	Andi Kleen <andi@...stfloor.org>
cc:	x86@...nel.org, LKML <linux-kernel@...r.kernel.org>,
	jesse.brandeburg@...el.com,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH] Prevent nested interrupts when the IRQ stack is near
 overflowing v2

On Thu, 25 Mar 2010, Andi Kleen wrote:

> On Thu, Mar 25, 2010 at 12:08:23AM +0100, Thomas Gleixner wrote:
> > On Wed, 24 Mar 2010, Thomas Gleixner wrote:
> > 
> > > On Wed, 24 Mar 2010, Andi Kleen wrote:
> > > 
> > > > Prevent nested interrupts when the IRQ stack is near overflowing v2
> > > > 
> > > > Interrupts can always nest when they don't run with IRQF_DISABLED.
> > > > 
> > > > When a lot of interrupts hit the same vector on the same
> > > > CPU nested interrupts can overflow the irq stack and cause hangs.
> > That's utter nonsense. An interrupt storm on the same vector does not
> > cause irq nesting. The irq code prevents reentering a handler and in
> 
> Sorry it's the same CPU, not the same vector.  Yes the reference
> to same vector was misleading.

"misleading" is an euphemism at best ... 

This is ever repeating shit: your changelogs suck big time!

> "
> Multiple vectors on a multi port NIC pointing to the same CPU, 
> all hitting the irq stack until it overflows.
> "

So there are several questions:

1) Why are those multiple vectors all hitting the same cpu at the same
   time ? How many of them are firing at the same time ?

2) What kind of scenario is that ? Massive traffic on the card or some
   corner case ?

3) Why does the NIC driver code not set IRQF_DISABLED in the first
   place?  AFAICT the network drivers just kick off NAPI, so whats the
   point to run those handlers with IRQs enabled at all ?

> > case of MSI-X it just disables the IRQ when it comes again while the
> > first irq on that vector is still in progress. So the maximum nesting
> > is two up to handle_edge_irq() where it disables the IRQ and returns
> > right away.
> 
> Real maximum nesting is all IRQs running with interrupts on pointing
> to the same CPU. Enough from multiple busy IRQ sources and you go boom.

Which leads to the general question why we have that IRQF_DISABLED
shite at all. AFAICT the historical reason were IDE drivers, but we
grew other abusers like USB, SCSI and other crap which runs hard irq
handlers for hundreds of micro seconds in the worst case. All those
offenders need to be fixed (e.g. by converting to threaded irq
handlers) so we can run _ALL_ hard irq context handlers with interrupts
disabled. lockdep will sort out the nasty ones which enable irqs in the
middle of that hard irq handler.

Your band aid patch is just disgusting. How do you ensure that none of
the handlers on which you enforce IRQ_DISABLED does not enable
interrupts itself ? You _CANNOT_.

I'm not taking that patch unless you come up with a real convincing
story.

Thanks,

	tglx


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ