Message-ID: <20100325093744.GH20695@one.firstfloor.org>
Date:	Thu, 25 Mar 2010 10:37:44 +0100
From:	Andi Kleen <andi@...stfloor.org>
To:	Thomas Gleixner <tglx@...utronix.de>
Cc:	Andi Kleen <andi@...stfloor.org>, x86@...nel.org,
	LKML <linux-kernel@...r.kernel.org>, jesse.brandeburg@...el.com,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH] Prevent nested interrupts when the IRQ stack is near overflowing v2

On Thu, Mar 25, 2010 at 02:46:42AM +0100, Thomas Gleixner wrote:
> "misleading" is a euphemism at best ... 
> 
> This is ever repeating shit: your changelogs suck big time!

As much as your comments I guess.

> 
> > "
> > Multiple vectors on a multi port NIC pointing to the same CPU, 
> > all hitting the irq stack until it overflows.
> > "
> 
> So there are several questions:
> 
> 1) Why are those multiple vectors all hitting the same cpu at the same
>    time ? How many of them are firing at the same time ?

This was four NIC ports all operational under stress at the same time.

> 
> 2) What kind of scenario is that ? Massive traffic on the card or some
>    corner case ?

Massive traffic on the card from multiple ports on a large system.

> 3) Why does the NIC driver code not set IRQF_DISABLED in the first
>    place?  AFAICT the network drivers just kick off NAPI, so what's the
>    point to run those handlers with IRQs enabled at all ?

I think the idea was to minimize irq latency for other interrupts.

But yes, defaulting to IRQF_DISABLED would fix it too, at some 
cost. In principle that could be done as well.

> 
> > > case of MSI-X it just disables the IRQ when it comes again while the
> > > first irq on that vector is still in progress. So the maximum nesting
> > > is two up to handle_edge_irq() where it disables the IRQ and returns
> > > right away.
> > 
> > Real maximum nesting is all IRQs running with interrupts on pointing
> > to the same CPU. Enough from multiple busy IRQ sources and you go boom.
> 
> Which leads to the general question why we have that IRQF_DISABLED
> shite at all. AFAICT the historical reason were IDE drivers, but we

My understanding was that traditionally the irq handlers were
allowed to nest, and the "fast" non-nesting case was only added for some 
fast handlers like serial with small FIFOs.

> grew other abusers like USB, SCSI and other crap which runs hard irq
> handlers for hundreds of micro seconds in the worst case. All those
> offenders need to be fixed (e.g. by converting to threaded irq
> handlers) so we can run _ALL_ hard irq context handlers with interrupts
> disabled. lockdep will sort out the nasty ones which enable irqs in the
> middle of that hard irq handler.

Ok glad to give you advertisement time for your pet project...

Anyway, if such a thing were done it would be a long-term project,
and the short-term fix would still be needed.

> the handlers on which you enforce IRQ_DISABLED does not enable
> interrupts itself ? You _CANNOT_.

I can't, just as I cannot enforce that they won't crash or loop
forever or something. But afaik they don't.

I did some quick grepping and didn't find a driver that does that,
at least.

-Andi
-- 
ak@...ux.intel.com -- Speaking for myself only.