Date:	Thu, 25 Mar 2010 13:11:41 +0100
From:	Andi Kleen <andi@...stfloor.org>
To:	Thomas Gleixner <tglx@...utronix.de>
Cc:	Andi Kleen <andi@...stfloor.org>, x86@...nel.org,
	LKML <linux-kernel@...r.kernel.org>, jesse.brandeburg@...el.com,
	Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [PATCH] Prevent nested interrupts when the IRQ stack is near overflowing v2

On Thu, Mar 25, 2010 at 12:09:10PM +0100, Thomas Gleixner wrote:
> On Thu, 25 Mar 2010, Andi Kleen wrote:
> > On Thu, Mar 25, 2010 at 02:46:42AM +0100, Thomas Gleixner wrote:
> > > 3) Why does the NIC driver code not set IRQF_DISABLED in the first
> > >    place?  AFAICT the network drivers just kick off NAPI, so what's the
> > >    point to run those handlers with IRQs enabled at all ?
> > 
> > I think the idea was to minimize irq latency for other interrupts
> 
> So what's the point ? Is the irq handler of that card so long running,
> that it causes trouble ? 

I believe it's more that you get a lot of them (even with interrupt
mitigation and NAPI polling), and they have to scale up the work to handle
high bandwidths, so a lot of time ends up being spent in them anyway,
which leads to skew in the timers and similar issues.

Anyway, my goal here was simply to make the least intrusive fix, not to
change semantics for everyone.

> > cost. In principle that could be done also.
> 
> What's the cost? Nothing at all. There is no f*cking difference between:
> 
>  IRQ1 10us
>  IRQ2 10us
>  IRQ3 10us
>  IRQ4 10us
> 
> and
> 
>  IRQ1 2us
>   IRQ2 2us
>    IRQ3 2us
>     IRQ4 10us
>    IRQ3 8us
>   IRQ2 8us
>  IRQ1 8us
> 
> The system is neither running a task nor a softirq for 40us in both
> cases.

Yes, it could work out like this. Or it could not. I'm not sure, so I
chose the safe option.

> 
> So what's the point of running a well written (short) interrupt
> handler with interrupts enabled ? Nothing at all. It just makes us
> deal with crap like stacks overflowing for no good reason.

OK, so you're just proposing to always set IRQF_DISABLED?

If you can force everyone to use that, it would work, I guess.

I don't know what problems it would cause. My fear, though, is that any
latency problems would be answered with a "move to RT, moron" from you.

> > Anyways if such a thing was done it would be a long term project
> > and that short term fix would be still needed.
> 
> Your patch is not a fix. It's a lousy, horrible and unreliable
> workaround. It's not fixing the root cause of the problem at hand.

It fixes the bug in a minimally intrusive way.
> 
> The real fix is to run the NIC interrupt handlers with IRQs disabled
> and be done with it. If you still think that introduces latencies then
> prove it with numbers.

Sorry, you got that wrong. I'm not proposing to change semantics, you
are. So you are the one who would need to prove anything, if at all.

Anyway, if you think you can write a better patch to fix that bug,
please feel free to write one.

-Andi
-- 
ak@...ux.intel.com -- Speaking for myself only.