linux-kernel - Re: [patch] irq: do not mask interrupts by default

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Pine.LNX.4.64.0611140757040.31445@g5.osdl.org>
Date:	Tue, 14 Nov 2006 08:10:48 -0800 (PST)
From:	Linus Torvalds <torvalds@...l.org>
To:	Ingo Molnar <mingo@...hat.com>
cc:	"Eric W. Biederman" <ebiederm@...ssion.com>,
	Komuro <komurojun-mbn@...ty.com>, tglx@...utronix.de,
	Adrian Bunk <bunk@...sta.de>, Andrew Morton <akpm@...l.org>,
	Linux Kernel Mailing List <linux-kernel@...r.kernel.org>
Subject: Re: [patch] irq: do not mask interrupts by default

On Tue, 14 Nov 2006, Ingo Molnar wrote:
> 
> 1) disable_irq() is relatively rare (used in about 10% of drivers, but
> there it's overwhelmingly used in some slowpath) so it's performance
> uncritical.

Well, the thing is, the _replay_ if it does happen, is going to be really 
really slow compared to the masking. So at that point, it may well be a 
net performance downside if the masking is going to almost always have an 
interrupt happen while the thing is masked. I dunno.

There's another thing too:

For level-triggered interrupts, I _really_ don't think we should do this. 
The code inside the masked region is sometimes "setup code", which will do 
things that _will_ raise an interrupt, but may read the status register or 
whatever to then unraise it. So in that case, your patch will generate 
different behaviour, something that I really don't want to introduce at 
this point in the 2.6.19 series.

> 2) missing an IRQ while the line is masked is often a lethal regression 
> to the user. An IRQ could be missed even if we think that the IRQ line 
> is 'level-triggered'.

If it's level-triggered, it's going to be missed only if it's de-asserted 
by code inside the masked region, and that is what we have always done on 
purpose, so "missing" it is the right thing to do. It's what we have 
tested all level-triggered interrupts with for the last 15+ years, and 
it's been part of the semantics for masking.

So I absolutely do _not_ think your change is improved semantics. It's new 
semantics, and illogical. If the driver masked the irq line, did some 
testing that raises and clears it again ("let's check if this version of 
the chip raises the interrupt when we do XYZZY"), then the logical thing 
to do would be to not cause the interrupt to happen.

Of course, for edge-triggered APIC interrupts, we _have_ to replay the irq 
(since we don't have any way of even *knowing* whether we might get it 
again), but for level-triggered and for the old legacy i8259 controller 
that gets it right for edges anwyay, we should _not_ send the spurious 
interrupt that is no longer active.

And a lot of code has been tested with either just the i8259 (old machines 
without any APIC) or with PCI-only devices (which are always level- 
triggered), so the fact that edge-triggered things have always seen the 
potential for spurious interrupts is not a reasong to say "well, they have 
to handle it anyway". True PCI drivers generally do _not_ have to handle 
the crazy case, and generally have never seen it.

> so my patch changes the default irq-disable logic of /all/ controllers 
> to "delayed disable". (IRQ chips can still override this by providing a 
> different chip->disable method that just clones their ->mask method, if 
> it is absolutely sure that no IRQs can be lost while masked)

I really think we should do this just for APIC edge triggered interrupts, 
ie keep the old behaviour.

Also, I worry a bit about the patch:

> @@ -272,8 +268,11 @@ handle_simple_irq(unsigned int irq, stru
>  	kstat_cpu(cpu).irqs[irq]++;
>  
>  	action = desc->action;
> -	if (unlikely(!action || (desc->status & IRQ_DISABLED)))
> +	if (unlikely(!action || (desc->status & IRQ_DISABLED))) {
> +		if (desc->chip->mask)
> +			desc->chip->mask(irq);
>  		goto out_unlock;
> +	}

The simple-irq case too? That's not even going to replay the thing? So now 
you just mask (without replaying) simple irqs, but then the other irqs you 
mask and replay.. See above on why I don't think this is necessarily a bug 
(since masking is almost always the right thing _anyway_), but now it will 
*STILL* depend on some internal implementation decision on whether the 
replay happens at all. I'd much rather have the replay decision be based 
on hard physical data: we replay _only_ for edge-triggered interrupts, and 
_only_ for controllers that need it. 

In other words, I think we should just make APIC-edge have the "please 
delay masking and replay" bit, and nobody else.

Can you send that patch (for both x86 and x86-64), and we can ask Komuro 
to test it. That would be the "same behaviour as we've always had" thing, 
which I think is also the _right_ behaviour.

		Linus
-
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/