Message-ID: <Pine.LNX.4.64.0610071154510.3952@g5.osdl.org>
Date:	Sat, 7 Oct 2006 12:03:16 -0700 (PDT)
From:	Linus Torvalds <torvalds@...l.org>
To:	"Eric W. Biederman" <ebiederm@...ssion.com>
cc:	Muli Ben-Yehuda <muli@...ibm.com>, Ingo Molnar <mingo@...e.hu>,
	Thomas Gleixner <tglx@...utronix.de>,
	Benjamin Herrenschmidt <benh@...nel.crashing.org>,
	Rajesh Shah <rajesh.shah@...el.com>, Andi Kleen <ak@....de>,
	"Protasevich, Natalie" <Natalie.Protasevich@...SYS.com>,
	"Luck, Tony" <tony.luck@...el.com>, Andrew Morton <akpm@...l.org>,
	Linux-Kernel <linux-kernel@...r.kernel.org>,
	Badari Pulavarty <pbadari@...il.com>
Subject: Re: 2.6.19-rc1 genirq causes either boot hang or "do_IRQ: cannot
 handle IRQ -1"

On Sat, 7 Oct 2006, Eric W. Biederman wrote:
> 
> I am hoping that by running the apics in a different delivery mode
> that explicitly says just deliver this interrupt to this cpu we
> will avoid the problem you are seeing.

Note that having too strict delivery modes could be a major pain in the 
future, with things like multicore CPUs a lot more actively doing power 
management on their own, and effectively going into sleep-states with 
reasonably long latencies.

Especially with schedulers that are aware of things like that (and we 
_try_, at least to some degree, and people are interested in more of it), 
you can easily be in the situation that one of the cores is being fairly 
actively kept in a low-power state, and can have millisecond latencies 
(not to mention no L1 cache contents etc).

So I really do think that the belief that we should force irqs to a 
particular core is fundamentally flawed.

We used to do lowest-priority stuff in hw, and then Intel broke it, but I 
always told them that they were _stupid_ to break it. The fact is, 
especially with multi-core, it actually makes a lot of sense to have 
hardware decide which core to interrupt, because hardware simply 
potentially knows better.

This is one of those age-old questions: in _theory_ you can do a better 
job in software, but in _practice_ it's just too damn expensive and 
complicated to do a perfect job especially with dynamic decisions, so in 
_practice_ it tends to be better to let hardware make some of the 
decisions.

We can see the same thing in instruction scheduling: in _theory_ a 
compiler can do a better job of scheduling, since it can spend inordinate 
amounts of resources on doing things once, and then the hardware can be 
simpler and faster and never worry about it. In _practice_, however, the 
biggest scheduling decisions are all dynamic at run-time, and depend on 
things like cache misses etc, and only total idiots (or embedded people) 
will do static scheduling these days.

I think it's a huge mistake to do static interrupt routing for the same 
reason.

			Linus