Message-ID: <alpine.DEB.2.20.1707040928360.9000@nanos>
Date: Tue, 4 Jul 2017 10:12:16 +0200 (CEST)
From: Thomas Gleixner <tglx@...utronix.de>
To: Linus Torvalds <torvalds@...ux-foundation.org>
cc: Jens Axboe <axboe@...nel.dk>, Max Gurtovoy <maxg@...lanox.com>,
Christoph Hellwig <hch@....de>,
LKML <linux-kernel@...r.kernel.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Ingo Molnar <mingo@...nel.org>,
"H. Peter Anvin" <hpa@...or.com>
Subject: Re: [GIT pull] irq updates for 4.13
On Mon, 3 Jul 2017, Linus Torvalds wrote:
> On Mon, Jul 3, 2017 at 12:42 AM, Thomas Gleixner <tglx@...utronix.de> wrote:
> >
> > please pull the latest irq-core-for-linus git tree from:
> >
> > git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git irq-core-for-linus
>
> Ugh, this caused conflicts with the block tree, with commits
>
> - fe631457ff3e: "blk-mq: map all HWQ also in hyperthreaded system"
>
> - 5f042e7cbd9e: "blk-mq: Include all present CPUs in the default queue mapping"
>
> clashing.
>
> I'm not at all understanding why that second commit came in through
> the irq tree at all, in fact. Very annoying. Why was that not sent
> through the block tree? It doesn't seem to have anything fundamentally
> to do with irqs, really: it's a driver CPU choice for irq choice.
There is a dependency. The changes in the block code rely on the new
features of the generic interrupt affinity management. See below.
> Anyway, I absolutely detested that code, and the obvious resolution
> was too disgusting to live. So I did an evil merge and moved some
> things around in the merge to make it at least not cause me to dig my
> eyes out.
>
> But I'd like people to look at that - not so much due to the evil
> merge itself (but check that too, by any means), but just because the
> code seems fundamentally broken for the hotplug case. We end up
> picking a possible metric shit-ton of CPUs for queue 0, if they were
> "possible but not online".
The mechanism is:
Spread out the queues and the associated interrupts across the possible
CPUs. This results in a queue/interrupt per group of CPUs (a group can be a
single CPU).
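
To make that concrete, here is a minimal sketch of how a PCI block driver
picks this up. This is not taken from the conflicting commits; the foo_*
names are made up for illustration, only the APIs are the real ones:

#include <linux/pci.h>
#include <linux/blk-mq.h>
#include <linux/blk-mq-pci.h>

/* Hypothetical driver private data, for illustration only. */
struct foo_dev {
	struct pci_dev		*pdev;
	struct blk_mq_tag_set	tag_set;
};

static int foo_alloc_irq_vectors(struct foo_dev *foo)
{
	/*
	 * PCI_IRQ_AFFINITY tells the interrupt core to spread the
	 * vectors over the possible CPUs and to mark them as managed.
	 * Each vector ends up affine to one group of (possibly offline)
	 * CPUs.
	 */
	int nr = pci_alloc_irq_vectors(foo->pdev, 1, num_possible_cpus(),
				       PCI_IRQ_MSIX | PCI_IRQ_AFFINITY);

	if (nr < 0)
		return nr;
	foo->tag_set.nr_hw_queues = nr;
	return 0;
}

/* .map_queues callback of the hypothetical foo blk_mq_ops. */
static int foo_map_queues(struct blk_mq_tag_set *set)
{
	struct foo_dev *foo = container_of(set, struct foo_dev, tag_set);

	/*
	 * blk_mq_pci_map_queues() reads the affinity masks back via
	 * pci_irq_get_affinity() and maps every possible CPU to the
	 * hardware queue whose vector is affine to it.
	 */
	return blk_mq_pci_map_queues(set, foo->pdev);
}

The driver does not install any hotplug callbacks for this. Startup,
shutdown and affinity of the per queue interrupts are handled entirely by
the interrupt core: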
If a group is offline, the interrupt is kept in managed shutdown mode. If a
CPU of the group comes online, the core management starts up the interrupt
and makes it affine to that CPU.
If the last CPU of a group goes offline, the interrupt is not moved to some
random other CPU. It's put in managed shutdown mode and then restarted when
a CPU of the group comes online again.
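
For illustration only, the hotplug side boils down to the sketch below.
This is not the actual code in kernel/irq/; the managed_irq_* helpers and
the foo_group_* functions are invented names standing in for the core's
managed shutdown/startup machinery, and 'group' is the affinity mask which
the spreading assigned to the queue's interrupt:

#include <linux/cpumask.h>

/* Invented stand-ins for the core's managed shutdown machinery. */
static void managed_irq_shutdown(void);
static void managed_irq_startup(unsigned int cpu);
static bool managed_irq_is_shutdown(void);

static void foo_group_cpu_dying(const struct cpumask *group,
				unsigned int dying_cpu)
{
	unsigned int cpu;

	/* Another CPU of the group stays online: nothing is migrated. */
	for_each_cpu_and(cpu, group, cpu_online_mask)
		if (cpu != dying_cpu)
			return;

	/*
	 * The last CPU of the group goes away: instead of moving the
	 * interrupt to some random other CPU, shut it down.
	 */
	managed_irq_shutdown();
}

static void foo_group_cpu_online(const struct cpumask *group,
				 unsigned int new_cpu)
{
	/*
	 * The first CPU of a previously offline group comes back:
	 * restart the interrupt and make it affine to that CPU.
	 */
	if (cpumask_test_cpu(new_cpu, group) && managed_irq_is_shutdown())
		managed_irq_startup(new_cpu);
}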
That exercise avoids exactly the 'metric tons of irqs' being moved to random
CPUs and then brought back to the target CPUs when they come online
again. On/offline seems to be (ab)used frequently for power management
purposes nowadays.
Sorry if I did not make that clear enough in the pull request message.
Thanks,
tglx