Message-ID: <alpine.DEB.2.21.1809171713270.16580@nanos.tec.linutronix.de>
Date: Mon, 17 Sep 2018 17:32:05 +0200 (CEST)
From: Thomas Gleixner <tglx@...utronix.de>
To: Dou Liyang <dou_liyang@....com>
cc: linux-kernel@...r.kernel.org, x86@...nel.org, mingo@...hat.com,
hpa@...or.com, douly.fnst@...fujitsu.com
Subject: Re: [PATCH v3 2/2] irq/matrix: Spread managed interrupts on
allocation
On Sun, 9 Sep 2018, Dou Liyang wrote:
> From: Dou Liyang <douly.fnst@...fujitsu.com>
>
> Linux has spread out the non managed interrupt across the possible
> target CPUs to avoid vector space exhaustion.
>
> But, the same situation may happen on the managed interrupts.
Second thoughts on this.
Spreading the managed interrupts out at vector allocation time does not
prevent vector exhaustion at all because, contrary to regular interrupts,
managed interrupts have a guaranteed allocation. IOW, when the managed
interrupt is initialized (that's way before the actual vector allocation
happens), a vector is reserved on each CPU which is in the associated
interrupt mask.
This is an essential property of managed interrupts because the kernel
guarantees that they can be moved to any CPU in the supplied mask during
CPU hot unplug and consequently shut down when the last CPU in the mask
goes offline.
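To illustrate the hot unplug guarantee, here is a minimal toy model in plain
C. It is not kernel code: toy_migrate_on_unplug(), TOY_SHUTDOWN and the
bitmask representation of CPU masks are all made up for this sketch. It only
shows the decision: move to another online CPU in the mask, or shut down when
none is left.

```c
#include <assert.h>

#define NR_CPUS      4
#define TOY_SHUTDOWN (-1)   /* hypothetical marker: interrupt is shut down */

/*
 * Toy model, not the kernel implementation.
 * affinity: CPUs the managed interrupt may run on (bit n = CPU n)
 * online:   currently online CPUs, updated in place
 * dying:    CPU going offline
 * current:  CPU the interrupt targets right now
 * Returns the CPU to target afterwards, or TOY_SHUTDOWN.
 */
static int toy_migrate_on_unplug(unsigned int affinity, unsigned int *online,
                                 int dying, int current)
{
    unsigned int candidates;
    int c;

    assert(dying >= 0 && dying < NR_CPUS);
    *online &= ~(1u << dying);

    /* Interrupt is not on the dying CPU: nothing to do. */
    if (current != dying)
        return current;

    /* A vector was reserved on every CPU in the mask, so any of them works. */
    candidates = affinity & *online;
    for (c = 0; c < NR_CPUS; c++)
        if (candidates & (1u << c))
            return c;

    /* Last CPU in the mask went offline. */
    return TOY_SHUTDOWN;
}
```

With a mask of CPU0-1 and all four CPUs online, unplugging CPU0 moves the
interrupt to CPU1, and unplugging CPU1 afterwards shuts it down.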
So for that special case of pre/post vectors the supplied mask is all CPUs
and the guaranteed reservation will claim a vector on each CPU. What makes
it look unbalanced is that when the interrupts are actually requested, all
end up on CPU0 as that's the first CPU in the mask.
So doing the spreading does not prevent vector exhaustion; it merely spreads
the active interrupts more evenly over the CPUs in the mask.
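A toy model in plain C of the difference between reservation and activation.
This is a sketch, not the irq matrix code: struct toy_matrix and the toy_*()
helpers are hypothetical, and "pick the CPU with the fewest active managed
vectors" is just one plausible spreading policy assumed for illustration.

```c
#include <assert.h>

#define NR_CPUS 4

/* Hypothetical per-CPU bookkeeping, not the kernel's data structures. */
struct toy_cpu {
    unsigned int reserved;  /* managed vectors reserved at init time */
    unsigned int active;    /* managed vectors actually in use */
};

struct toy_matrix {
    struct toy_cpu cpu[NR_CPUS];
};

/* Init time: reserve one vector on every CPU in the mask. */
static void toy_reserve_managed(struct toy_matrix *m, unsigned int mask)
{
    int c;

    for (c = 0; c < NR_CPUS; c++)
        if (mask & (1u << c))
            m->cpu[c].reserved++;
}

/* Request time without spreading: first CPU in the mask. */
static int toy_pick_first(unsigned int mask)
{
    int c;

    for (c = 0; c < NR_CPUS; c++)
        if (mask & (1u << c))
            return c;
    return -1;
}

/* Request time with spreading: CPU in the mask with the fewest active. */
static int toy_pick_spread(const struct toy_matrix *m, unsigned int mask)
{
    int c, best = -1;

    for (c = 0; c < NR_CPUS; c++) {
        if (!(mask & (1u << c)))
            continue;
        if (best < 0 || m->cpu[c].active < m->cpu[best].active)
            best = c;
    }
    return best;
}

/* Activate one managed interrupt; returns the CPU chosen. */
static int toy_activate(struct toy_matrix *m, unsigned int mask, int spread)
{
    int c;

    assert(mask & ((1u << NR_CPUS) - 1));   /* mask must contain a CPU */
    c = spread ? toy_pick_spread(m, mask) : toy_pick_first(mask);
    m->cpu[c].active++;
    return c;
}
```

Reserving four interrupts with an all-CPU mask gives reserved == 4 on every
CPU in both variants. Only the active counts differ: 4/0/0/0 without
spreading, 1/1/1/1 with it.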
I think it's still worthwhile to do that, but the changelog needs a major
overhaul as right now it's outright misleading. I'll just amend it with
something along the above lines, unless someone disagrees.
That said, it might also be interesting to allow user space affinity
settings on managed interrupts. Not meant for the pre/post vector case,
which just needs to be made non managed. It's meant for the case where a
device has fewer queues than CPUs, where changing affinity within the spread
range of CPUs could be allowed. Not sure though. Delegating this to the
folks who actually use that in their drivers.
Thanks,
tglx