Message-ID: <248a9149-ad8d-ac84-cdc4-67c344179763@huawei.com>
Date: Mon, 12 Aug 2019 09:46:48 +0100
From: John Garry <john.garry@...wei.com>
To: Ming Lei <tom.leiming@...il.com>
CC: Thomas Gleixner <tglx@...utronix.de>,
Marc Zyngier <maz@...nel.org>,
Sebastian Siewior <bigeasy@...utronix.de>,
chenxiang <chenxiang66@...ilicon.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Hannes Reinecke <hare@...e.com>
Subject: Re: About threaded interrupt handler CPU affinity

On 10/08/2019 10:47, Ming Lei wrote:
> On Tue, Jul 23, 2019 at 1:40 AM John Garry <john.garry@...wei.com> wrote:
>>
>> On 22/07/2019 16:34, Thomas Gleixner wrote:
>>> John,
>>>
>>
>> Hi Thomas,
>>
>>> On Mon, 22 Jul 2019, John Garry wrote:
>>>> On 22/07/2019 15:41, Marc Zyngier wrote:
>>>>> On 22/07/2019 15:14, John Garry wrote:
>>>>>> I have a question on commit cbf8699996a6 ("genirq: Let irq thread follow
>>>>>> the effective hard irq affinity"), if you could kindly check:
>>>>>>
>>>>>> Here we set the thread affinity to be the same as the hard interrupt
>>>>>> affinity. For an arm64 system with GIC ITS, this will be a single CPU,
>>>>>> the lowest in the interrupt affinity mask. So, in this case, effectively
>>>>>> the thread will be bound to a single CPU. I think APIC is the same for
>>>>>> this.
>>>>>>
>>>>>> The commit message describes the problem that we solve here is that the
>>>>>> thread may become affine to a different CPU to the hard interrupt - does
>>>>>> it mean that the thread CPU mask could not cover that of the hard
>>>>>> interrupt? I couldn't follow the reason.
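
[ For reference, after that commit the affinity copy in
irq_thread_check_affinity() is based on the effective mask, roughly like
this (paraphrasing from memory, so please check the actual code):

	raw_spin_lock_irq(&desc->lock);
	if (cpumask_available(desc->irq_common_data.affinity))
		cpumask_copy(mask,
			     irq_data_get_effective_affinity_mask(&desc->irq_data));
	else
		valid = false;
	raw_spin_unlock_irq(&desc->lock);

	if (valid)
		set_cpus_allowed_ptr(current, mask);
]
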
>>>>>
>>>>> Assume a 4 CPU system. If the interrupt affinity is on CPU0-1, you could
>>>>> end up with the effective interrupt affinity on CPU0 (which would be
>>>>> typical of the ITS), and the thread running on CPU1. Not great.
>>>>
>>>> Sure, not great. But the thread can possibly still run on CPU0.
>>>
>>> Sure. It could, but it's up to the scheduler to decide. In general it's the
>>> right thing to run the threaded handler on the CPU which handles the
>>> interrupt.
>>
>> I'd agree.
>>
>>> With single CPU affinity that's surely a limitation.
>>>
>>>>>> We have experimented with fixing the thread mask to be the same as the
>>>>>> interrupt mask (we're using managed interrupts), like before, and get a
>>>>>> significant performance boost at high IO datarates on our storage
>>>>>> controller - like ~11%.
>>>>>
>>>>> My understanding is that this patch does exactly that. Does it result in
>>>>> a regression?
>>>>
>>>> Not in the strictest sense for us, I don't know about others. Currently we use
>>>> tasklets, and we find that the CPUs servicing the interrupts (and hence
>>>> tasklets) are heavily loaded. We experience the same when experimenting
>>>> with threaded interrupt handlers - which would be as expected.
>>>>
>>>> But, when we make the change as mentioned, our IOPS goes from ~3M -> 3.4M.
>>>
>>> So your interrupt is affined to more than one CPU, but due to the ITS
>>> limitation the effective affinity is a single CPU, which in turn restricts
>>> the thread handler affinity to the same single CPU.
>>
>> Even though this is an ITS limitation, the same thing is effectively
>> done for x86 APIC as policy, right? I'm referring to commit fdba46ffb4c2
>> ("x86/apic: Get rid of multi CPU affinity")
>>
>>> If you lift that
>>> restriction and let it be affine to the full affinity set of the interrupt
>>> then you get better performance, right?
>>
>> Right
>>
>>> Probably because the other CPU(s)
>>> in the affinity set are less loaded than the one which handles the hard
>>> interrupt.
>>
>> I will look to get some figures for CPU loading to back this up.
>>
>>>
>>> This is heavily use case dependent I assume, so making this a general
>>> change is perhaps not a good idea, but we could surely make this optional.
>>
>> That sounds reasonable. So would the idea be to enable this optionally
>> at the request_threaded_irq() call?
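
[ i.e. drivers opting in with something like a new request flag - the flag
name below is invented purely for illustration, it doesn't exist today:

	ret = request_threaded_irq(irq, my_hard_handler, my_thread_fn,
				   IRQF_ONESHOT | IRQF_THREAD_FULL_AFFINITY,
				   "my-dev", dev);
]
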
>
> I'd suggest doing it for managed IRQs by default, because managed IRQ
> affinity is NUMA-local and set up gracefully. And the idea behind it is
> sound, since the IRQ handler should run on the specified CPUs, especially
> as the thread part often takes more CPU.

So I was going to send a patch for this change.
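
Something along these lines is what I had in mind - a completely untested
sketch, only to show the shape of it: special-case managed interrupts in
irq_thread_check_affinity(), so that only the managed case is widened:

	raw_spin_lock_irq(&desc->lock);
	if (cpumask_available(desc->irq_common_data.affinity)) {
		const struct cpumask *m;

		if (irqd_affinity_is_managed(&desc->irq_data)) {
			/* managed: let the thread roam the full affinity mask */
			m = irq_data_get_affinity_mask(&desc->irq_data);
		} else {
			/* default: follow the effective hard irq affinity */
			m = irq_data_get_effective_affinity_mask(&desc->irq_data);
		}
		cpumask_copy(mask, m);
	} else {
		valid = false;
	}
	raw_spin_unlock_irq(&desc->lock);

Unmanaged interrupts would keep the current behaviour.
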
However, for the SCSI driver where I found it helps performance, managed
interrupts are disabled by default (due to this as-yet unresolved issue
https://lkml.org/lkml/2019/1/29/391 - that's just one thread on the
issue), so I don't think that it's right to enable it on that basis alone.

If there's some other driver which uses managed interrupts + threaded
interrupt handlers and we can prove it helps, then I'd say that should
be good enough.

Thanks,
John