linux-kernel - Re: [PATCH] irqchip/gic-v3-its: Balance initial LPI affinity across CPUs


Message-ID: <e65af3fd-e7c8-bd9b-75ff-f2d990338221@huawei.com>
Date:   Tue, 10 Mar 2020 11:33:17 +0000
From:   John Garry <john.garry@...wei.com>
To:     Thomas Gleixner <tglx@...utronix.de>, Marc Zyngier <maz@...nel.org>
CC:     <linux-kernel@...r.kernel.org>,
        Jason Cooper <jason@...edaemon.net>,
        "Ming Lei" <ming.lei@...hat.com>,
        "chenxiang (M)" <chenxiang66@...ilicon.com>
Subject: Re: [PATCH] irqchip/gic-v3-its: Balance initial LPI affinity across
 CPUs

On 20/01/2020 19:24, Thomas Gleixner wrote:
> Marc,
> 
> Marc Zyngier <maz@...nel.org> writes:
>> We're stuck between a rock and a hard place here:
>>
>> (1) We place all interrupts on the least loaded CPU that matches
>>       the affinity -> results in performance issues on some funky
>>       HW (like D05's SAS controller).
>>
>> (2) We place managed interrupts on the least loaded CPU that matches
>>       the affinity -> we have artificial load on NUMA boundaries, and
>>       reduced spread of overlapping managed interrupts.
>>
>> (3) We don't account for non-managed LPIs, and we run the risk of
>>       unpredictable performance because we don't really know where
>>       the *other* interrupts are.
>>
>> My personal preference would be to go for (1), as in my original post.
>> I find (3) the least appealing, because we don't track things anymore.
>> (2) feels like "the least of all evils", as it is a decent performance
>> gain, seems to give predictable performance, and doesn't regress lesser
>> systems...
>>
>> I'm definitely open to suggestions here.
> 
> The way x86 does it, and that's mostly ok except for some really broken
> setups, is:
> 
> 1) Non-managed interrupts:
> 
>     If the interrupt is bound to a node, then we try to find a target
> 
>       I)  in the intersection of affinity mask and node mask.
> 
>       II) in the nodemask itself
> 
>           Yes we ignore affinity mask there because that's pretty much
>           the same as if the given affinity does not contain an online
>           CPU.
> 
>       If all of that fails then we try the nodeless mode.
> 
>     If the interrupt is not bound to a node, then we try to find a target
> 
>       I)  in the intersection of affinity mask and online mask.
> 
>       II) in the onlinemask itself
> 
>    Each step searches for the CPU in the searched mask which has the
>    least number of total interrupts assigned.
> 
> 2) Managed interrupts
> 
>    For managed interrupts we just search in the intersection of assigned
>    mask and online CPUs for the CPU with the least number of managed
>    interrupts.
> 
>    If no CPU is online then the interrupt is shut down anyway, so no
>    fallback is required.
> 
> Don't know whether that's something you can map to ARM64, but I assume
> the principle of trying to enforce NUMA locality plus balancing the
> number of interrupts makes sense in general.
> 
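
For reference, the selection order Thomas describes above can be sketched as
a small user-space model. This is only an illustration of the policy, not the
x86 or GIC code: the 64-bit cpumask type, struct irq_model, struct cpu_load,
NR_CPUS and the function names are all hypothetical, and masking every step
with the online mask is an assumption on my side.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS 8        /* hypothetical small system */
#define NO_CPU  (-1)

typedef uint64_t cpumask;            /* bit n set => CPU n is in the mask */

struct irq_model {                   /* hypothetical, not a kernel struct */
    cpumask affinity;                /* requested (or assigned) affinity */
    cpumask node_mask;               /* CPUs of the bound node, 0 if none */
    bool    managed;
};

struct cpu_load {                    /* hypothetical per-CPU accounting */
    unsigned int total[NR_CPUS];     /* all interrupts targeting the CPU */
    unsigned int managed[NR_CPUS];   /* managed interrupts only */
};

/* CPU in 'mask' with the smallest count; ties go to the lowest CPU number. */
int least_loaded(cpumask mask, const unsigned int *count)
{
    int best = NO_CPU;

    for (int cpu = 0; cpu < NR_CPUS; cpu++) {
        if (!(mask & (1ULL << cpu)))
            continue;
        if (best == NO_CPU || count[cpu] < count[best])
            best = cpu;
    }
    return best;
}

/* Returns the chosen CPU, or NO_CPU if nothing suitable is online. */
int pick_target(const struct irq_model *irq, const struct cpu_load *load,
                cpumask online)
{
    int cpu;

    assert(irq && load);

    /* 2) Managed: assigned mask & online, balanced on managed count only. */
    if (irq->managed)
        return least_loaded(irq->affinity & online, load->managed);

    /* 1) Non-managed, bound to a node. */
    if (irq->node_mask) {
        /* I) affinity & node mask (online filtering is an assumption) */
        cpu = least_loaded(irq->affinity & irq->node_mask & online,
                           load->total);
        if (cpu != NO_CPU)
            return cpu;
        /* II) node mask alone, ignoring the affinity mask */
        cpu = least_loaded(irq->node_mask & online, load->total);
        if (cpu != NO_CPU)
            return cpu;
        /* otherwise fall through to the nodeless mode */
    }

    /* Nodeless: I) affinity & online, then II) the online mask itself. */
    cpu = least_loaded(irq->affinity & online, load->total);
    if (cpu != NO_CPU)
        return cpu;
    return least_loaded(online, load->total);
}
```

Note that in this model the managed path has no fallback at all, matching the
"interrupt is shut down anyway" remark, while the non-managed path only
returns NO_CPU when no CPU is online.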

Hi Marc,

I was wondering if there is anything we can do to progress this patch?

Apart from this being a good change in itself, I need to do some SMMU
testing for next-generation product development, and I would like to
include this patch, preferably from mainline.

Cheers,
John