Date:   Fri, 20 Dec 2019 14:43:36 +0000
From:   Marc Zyngier <maz@...nel.org>
To:     John Garry <john.garry@...wei.com>
Cc:     Ming Lei <ming.lei@...hat.com>, <tglx@...utronix.de>,
        "chenxiang (M)" <chenxiang66@...ilicon.com>,
        <bigeasy@...utronix.de>, <linux-kernel@...r.kernel.org>,
        <hare@...e.com>, <hch@....de>, <axboe@...nel.dk>,
        <bvanassche@....org>, <peterz@...radead.org>, <mingo@...hat.com>
Subject: Re: [PATCH RFC 1/1] genirq: Make threaded handler use irq affinity for managed interrupt

Hi John,

On 2019-12-20 11:30, John Garry wrote:
>>> So you enqueue requests from CPU0 only? It seems a bit odd...
>> No, but maybe I wasn't clear enough. I'll give an overview:
>> For the D06 SAS controller - which is a multi-queue PCI device - we use
>> managed interrupts. The HW has 16 submission/completion queues, so for
>> 96 cores we have an even spread of 6 CPUs assigned per queue, and
>> this per-queue CPU mask is the interrupt affinity mask. So CPU0-5
>> would submit any IO on queue0, CPU6-11 on queue1, and so on. PCI NVMe
>> is essentially the same.
>> These are the environments in which we're trying to improve
>> performance.
>> Then for the D05 SAS controller - which is a multi-queue platform
>> device (mbigen) - we don't use managed interrupts. We still submit IO
>> from any CPU, but we choose the submission queue in a round-robin
>> fashion to promote some isolation, i.e. to reduce inter-queue lock
>> contention, so the queue chosen has nothing to do with the CPU.
>> And with your change we may, for example, submit on cpu4 but service
>> the interrupt on cpu30, whereas previously we would always service on
>> cpu0. The old way still isn't ideal, I'll admit.
>> For this env, we would just like to maintain the same performance.
>> And it's here that we see the performance drop.
>>
>
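As a rough sketch of how that per-queue spread is typically set up
(illustrative only -- the function name and vector count below are made
up, this is not the actual hisi_sas code), the driver asks the PCI core
for managed MSI-X vectors and lets the core compute the spread over the
online CPUs:

#include <linux/pci.h>
#include <linux/interrupt.h>

/*
 * Illustrative sketch only: a multi-queue PCI driver requests managed
 * MSI-X vectors and lets the core spread them over the online CPUs,
 * giving the per-queue affinity masks described above
 * (16 queues over 96 cores -> 6 CPUs per queue).
 */
static int example_setup_managed_irqs(struct pci_dev *pdev)
{
        struct irq_affinity affd = { };         /* spread all vectors */
        int nvecs;

        nvecs = pci_alloc_irq_vectors_affinity(pdev, 1, 16,
                                                PCI_IRQ_MSIX | PCI_IRQ_AFFINITY,
                                                &affd);
        if (nvecs < 0)
                return nvecs;

        /* vector i's affinity mask is now the CPU mask of queue i */
        return nvecs;
}
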
> Hi Marc,
>
> We've got some more results and it looks promising.
>
> So with your patch we get a performance boost of 3180.1K -> 3294.9K
> IOPS in the D06 SAS env. Then, when we change the driver to use a
> threaded interrupt handler (mainline currently uses a tasklet), we
> get a boost again, up to 3415K IOPS.
>
> Now this is essentially the same figure we had when using the
> threaded handler + the genirq change to spread the handler CPU
> affinity. We did also test your patch + the genirq change and got a
> performance drop, to 3347K IOPS.
>
> So tentatively I'd say your patch may be all we need.

OK.

> FYI, here is how the effective affinity is looking for both SAS
> controllers with your patch:
>
> 74:02.0
> irq 81, cpu list 24-29, effective list 24 cq
> irq 82, cpu list 30-35, effective list 30 cq

Cool.

[...]

> As for your patch itself, I'm still concerned about possible
> regressions if we don't restrict this effective interrupt affinity
> spread policy to managed interrupts only.

I'll try and revise that as I post the patch, probably at some point
between now and Christmas. I still think we should find a way to
address this for the D05 SAS driver though, maybe by managing the
affinity yourself in the driver. But this requires experimentation.
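As a rough illustration of that idea (the function and the CPU grouping
policy below are made up, not real driver code), the D05 driver could
carve the online CPUs into per-queue groups and set the affinity of its
non-managed interrupts itself:

#include <linux/kernel.h>
#include <linux/interrupt.h>
#include <linux/cpumask.h>
#include <linux/gfp.h>
#include <linux/errno.h>

/*
 * Illustrative sketch only: spread non-managed per-queue interrupts
 * over groups of online CPUs instead of leaving them all on CPU0.
 */
static int example_spread_queue_irqs(const int *queue_irqs, int nr_queues)
{
        unsigned int cpus_per_queue = max(1U, num_online_cpus() / nr_queues);
        cpumask_var_t mask;
        unsigned int cpu;
        int q;

        if (!alloc_cpumask_var(&mask, GFP_KERNEL))
                return -ENOMEM;

        for (q = 0; q < nr_queues; q++) {
                /* build the CPU group owning queue q */
                cpumask_clear(mask);
                for_each_online_cpu(cpu)
                        if (cpu / cpus_per_queue == (unsigned int)q)
                                cpumask_set_cpu(cpu, mask);

                if (!cpumask_empty(mask))
                        irq_set_affinity(queue_irqs[q], mask);
        }

        free_cpumask_var(mask);
        return 0;
}
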

> JFYI, about the NVMe CPU lockup issue, there are two pieces of
> ongoing work here:
> 
> https://lore.kernel.org/linux-nvme/20191209175622.1964-1-kbusch@kernel.org/T/#t
> 
> https://lore.kernel.org/linux-block/20191218071942.22336-1-ming.lei@redhat.com/T/#t

I've also managed to trigger some of them now that I have access to
a decent box with nvme storage. Out of curiosity, have you tried
with the SMMU disabled? I'm wondering whether we hit some livelock
condition on unmapping buffers...

> Cheers,
> John
>
> P.S. Thanks to Xiang Chen for all the work here in getting these
> results.

Yup, much appreciated!

Thanks,

         M.
-- 
Jazz is not dead. It just smells funny...
