Message-ID: <a5154365-59c5-429b-559e-94ad6dffcdb0@huawei.com>
Date: Fri, 20 Dec 2019 15:38:24 +0000
From: John Garry <john.garry@...wei.com>
To: Marc Zyngier <maz@...nel.org>
CC: Ming Lei <ming.lei@...hat.com>, <tglx@...utronix.de>,
"chenxiang (M)" <chenxiang66@...ilicon.com>,
<bigeasy@...utronix.de>, <linux-kernel@...r.kernel.org>,
<hare@...e.com>, <hch@....de>, <axboe@...nel.dk>,
<bvanassche@....org>, <peterz@...radead.org>, <mingo@...hat.com>
Subject: Re: [PATCH RFC 1/1] genirq: Make threaded handler use irq affinity
for managed interrupt
>> We've got some more results, and they look promising.
>>
>> So with your patch we get a performance boost of 3180.1K -> 3294.9K
>> IOPS in the D06 SAS env. Then when we change the driver to use a
>> threaded interrupt handler (mainline currently uses a tasklet), we get
>> a further boost, up to 3415K IOPS.
>>
>> Now this is essentially the same figure we had with the threaded
>> handler + the genirq change to spread the handler CPU affinity. We
>> also tested your patch + the genirq change and got a performance drop,
>> to 3347K IOPS.
>>
>> So tentatively I'd say your patch may be all we need.
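For reference, the threaded-handler conversion mentioned above follows
the usual request_threaded_irq() pattern, roughly as in the sketch
below - this is only illustrative (the my_cq_* names are made up), not
the actual hisi_sas change:

#include <linux/interrupt.h>

/* Hard handler: just quiesce the hw interrupt and defer the real work */
static irqreturn_t my_cq_irq(int irq, void *data)
{
	return IRQ_WAKE_THREAD;
}

/* Thread fn: the completion-queue processing that used to run in the
 * tasklet now runs in an irq thread instead */
static irqreturn_t my_cq_thread_fn(int irq, void *data)
{
	return IRQ_HANDLED;
}

/* In probe, instead of tasklet_init() plus a handler which schedules it: */
	rc = devm_request_threaded_irq(dev, irq, my_cq_irq, my_cq_thread_fn,
				       IRQF_ONESHOT, "my-cq", cq);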
>
> OK.
>
>> FYI, here is how the effective affinity is looking for both SAS
>> controllers with your patch:
>>
>> 74:02.0
>> irq 81, cpu list 24-29, effective list 24 cq
>> irq 82, cpu list 30-35, effective list 30 cq
>
> Cool.
>
> [...]
>
>> As for your patch itself, I'm still concerned about possible
>> regressions if we don't restrict this effective interrupt affinity
>> spread policy to managed interrupts only.
>
> I'll try and revise that as I post the patch, probably at some point
> between now and Christmas. I still think we should find a way to
> address this for the D05 SAS driver though, maybe by managing the
> affinity yourself in the driver. But this requires experimentation.
I've already done something experimental for the driver to manage the
affinity, and performance is generally much better:
https://github.com/hisilicon/kernel-dev/commit/e15bd404ed1086fed44da34ed3bd37a8433688a7
But I still think it's wise to only consider managed interrupts for now.
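The rough idea there is for the driver to spread its completion-queue
vectors over the online CPUs itself, along the lines of the simplified
sketch below (this is not the code in the commit above - cq_irqs[] and
nr_cqs are made-up names, and error handling is omitted):

#include <linux/cpumask.h>
#include <linux/interrupt.h>

/* Assign one CQ vector per CPU, wrapping around the online mask */
static void my_spread_cq_affinity(const int *cq_irqs, int nr_cqs)
{
	unsigned int cpu = cpumask_first(cpu_online_mask);
	int i;

	for (i = 0; i < nr_cqs; i++) {
		irq_set_affinity_hint(cq_irqs[i], cpumask_of(cpu));
		cpu = cpumask_next(cpu, cpu_online_mask);
		if (cpu >= nr_cpu_ids)
			cpu = cpumask_first(cpu_online_mask);
	}
}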
>
>> JFYI, about the NVMe CPU lockup issue, there are 2 pieces of work
>> ongoing here:
>>
>> https://lore.kernel.org/linux-nvme/20191209175622.1964-1-kbusch@kernel.org/T/#t
>>
>>
>> https://lore.kernel.org/linux-block/20191218071942.22336-1-ming.lei@redhat.com/T/#t
>>
>
> I've also managed to trigger some of them now that I have access to
> a decent box with nvme storage.
I only have 2x NVMe SSDs when this occurs, so I should not really be
hitting this...
> Out of curiosity, have you tried
> with the SMMU disabled? I'm wondering whether we hit some livelock
> condition on unmapping buffers...
No, but I can give it a try. Doing that should lower the CPU usage,
though, so it may just mask the issue - but probably not.
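(For the SMMU-disabled run I was planning to simply boot with
iommu.passthrough=1 so that DMA bypasses translation - assuming that's
an acceptable way to take the SMMU out of the picture for this test.)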
Much appreciated,
John