Message-ID: <78a10958-fdc9-0576-0c39-6079b9749d39@huawei.com>
Date: Mon, 9 Dec 2019 14:30:59 +0000
From: John Garry <john.garry@...wei.com>
To: Ming Lei <ming.lei@...hat.com>
CC: <tglx@...utronix.de>, <chenxiang66@...ilicon.com>,
<bigeasy@...utronix.de>, <linux-kernel@...r.kernel.org>,
<maz@...nel.org>, <hare@...e.com>, <hch@....de>, <axboe@...nel.dk>,
<bvanassche@....org>, <peterz@...radead.org>, <mingo@...hat.com>
Subject: Re: [PATCH RFC 1/1] genirq: Make threaded handler use irq affinity
for managed interrupt
On 07/12/2019 08:03, Ming Lei wrote:
> On Fri, Dec 06, 2019 at 10:35:04PM +0800, John Garry wrote:
>> Currently the cpu allowed mask for the threaded part of a threaded irq
>> handler will be set to the effective affinity of the hard irq.
>>
>> Typically the effective affinity of the hard irq will be for a single cpu. As such,
>> the threaded handler would always run on the same cpu as the hard irq.
>>
>> We have seen scenarios in high data-rate throughput testing that the cpu
>> handling the interrupt can be totally saturated handling both the hard
>> interrupt and threaded handler parts, limiting throughput.
>
Hi Ming,
> Frankly speaking, I never observed that single CPU is saturated by one storage
> completion queue's interrupt load. Because CPU is still much quicker than
> current storage device.
>
> If there are more drives, one CPU won't handle more than one queue(drive)'s
> interrupt if (nr_drive * nr_hw_queues) < nr_cpu_cores.
Are things this simple? I mean, can you guarantee that fio processes are
evenly distributed as such?
>
> So could you describe your case in a bit detail? Then we can confirm
> if this change is really needed.
The issue is that the CPU is saturated servicing the hard and threaded
parts of the interrupt together - here's the sort of thing which we saw
previously:
Before:
CPU   %usr   %sys   %irq  %soft  %idle
all    2.9   13.1    1.2    4.6   78.2
0      0.0   29.3   10.1   58.6    2.0
1     18.2   39.4    0.0    1.0   41.4
2      0.0    2.0    0.0    0.0   98.0
CPU0 has effectively no idle time.
Then, by allowing the threaded part to roam:
After:
CPU   %usr   %sys   %irq  %soft  %idle
all    3.5   18.4    2.7    6.8   68.6
0      0.0   20.6   29.9   29.9   19.6
1      0.0   39.8    0.0   50.0   10.2
Note: I think that I may be able to reduce the hard irq load in the
endpoint driver, but not by enough to make this issue go away.
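To put a single number on each row of the two tables above, the non-idle
columns can simply be summed. This is only a rough sketch: the struct and
function names are made up for illustration, and treating
%usr + %sys + %irq + %soft as "busy" is an assumption about how to read
the columns.

```c
/* Hypothetical helper: one row of the per-CPU utilisation tables. */
struct cpu_sample {
	double usr;
	double sys;
	double irq;
	double soft;
	double idle;
};

/* Busy share of a CPU, taken as the sum of all non-idle columns. */
static double busy_pct(const struct cpu_sample *s)
{
	return s->usr + s->sys + s->irq + s->soft;
}
```

Fed with the CPU0 rows, this gives about 98.0 before and about 80.4
after, while CPU1 goes from about 58.6 to about 89.8, i.e. the work has
been spread rather than removed.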
>
>>
>> For when the interrupt is managed, allow the threaded part to run on all
>> cpus in the irq affinity mask.
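For illustration only, the mask selection described in the quoted text
can be modelled in user space as below. This is not the kernel code: the
struct, the field names and the one-bit-per-CPU uint64_t masks are all
assumptions made to keep the sketch self-contained.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical user-space model of an interrupt; one bit per CPU. */
struct irq_model {
	uint64_t affinity;           /* full irq affinity mask */
	uint64_t effective_affinity; /* CPU(s) actually taking the hard irq */
	bool managed;                /* true for a managed interrupt */
};

/*
 * Which CPUs may the threaded part run on?
 * With use_full_mask false, this models the current behaviour: always the
 * effective affinity of the hard irq. With it true, this models the
 * proposal: a managed interrupt uses the whole irq affinity mask.
 */
static uint64_t thread_allowed_mask(const struct irq_model *irq,
				    bool use_full_mask)
{
	if (use_full_mask && irq->managed)
		return irq->affinity;

	return irq->effective_affinity;
}
```

With an affinity mask covering CPUs 0-3 and an effective affinity of
CPU0 only, the current behaviour pins the thread to CPU0, whereas the
proposed behaviour lets it run on any of CPUs 0-3. A non-managed
interrupt is unaffected.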
>
> I remembered that performance drop is observed by this approach in some
> test.
From checking the thread about the NVMe interrupt swamp, just switching
to a threaded handler alone degrades performance. I didn't see any
specific results for this change from Long Li -
https://lkml.org/lkml/2019/8/21/128
Thanks,
John