Message-ID: <1d32a220-d941-6e67-58db-3e949d599812@126.com>
Date: Tue, 16 May 2023 21:58:57 +0800
From: Jiwei Sun <jiweisun126@....com>
To: Keith Busch <kbusch@...nel.org>
Cc: linux-nvme@...ts.infradead.org, linux-kernel@...r.kernel.org,
axboe@...com, hch@....de, sagi@...mberg.me, ahuang12@...ovo.com,
sunjw10@...ovo.com
Subject: Re: [PATCH] nvme: add cond_resched() to nvme_complete_batch()
Hi Keith,
On 2023/5/16 04:40, Keith Busch wrote:
> On Tue, May 02, 2023 at 08:54:12PM +0800, jiweisun126@....com wrote:
>> From: Jiwei Sun <sunjw10@...ovo.com>
>>
>> A soft lockup issue is triggered when running an fio test on a
>> 448-core server, as shown in the following warning:
> ...
>
>> According to the above two logs, nvme_irq() cost too much time, in
>> this case about 4.8 seconds, and the main bottleneck is contention
>> on the spin lock pool->lock.
> The most recent 6.4-rc has included a significant changeset to the pool
> allocator that may show a considerable difference in pool->lock timing.
> It would be interesting to hear if it changes your observation with your
> 448-core setup. Would you be able to re-run your experiments that
> produced the soft lockup with this kernel on that machine?
We have done some tests with the latest kernel, and the issue can no
longer be reproduced. We also analyzed the ftrace log of nvme_irq():
we did NOT find any contention on the spin lock pool->lock, and every
dma_pool_free() call completed within 2 us.
 287)               |  dma_pool_free() {
 287)   0.150 us    |    _raw_spin_lock_irqsave();
 287)   0.421 us    |    _raw_spin_unlock_irqrestore();
 287)   1.472 us    |  }
 +-- 63 lines: 287) |  mempool_free() {------------------------------
 435)               |  dma_pool_free() {
 435)   0.170 us    |    _raw_spin_lock_irqsave();
 435)   0.210 us    |    _raw_spin_unlock_irqrestore();
 435)   1.172 us    |  }
 +--145 lines: 435) |  mempool_free() {------------------------------
 317)               |  dma_pool_free() {
 317)   0.160 us    |    _raw_spin_lock_irqsave();
 317)   0.401 us    |    _raw_spin_unlock_irqrestore();
 317)   1.252 us    |  }
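
For reference, the _raw_spin_lock_irqsave()/_raw_spin_unlock_irqrestore()
pair inside each dma_pool_free() above is the per-pool spinlock that
every free serializes on. Roughly, as a simplified sketch of the
pattern (not the exact mm/dmapool.c source):

	void dma_pool_free(struct dma_pool *pool, void *vaddr,
			   dma_addr_t dma)
	{
		unsigned long flags;

		/*
		 * Every free from every CPU serializes on this one
		 * per-pool lock, so with 448 cores completing I/O the
		 * lock itself can become the bottleneck.
		 */
		spin_lock_irqsave(&pool->lock, flags);
		/* ... find the owning page and return the block ... */
		spin_unlock_irqrestore(&pool->lock, flags);
	}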
Based on the test results and our analysis of the code, your patch has
fixed this performance issue.
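
The idea of our original patch was roughly the following, shown here
as a simplified sketch of nvme_complete_batch() from
drivers/nvme/host/nvme.h with the proposed cond_resched() added (not
the exact posted diff; cond_resched() is only legal in a context that
is allowed to schedule):

	static __always_inline void
	nvme_complete_batch(struct io_comp_batch *iob,
			    void (*fn)(struct request *rq))
	{
		struct request *req;

		rq_list_for_each(&iob->req_list, req) {
			fn(req);
			nvme_complete_batch_req(req);
			/*
			 * Proposed: yield periodically so completing a
			 * huge batch cannot keep one core busy long
			 * enough to trip the soft-lockup watchdog.
			 */
			cond_resched();
		}
		blk_mq_end_request_batch(iob);
	}

With the pool allocator rework in 6.4-rc, this workaround appears to
be unnecessary.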
By the way, another task-hang issue was triggered during the test. We
are still analyzing it, but that is another story; we can discuss it
in a separate thread.
Thanks,
Regards,
Jiwei