Date: Tue, 25 Jun 2024 19:03:11 +0100
From: Robin Murphy <robin.murphy@....com>
To: "zhangzekun (A)" <zhangzekun11@...wei.com>
Cc: iommu@...ts.linux.dev, linux-kernel@...r.kernel.org,
 "joro@...tes.org" <joro@...tes.org>, will@...nel.org,
 John Garry <john.g.garry@...cle.com>
Subject: Re: [PATCH] iommu/iova: Better utilize cpu_rcaches in non-strict
 mode

On 2024-06-25 2:29 am, zhangzekun (A) wrote:
> 
> 
> On 2024/6/24 21:32, Robin Murphy wrote:
> 
>>> This patch was primarily intended to minimize the chance of the 
>>> softlockup issue in fq_flush_timeout(), which was already described 
>>> earlier in [1] and has been applied in a commercial kernel [2] for 
>>> years.
>>>
>>> However, later tests showed that this single patch alone is not 
>>> enough to fix the softlockup issue, since its root cause is the 
>>> underlying iova_rbtree_lock. In our softlockup scenarios, the average
>>> time spent acquiring this spinlock is about 6ms.
>>
>> That should already be fixed, though. The only reason for 
>> fq_flush_timeout() to interact with the rbtree at all was due to the 
>> notion of a fixed-size depot which could become full. That no longer 
>> exists since 911aa1245da8 ("iommu/iova: Make the rcache depot scale 
>> better").
>>
>> Thanks,
>> Robin.
>>
> Hi, Robin,
> 
> The commit 911aa1245da8 ("iommu/iova: Make the rcache depot scale 
> better") can reduce the risk of softlockup, but cannot fix it 
> entirely. We did solve a softlockup issue [1] with that patch, and that
> is why it has already been backported to our branch. The softlockup 
> issue we met recently is on a 5.10-based kernel with that patch already 
> backported, and can be found in [2].

Sorry, I was implying some context that I should have made clear: yes, 
the softlockup can still happen in general if the flush queues are full 
of IOVAs which are too large for the rcache mechanism at all, and so 
are always freed directly to the rbtree. But then there's no way *this* 
patch could make any difference to that case either.
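
For reference, that free path looks roughly like this in mainline 
(a paraphrased sketch of drivers/iommu/iova.c; the exact form varies 
by kernel version):

/*
 * Paraphrased sketch: only allocations up to IOVA_RANGE_CACHE_MAX_SIZE
 * orders are eligible for the per-CPU rcaches; anything larger falls
 * through to free_iova(), i.e. back to the rbtree under
 * iova_rbtree_lock.
 */
static bool iova_rcache_insert(struct iova_domain *iovad, unsigned long pfn,
			       unsigned long size)
{
	unsigned int log_size = order_base_2(size);

	if (log_size >= IOVA_RANGE_CACHE_MAX_SIZE)
		return false;		/* too large to cache */

	return __iova_rcache_insert(iovad, &iovad->rcaches[log_size], pfn);
}

void free_iova_fast(struct iova_domain *iovad, unsigned long pfn,
		    unsigned long size)
{
	if (iova_rcache_insert(iovad, pfn, size))
		return;

	free_iova(iovad, pfn);		/* rbtree path, takes the lock */
}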

Thanks,
Robin.
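
For anyone following along, the fq_flush_timeout() seen in the call 
trace below drains every possible CPU's flush queue from a single timer 
callback, taking each queue's lock and freeing every queued IOVA. A 
rough sketch of the 5.10-era version in drivers/iommu/iova.c 
(paraphrased; the code later moved to drivers/iommu/dma-iommu.c and 
details vary by version):

static void fq_flush_timeout(struct timer_list *t)
{
	struct iova_domain *iovad = from_timer(iovad, t, fq_timer);
	int cpu;

	atomic_set(&iovad->fq_timer_on, 0);
	iova_domain_flush(iovad);	/* one global IOTLB flush */

	for_each_possible_cpu(cpu) {
		unsigned long flags;
		struct iova_fq *fq = per_cpu_ptr(iovad->fq, cpu);

		/*
		 * Every queued entry is freed here; entries too large
		 * for the rcaches go back to the rbtree and contend on
		 * iova_rbtree_lock, which is where the stall time goes.
		 */
		spin_lock_irqsave(&fq->lock, flags);
		fq_ring_free(iovad, fq);
		spin_unlock_irqrestore(&fq->lock, flags);
	}
}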

> On 6.7-rc4, which already includes commit 911aa1245da8 ("iommu/iova: 
> Make the rcache depot scale better"), we can also get the following
> call trace. The issue seems to still exist, and we fixed it using the 
> patch in [3] when the call trace first came up.
> 
> rcu: INFO: rcu_preempt self-detected stall on CPU
> rcu:     85-....: (5249 ticks this GP) idle=4d04/1/0x4000000000000000 
> softirq=428240/428240 fqs=1882
> rcu:     (t=5253 jiffies g=2005741 q=19446136 ncpus=128)
> CPU: 85 PID: 2401130 Comm: kpktgend_85 Kdump: loaded Tainted: G  O       
> 6.7.0-rc4+ #1
> 
> pc : _raw_spin_unlock_irqrestore+0x14/0x78
> lr : fq_flush_timeout+0x94/0x118
> sp : ffff8000802abdf0
> x29: ffff8000802abdf0 x28: ffff20201142e480 x27: 0000000000000002
> x26: ffffdeeb67e82b20 x25: 0000000000000000 x24: ffffdeeb67e87400
> x23: ffffdeeb67e82ba0 x22: ffff202009db1c00 x21: ffffdeeb67e82f40
> x20: 0000000000000000 x19: ffffdbe06f0140a8 x18: 0000000000040000
> x17: 0000000000000001 x16: 0000000000000040 x15: ffffffffffffffff
> x14: 0000000000001fff x13: 000000000003ffff x12: 0000000000000259
> x11: 000000000007ffff x10: 000000000000000d x9 : ffffdeeb6549ec84
> x8 : ffff0020886e0000 x7 : 000000000000964d x6 : 000001afa2986acb
> x5 : 01ffffffffffffff x4 : 0000000000000000 x3 : ffff202006b57d70
> x2 : 0000000000000015 x1 : 0000000000000000 x0 : ffffdbe06f0140a8
> Call trace:
>   _raw_spin_unlock_irqrestore+0x14/0x78
>   call_timer_fn+0x3c/0x1d0
>   expire_timers+0xcc/0x190
>   run_timer_softirq+0xfc/0x268
>   __do_softirq+0x128/0x3dc
>   ____do_softirq+0x18/0x30
>   call_on_irq_stack+0x24/0x30
>   do_softirq_own_stack+0x24/0x38
>   irq_exit_rcu+0xc0/0xe8
>   el1_interrupt+0x48/0xc0
>   el1h_64_irq_handler+0x18/0x28
>   el1h_64_irq+0x78/0x80
>   __schedule+0xf28/0x12a0
>   schedule+0x3c/0x108
>   schedule_timeout+0xa0/0x1d0
>   pktgen_thread_worker+0x1180/0x15d0
>   kthread+0x120/0x130
>   ret_from_fork+0x10/0x20
> 
> [1] https://gitee.com/openeuler/kernel/issues/I8KS9A
> [2] https://gitee.com/openeuler/kernel/tree/OLK-5.10/
> [3] https://lkml.org/lkml/2023/2/16/124
> 
> Thanks,
> Zekun
