linux-kernel - Re: [PATCH] iommu/iova: Bettering utilizing cpu

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <c6ae28eb-69b6-4508-a516-1d419e950b37@huawei.com>
Date: Tue, 25 Jun 2024 09:29:58 +0800
From: "zhangzekun (A)" <zhangzekun11@...wei.com>
To: Robin Murphy <robin.murphy@....com>
CC: <iommu@...ts.linux.dev>, <linux-kernel@...r.kernel.org>, "joro@...tes.org"
	<joro@...tes.org>, <will@...nel.org>, John Garry <john.g.garry@...cle.com>
Subject: Re: [PATCH] iommu/iova: Bettering utilizing cpu_rcaches in no-strict
 mode



在 2024/6/24 21:32, Robin Murphy 写道:

>> This patch is firstly intent to minimize the chance of softlock issue 
>> in fq_flush_timeout(), which is already dicribed erarlier in [1], 
>> which has beed applied in a commercial kernel[2] for years.
>>
>> However, the later tests show that this single patch is not enough to 
>> fix the softlockup issue, since the root cause of softlockup is the 
>> underlying iova_rbtree_lock. In our softlockup scenarios, the average
>> time cost to get this spinlock is about 6ms.
> 
> That should already be fixed, though. The only reason for 
> fq_flush_timeout() to interact with the rbtree at all was due to the 
> notion of a fixed-size depot which could become full. That no longer 
> exists since 911aa1245da8 ("iommu/iova: Make the rcache depot scale 
> better").
> 
> Thanks,
> Robin.
> 
Hi, Robin,

The commit 911aa1245da8 ("iommu/iova: Make the rcache depot scale 
better") can reduce the risks of softlockup, but can not fix it 
entirely. We do solve a softlockup issue[1] with that patch, and that is
why it has aleady been backported in our branch. The softlockup issue 
which we met recently is a 5.10-based kernel with that patch already 
backported, which can be found in [2].

In 6.7-rc4, which has already applied patch  911aa1245da8 ("iommu/iova: 
Make the rcache depot scale better"). We can also get the following
call trace. It seems still existing the issue, and we fix this issue 
using the patch in [3], when the call trace first came up.

rcu: INFO: rcu_preempt self-detected stall on CPU
rcu: 	85-....: (5249 ticks this GP) idle=4d04/1/0x4000000000000000 
softirq=428240/428240 fqs=1882
rcu: 	(t=5253 jiffies g=2005741 q=19446136 ncpus=128)
CPU: 85 PID: 2401130 Comm: kpktgend_85 Kdump: loaded Tainted: G 
  O       6.7.0-rc4+ #1

pc : _raw_spin_unlock_irqrestore+0x14/0x78
lr : fq_flush_timeout+0x94/0x118
sp : ffff8000802abdf0
x29: ffff8000802abdf0 x28: ffff20201142e480 x27: 0000000000000002
x26: ffffdeeb67e82b20 x25: 0000000000000000 x24: ffffdeeb67e87400
x23: ffffdeeb67e82ba0 x22: ffff202009db1c00 x21: ffffdeeb67e82f40
x20: 0000000000000000 x19: ffffdbe06f0140a8 x18: 0000000000040000
x17: 0000000000000001 x16: 0000000000000040 x15: ffffffffffffffff
x14: 0000000000001fff x13: 000000000003ffff x12: 0000000000000259
x11: 000000000007ffff x10: 000000000000000d x9 : ffffdeeb6549ec84
x8 : ffff0020886e0000 x7 : 000000000000964d x6 : 000001afa2986acb
x5 : 01ffffffffffffff x4 : 0000000000000000 x3 : ffff202006b57d70
x2 : 0000000000000015 x1 : 0000000000000000 x0 : ffffdbe06f0140a8
Call trace:
  _raw_spin_unlock_irqrestore+0x14/0x78
  call_timer_fn+0x3c/0x1d0
  expire_timers+0xcc/0x190
  run_timer_softirq+0xfc/0x268
  __do_softirq+0x128/0x3dc
  ____do_softirq+0x18/0x30
  call_on_irq_stack+0x24/0x30
  do_softirq_own_stack+0x24/0x38
  irq_exit_rcu+0xc0/0xe8
  el1_interrupt+0x48/0xc0
  el1h_64_irq_handler+0x18/0x28
  el1h_64_irq+0x78/0x80
  __schedule+0xf28/0x12a0
  schedule+0x3c/0x108
  schedule_timeout+0xa0/0x1d0
  pktgen_thread_worker+0x1180/0x15d0
  kthread+0x120/0x130
  ret_from_fork+0x10/0x20

[1] https://gitee.com/openeuler/kernel/issues/I8KS9A
[2] https://gitee.com/openeuler/kernel/tree/OLK-5.10/
[3] https://lkml.org/lkml/2023/2/16/124

Thanks,
Zekun