[<prev] [next>] [thread-next>] [day] [month] [year] [list]
Message-ID: <20250929133602.32462-1-piliu@redhat.com>
Date: Mon, 29 Sep 2025 21:36:02 +0800
From: Pingfan Liu <piliu@...hat.com>
To: linux-kernel@...r.kernel.org
Cc: Pingfan Liu <piliu@...hat.com>,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Juri Lelli <juri.lelli@...hat.com>,
Vincent Guittot <vincent.guittot@...aro.org>,
Dietmar Eggemann <dietmar.eggemann@....com>,
Steven Rostedt <rostedt@...dmis.org>,
Ben Segall <bsegall@...gle.com>,
Mel Gorman <mgorman@...e.de>,
Valentin Schneider <vschneid@...hat.com>
Subject: [PATCH] sched/deadline: Derive root domain from active cpu in task's cpus_ptr
When testing kexec-reboot on a 144 cpus machine with
isolcpus=managed_irq,domain,1-71,73-143 in kernel command line, I
encounter the following bug:
[ 97.114759] psci: CPU142 killed (polled 0 ms)
[ 97.333236] Failed to offline CPU143 - error=-16
[ 97.333246] ------------[ cut here ]------------
[ 97.342682] kernel BUG at kernel/cpu.c:1569!
[ 97.347049] Internal error: Oops - BUG: 00000000f2000800 [#1] SMP
[ 97.353281] Modules linked in: rfkill sunrpc dax_hmem cxl_acpi cxl_port cxl_core einj vfat fat arm_smmuv3_pmu nvidia_cspmu arm_spe_pmu coresight_trbe arm_cspmu_module rndis_host ipmi_ssif cdc_ether i2c_smbus spi_nor usbnet ast coresight_tmc mii ixgbe i2c_algo_bit mdio mtd coresight_funnel coresight_stm stm_core coresight_etm4x coresight cppc_cpufreq loop fuse nfnetlink xfs crct10dif_ce ghash_ce sha2_ce sha256_arm64 sha1_ce sbsa_gwdt nvme nvme_core nvme_auth i2c_tegra acpi_power_meter acpi_ipmi ipmi_devintf ipmi_msghandler dm_mirror dm_region_hash dm_log dm_mod
[ 97.404119] CPU: 0 UID: 0 PID: 2583 Comm: kexec Kdump: loaded Not tainted 6.12.0-41.el10.aarch64 #1
[ 97.413371] Hardware name: Supermicro MBD-G1SMH/G1SMH, BIOS 2.0 07/12/2024
[ 97.420400] pstate: 23400009 (nzCv daif +PAN -UAO +TCO +DIT -SSBS BTYPE=--)
[ 97.427518] pc : smp_shutdown_nonboot_cpus+0x104/0x128
[ 97.432778] lr : smp_shutdown_nonboot_cpus+0x11c/0x128
[ 97.438028] sp : ffff800097c6b9a0
[ 97.441411] x29: ffff800097c6b9a0 x28: ffff0000a099d800 x27: 0000000000000000
[ 97.448708] x26: 0000000000000000 x25: 0000000000000000 x24: ffffb94aaaa8f218
[ 97.456004] x23: ffffb94aaaabaae0 x22: ffffb94aaaa8f018 x21: 0000000000000000
[ 97.463301] x20: ffffb94aaaa8fc10 x19: 000000000000008f x18: 00000000fffffffe
[ 97.470598] x17: 0000000000000000 x16: ffffb94aa958fcd0 x15: ffff103acfca0b64
[ 97.477894] x14: ffff800097c6b520 x13: 36312d3d726f7272 x12: ffff103acfc6ffa8
[ 97.485191] x11: ffff103acf6f0000 x10: ffff103bc085c400 x9 : ffffb94aa88a0eb0
[ 97.492488] x8 : 0000000000000001 x7 : 000000000017ffe8 x6 : c0000000fffeffff
[ 97.499784] x5 : ffff003bdf62b408 x4 : 0000000000000000 x3 : 0000000000000000
[ 97.507081] x2 : 0000000000000000 x1 : ffff0000a099d800 x0 : 0000000000000002
[ 97.514379] Call trace:
[ 97.516874] smp_shutdown_nonboot_cpus+0x104/0x128
[ 97.521769] machine_shutdown+0x20/0x38
[ 97.525693] kernel_kexec+0xc4/0xf0
[ 97.529260] __do_sys_reboot+0x24c/0x278
[ 97.533272] __arm64_sys_reboot+0x2c/0x40
[ 97.537370] invoke_syscall.constprop.0+0x74/0xd0
[ 97.542179] do_el0_svc+0xb0/0xe8
[ 97.545562] el0_svc+0x44/0x1d0
[ 97.548772] el0t_64_sync_handler+0x120/0x130
[ 97.553222] el0t_64_sync+0x1a4/0x1a8
[ 97.556963] Code: a94363f7 a8c47bfd d50323bf d65f03c0 (d4210000)
[ 97.563191] ---[ end trace 0000000000000000 ]---
[ 97.595854] Kernel panic - not syncing: Oops - BUG: Fatal exception
[ 97.602275] Kernel Offset: 0x394a28600000 from 0xffff800080000000
[ 97.608502] PHYS_OFFSET: 0x80000000
[ 97.612062] CPU features: 0x10,0000000d,002a6928,5667fea7
[ 97.617580] Memory Limit: none
[ 97.648626] ---[ end Kernel panic - not syncing: Oops - BUG: Fatal exception ]
Tracking down this issue, I found that dl_bw_deactivate() returned
-EBUSY, which caused sched_cpu_deactivate() to fail on the last CPU.
When a CPU is inactive, its rd is set to def_root_domain. For an S-state
deadline task (in this case, "cppc_fie"), it was not migrated to CPU0,
and its task_rq() information is stale. As a result, its bandwidth is
wrongly accounted into def_root_domain during domain rebuild.
This patch uses the rd from the run queue of still-active CPU to get the
correct root domain.
Signed-off-by: Pingfan Liu <piliu@...hat.com>
Cc: Ingo Molnar <mingo@...hat.com>
Cc: Peter Zijlstra <peterz@...radead.org>
Cc: Juri Lelli <juri.lelli@...hat.com>
Cc: Vincent Guittot <vincent.guittot@...aro.org>
Cc: Dietmar Eggemann <dietmar.eggemann@....com>
Cc: Steven Rostedt <rostedt@...dmis.org>
Cc: Ben Segall <bsegall@...gle.com>
Cc: Mel Gorman <mgorman@...e.de>
Cc: Valentin Schneider <vschneid@...hat.com>
To: linux-kernel@...r.kernel.org
---
kernel/sched/deadline.c | 20 ++++++++++++++------
1 file changed, 14 insertions(+), 6 deletions(-)
diff --git a/kernel/sched/deadline.c b/kernel/sched/deadline.c
index f25301267e47..bb42b82d6366 100644
--- a/kernel/sched/deadline.c
+++ b/kernel/sched/deadline.c
@@ -2913,6 +2913,7 @@ void dl_add_task_root_domain(struct task_struct *p)
struct rq_flags rf;
struct rq *rq;
struct dl_bw *dl_b;
+ unsigned int cpu;
raw_spin_lock_irqsave(&p->pi_lock, rf.flags);
if (!dl_task(p) || dl_entity_is_special(&p->dl)) {
@@ -2920,16 +2921,23 @@ void dl_add_task_root_domain(struct task_struct *p)
return;
}
- rq = __task_rq_lock(p, &rf);
-
+ lockdep_assert_cpus_held();
+ /*
+ * If @p is not in R state, task_cpu() may be not active. task_rq()'s
+ * root_domain may be invalid. But the rest active cpus on cpus_ptr
+ * share the same root domain.
+ */
+ cpu = cpumask_first_and(cpu_active_mask, p->cpus_ptr);
+ rq = cpu_rq(cpu);
+ /*
+ * This point is under the protection of cpu_hotplug_lock. Hence
+ * rq->rd is stable.
+ */
dl_b = &rq->rd->dl_bw;
raw_spin_lock(&dl_b->lock);
-
__dl_add(dl_b, p->dl.dl_bw, cpumask_weight(rq->rd->span));
-
raw_spin_unlock(&dl_b->lock);
-
- task_rq_unlock(rq, p, &rf);
+ raw_spin_unlock_irqrestore(&p->pi_lock, rf.flags);
}
void dl_clear_root_domain(struct root_domain *rd)
--
2.49.0
Powered by blists - more mailing lists