Message-ID: <b49ef6cd-78b4-49ee-95e8-3105ac54e7d1@amd.com>
Date: Wed, 16 Jul 2025 11:36:30 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Chris Mason <clm@...a.com>, Peter Zijlstra <peterz@...radead.org>,
mingo@...hat.com, juri.lelli@...hat.com, vincent.guittot@...aro.org,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, vschneid@...hat.com
Cc: linux-kernel@...r.kernel.org, Johannes Weiner <hannes@...xchg.org>,
Ayush.jain3@....com, Srikanth.Aithal@....com
Subject: Re: [PATCH v2 01/12] sched/psi: Optimize psi_group_change()
cpu_clock() usage
(+ Ayush, Srikanth)

Hello Chris,

On 7/16/2025 12:41 AM, Chris Mason wrote:
>> @@ -186,7 +208,7 @@ static void group_init(struct psi_group
>>
>> group->enabled = true;
>> for_each_possible_cpu(cpu)
>> - seqcount_init(&per_cpu_ptr(group->pcpu, cpu)->seq);
>> + seqcount_init(per_cpu_ptr(&psi_seq, cpu));
>> group->avg_last_update = sched_clock();
>> group->avg_next_update = group->avg_last_update + psi_period;
>> mutex_init(&group->avgs_lock);
>
> I'm not sure if someone mentioned this already, but testing the
> series I got a bunch of softlockups in get_recent_times()
> that randomly jumped from CPU to CPU.
Ayush, Srikanth, and I ran into this yesterday when testing different
trees (next, queue:sched/core) with similar signatures:
watchdog: BUG: soft lockup - CPU#0 stuck for 21s! [kworker/0:3:2140]
Modules linked in: ...
CPU: 0 UID: 0 PID: 2140 Comm: kworker/0:3 Not tainted 6.16.0-rc1-test+ #20 PREEMPT(voluntary)
Hardware name: Dell Inc. PowerEdge R6525/024PW1, BIOS 2.7.3 03/30/2022
Workqueue: events psi_avgs_work
RIP: 0010:collect_percpu_times+0x3a0/0x670
Code: 65 48 2b 05 4a 79 d2 02 0f 85 dc 02 00 00 48 8d 65 d8 5b 41 5c 41 5d 41 5e 41 5f 5d e9 34 ba d2 ff f3 90 49 81 ff ff 1f 00 00 <0f> 86 73 fd ff ff 4c 89 fe 48 c7 c7 80 9d 29 bb e8 cb 92 73 00 e9
RSP: 0018:ffffcda753383d10 EFLAGS: 00000297
RAX: ffff8be86fadcd40 RBX: ffffeda7308d4580 RCX: 000000000000006b
RDX: 000000000000002b RSI: 0000000000000100 RDI: ffffffffbab3f400
RBP: ffffcda753383e30 R08: 000000000000006b R09: 0000000000000000
R10: 0000008cca6be372 R11: 0000000000000006 R12: 000000000000006b
R13: ffffeda7308d4594 R14: 00000000000037e5 R15: 000000000000006b
FS: 0000000000000000(0000) GS:ffff8ba8c1118000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055b3cf990c3c CR3: 000000807dc40006 CR4: 0000000000f70ef0
PKRU: 55555554
Call Trace:
<TASK>
? srso_alias_return_thunk+0x5/0xfbef5
? psi_group_change+0x1ff/0x460
? add_timer_on+0x10a/0x160
psi_avgs_work+0x4c/0xd0
? queue_delayed_work_on+0x6d/0x80
process_one_work+0x193/0x3c0
worker_thread+0x29d/0x3c0
? __pfx_worker_thread+0x10/0x10
kthread+0xff/0x210
? __pfx_kthread+0x10/0x10
? __pfx_kthread+0x10/0x10
ret_from_fork+0x171/0x1a0
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
I was able to reproduce this reliably by running 100 copies of an
infinite loop doing - cgroup create, move a task in, move the task back
to root, remove the cgroup - alongside hackbench running in a separate
cgroup, and I hit this in ~5-10min.
I have been running the same test with your fix and haven't run into
this for over 30min now. Feel free to add:
Reviewed-and-tested-by: K Prateek Nayak <kprateek.nayak@....com>
>
> This fixed it for me, but reading it now I'm wondering
> if we want to seqcount_init() unconditionally even when PSI
> is off.
Looking at "psi_enable", it can only be toggled via the "psi=" kernel
parameter, and I don't see anything that does a
"static_branch_disable(&psi_disabled)" at runtime, so I think your fix
should be good.
>
> diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
> index 2024c1d36402d..979a447bc281f 100644
> --- a/kernel/sched/psi.c
> +++ b/kernel/sched/psi.c
> @@ -207,8 +207,6 @@ static void group_init(struct psi_group *group)
> int cpu;
"cpu" variable can be removed too from group_init() now.
>
> group->enabled = true;
> - for_each_possible_cpu(cpu)
> - seqcount_init(per_cpu_ptr(&psi_seq, cpu));
> group->avg_last_update = sched_clock();
> group->avg_next_update = group->avg_last_update + psi_period;
> mutex_init(&group->avgs_lock);
> @@ -231,6 +229,7 @@ static void group_init(struct psi_group *group)
>
> void __init psi_init(void)
> {
> + int cpu;
nit: newline after declaration.
> if (!psi_enable) {
> static_branch_enable(&psi_disabled);
> static_branch_disable(&psi_cgroups_enabled);
> @@ -241,6 +240,8 @@ void __init psi_init(void)
> static_branch_disable(&psi_cgroups_enabled);
>
> psi_period = jiffies_to_nsecs(PSI_FREQ);
> + for_each_possible_cpu(cpu)
> + seqcount_init(per_cpu_ptr(&psi_seq, cpu));
> group_init(&psi_system);
> }
>
>
--
Thanks and Regards,
Prateek