Message-ID: <b49ef6cd-78b4-49ee-95e8-3105ac54e7d1@amd.com>
Date: Wed, 16 Jul 2025 11:36:30 +0530
From: K Prateek Nayak <kprateek.nayak@....com>
To: Chris Mason <clm@...a.com>, Peter Zijlstra <peterz@...radead.org>,
mingo@...hat.com, juri.lelli@...hat.com, vincent.guittot@...aro.org,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, vschneid@...hat.com
Cc: linux-kernel@...r.kernel.org, Johannes Weiner <hannes@...xchg.org>,
Ayush.jain3@....com, Srikanth.Aithal@....com
Subject: Re: [PATCH v2 01/12] sched/psi: Optimize psi_group_change()
cpu_clock() usage
(+ Ayush, Srikanth)

Hello Chris,

On 7/16/2025 12:41 AM, Chris Mason wrote:
>> @@ -186,7 +208,7 @@ static void group_init(struct psi_group
>>
>> group->enabled = true;
>> for_each_possible_cpu(cpu)
>> - seqcount_init(&per_cpu_ptr(group->pcpu, cpu)->seq);
>> + seqcount_init(per_cpu_ptr(&psi_seq, cpu));
>> group->avg_last_update = sched_clock();
>> group->avg_next_update = group->avg_last_update + psi_period;
>> mutex_init(&group->avgs_lock);
>
> I'm not sure if someone mentioned this already, but testing the
> series I got a bunch of softlockups in get_recent_times()
> that randomly jumped from CPU to CPU.
Ayush, Srikanth, and I ran into this yesterday when testing different
trees (next, queue:sched/core) with similar signatures:
watchdog: BUG: soft lockup - CPU#0 stuck for 21s! [kworker/0:3:2140]
Modules linked in: ...
CPU: 0 UID: 0 PID: 2140 Comm: kworker/0:3 Not tainted 6.16.0-rc1-test+ #20 PREEMPT(voluntary)
Hardware name: Dell Inc. PowerEdge R6525/024PW1, BIOS 2.7.3 03/30/2022
Workqueue: events psi_avgs_work
RIP: 0010:collect_percpu_times+0x3a0/0x670
Code: 65 48 2b 05 4a 79 d2 02 0f 85 dc 02 00 00 48 8d 65 d8 5b 41 5c 41 5d 41 5e 41 5f 5d e9 34 ba d2 ff f3 90 49 81 ff ff 1f 00 00 <0f> 86 73 fd ff ff 4c 89 fe 48 c7 c7 80 9d 29 bb e8 cb 92 73 00 e9
RSP: 0018:ffffcda753383d10 EFLAGS: 00000297
RAX: ffff8be86fadcd40 RBX: ffffeda7308d4580 RCX: 000000000000006b
RDX: 000000000000002b RSI: 0000000000000100 RDI: ffffffffbab3f400
RBP: ffffcda753383e30 R08: 000000000000006b R09: 0000000000000000
R10: 0000008cca6be372 R11: 0000000000000006 R12: 000000000000006b
R13: ffffeda7308d4594 R14: 00000000000037e5 R15: 000000000000006b
FS: 0000000000000000(0000) GS:ffff8ba8c1118000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 000055b3cf990c3c CR3: 000000807dc40006 CR4: 0000000000f70ef0
PKRU: 55555554
Call Trace:
<TASK>
? srso_alias_return_thunk+0x5/0xfbef5
? psi_group_change+0x1ff/0x460
? add_timer_on+0x10a/0x160
psi_avgs_work+0x4c/0xd0
? queue_delayed_work_on+0x6d/0x80
process_one_work+0x193/0x3c0
worker_thread+0x29d/0x3c0
? __pfx_worker_thread+0x10/0x10
kthread+0xff/0x210
? __pfx_kthread+0x10/0x10
? __pfx_kthread+0x10/0x10
ret_from_fork+0x171/0x1a0
? __pfx_kthread+0x10/0x10
ret_from_fork_asm+0x1a/0x30
</TASK>
I was able to reproduce this reliably by running 100 copies of an
infinite loop doing - cgroup create, move a task in, move the task back
to root, remove the cgroup - alongside hackbench running in a separate
cgroup, and I hit this in ~5-10min.
I have been running the same test with your fix and haven't run into
this for over 30min now. Feel free to add:
Reviewed-and-tested-by: K Prateek Nayak <kprateek.nayak@....com>
>
> This fixed it for me, but reading it now I'm wondering
> if we want to seqcount_init() unconditionally even when PSI
> is off.
Looking at "psi_enable", it can only be toggled via the "psi=" kernel
parameter, and I don't see anything that does a
"static_branch_disable(&psi_disabled)" at runtime, so I think your fix
should be good.
>
> diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
> index 2024c1d36402d..979a447bc281f 100644
> --- a/kernel/sched/psi.c
> +++ b/kernel/sched/psi.c
> @@ -207,8 +207,6 @@ static void group_init(struct psi_group *group)
> int cpu;
"cpu" variable can be removed too from group_init() now.
>
> group->enabled = true;
> - for_each_possible_cpu(cpu)
> - seqcount_init(per_cpu_ptr(&psi_seq, cpu));
> group->avg_last_update = sched_clock();
> group->avg_next_update = group->avg_last_update + psi_period;
> mutex_init(&group->avgs_lock);
> @@ -231,6 +229,7 @@ static void group_init(struct psi_group *group)
>
> void __init psi_init(void)
> {
> + int cpu;
nit: newline after declaration.
> if (!psi_enable) {
> static_branch_enable(&psi_disabled);
> static_branch_disable(&psi_cgroups_enabled);
> @@ -241,6 +240,8 @@ void __init psi_init(void)
> static_branch_disable(&psi_cgroups_enabled);
>
> psi_period = jiffies_to_nsecs(PSI_FREQ);
> + for_each_possible_cpu(cpu)
> + seqcount_init(per_cpu_ptr(&psi_seq, cpu));
> group_init(&psi_system);
> }
>
>
--
Thanks and Regards,
Prateek