Message-ID: <154690d7-9e9f-4d36-a89c-7ed1a57c42ae@efficios.com>
Date: Fri, 30 Jan 2026 10:24:44 -0500
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Thomas Gleixner <tglx@...nel.org>, LKML <linux-kernel@...r.kernel.org>
Cc: Ihor Solodrai <ihor.solodrai@...ux.dev>,
 Shrikanth Hegde <sshegde@...ux.ibm.com>,
 Peter Zijlstra <peterz@...radead.org>,
 Michael Jeanson <mjeanson@...icios.com>
Subject: Re: [patch 1/4] sched/mmcid: Prevent live lock on task to CPU mode
 transition

On 2026-01-29 16:20, Thomas Gleixner wrote:
> Ihor reported a BPF CI failure which turned out to be a live lock in the
> MM_CID management. The scenario is:
> 
> A test program creates the 4th child, which means the MM_CID users become

It would be clearer to talk in terms of threads, e.g. "creates the 5th
thread". AFAIR threads are "siblings", so I'm not sure that the
parent/child relationship really applies here.

> more than the number of CPUs (four in this example), so it switches to per
> CPU ownership mode.
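
For anyone counting along: the main thread plus four children makes
five MM_CID users on four CPUs, hence the switch. A trivial model of
that trigger (my own paraphrase, not the patch code):

	#include <stdbool.h>
	#include <stdio.h>

	/* Per-CPU ownership kicks in once users exceed the CPU count. */
	static bool switches_to_percpu(int users, int nr_cpus)
	{
		return users > nr_cpus;
	}

	int main(void)
	{
		printf("%d\n", switches_to_percpu(5, 4));	/* prints 1 */
		return 0;
	}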
> 
> At this point each live task of the program has a CID associated. Assume
> thread creation order assignment for simplicity.
> 
>     T0 (main thread)       CID0  runs fork() and creates T4
>     T1 (1st child)	  CID1

2nd thread and so on...

>     T2 (2nd child)	  CID2
>     T3 (3rd child)	  CID3
>     T4 (4th child)         ---   not visible yet
> 
> T0 sets mm_cid::percpu = true and transfers it's own CID to CPU0 where it

its

> runs on and then starts the fixup which walks through the threads to
> transfer the per task CIDs either to the CPU the task is running on or drop
> it back into the pool if the task is not on a CPU.
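
Side note for readers following along: that walk boils down to
something like this userspace toy (my simplification; the names are
made up and the free pool is elided):

	#include <stdbool.h>
	#include <stdio.h>

	#define NR_CPUS   4
	#define CID_UNSET (-1)

	/* One CID slot per CPU in this toy model. */
	static int cpu_cid[NR_CPUS];

	struct task {
		int  cid;
		int  cpu;	/* -1 when not on a CPU */
		bool running;
	};

	/*
	 * Hand the task's CID to the CPU it runs on, or (conceptually)
	 * drop it back into the pool otherwise.
	 */
	static void fixup_task(struct task *t)
	{
		if (t->cid == CID_UNSET)
			return;
		if (t->running && cpu_cid[t->cpu] == CID_UNSET)
			cpu_cid[t->cpu] = t->cid;
		/* else: the CID would return to the pool */
		t->cid = CID_UNSET;
	}

	int main(void)
	{
		struct task t1 = { .cid = 1, .cpu = 1, .running = true };
		struct task t3 = { .cid = 3, .cpu = -1, .running = false };

		for (int i = 0; i < NR_CPUS; i++)
			cpu_cid[i] = CID_UNSET;

		fixup_task(&t1);	/* CID 1 moves to CPU1 */
		fixup_task(&t3);	/* CID 3 goes back to the pool */
		printf("CPU1 owns CID %d\n", cpu_cid[1]);
		return 0;
	}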
> 
> During that T1 - T3 are free to schedule in and out before the fixup caught
> up with them. Going through all possible permutations with a python script
> revealed a few problematic cases. The most trivial one is:
> 
>     T1 schedules in on CPU1 and observes percpu == true, so it transfers
>        it's CID to CPU1

its

> 
>     T1 is migrated to CPU1 and schedule in observes percpu == true, but

I think you mean "to CPU2" here.

>        CPU2 does not have a CID associated and T1 transferred it's own to

its

[...]
> + *
> + * Aside of that this mechanism also ensures RT compability:

compatibility

[...]
> @@ -10596,11 +10628,13 @@ void sched_mm_cid_fork(struct task_struc
>   		if (!percpu)
>   			mm_cid_transit_to_task(current, pcp);
>   		else
> -			mm_cid_transfer_to_cpu(current, pcp);
> +			mm_cid_transit_to_cpu(current, pcp);
>   	}
>   
>   	if (percpu) {
>   		mm_cid_fixup_tasks_to_cpus();
> +		/* Clear the transition bit */
> +		WRITE_ONCE(mm->mm_cid.transit, 0);

You should move this WRITE_ONCE to the end of
mm_cid_fixup_tasks_to_cpus() to keep the same pattern as for
mm_cid_fixup_cpus_to_tasks().
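
I.e. in pattern form (a userspace toy, not your code; the helper body
is elided and a relaxed atomic store stands in for WRITE_ONCE()):

	#include <stdatomic.h>
	#include <stdio.h>

	static atomic_int transit;

	/* Clear the transition flag at the tail of the fixup helper
	 * itself, mirroring the cpus-to-tasks direction. */
	static void fixup_tasks_to_cpus(void)
	{
		/* ... walk the tasks and transfer the CIDs ... */
		atomic_store_explicit(&transit, 0, memory_order_relaxed);
	}

	int main(void)
	{
		atomic_store(&transit, 1);
		fixup_tasks_to_cpus();
		printf("transit = %d\n", atomic_load(&transit));
		return 0;
	}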

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com
