linux-kernel - Re: [patch V3 17/20] sched/mmcid: Provide CID ownership mode fixup functions


Message-ID: <87wm4brp00.ffs@tglx>
Date: Fri, 31 Oct 2025 17:54:07 +0100
From: Thomas Gleixner <tglx@...utronix.de>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, LKML
 <linux-kernel@...r.kernel.org>
Cc: Peter Zijlstra <peterz@...radead.org>, Gabriele Monaco
 <gmonaco@...hat.com>, Michael Jeanson <mjeanson@...icios.com>, Jens Axboe
 <axboe@...nel.dk>, "Paul E. McKenney" <paulmck@...nel.org>, "Gautham R.
 Shenoy" <gautham.shenoy@....com>, Florian Weimer <fweimer@...hat.com>, Tim
 Chen <tim.c.chen@...el.com>, Yury Norov <yury.norov@...il.com>, Shrikanth
 Hegde <sshegde@...ux.ibm.com>
Subject: Re: [patch V3 17/20] sched/mmcid: Provide CID ownership mode fixup
 functions

On Thu, Oct 30 2025 at 11:51, Mathieu Desnoyers wrote:
> On 2025-10-29 09:09, Thomas Gleixner wrote:
>> At the point of switching to per CPU mode the new user is not yet visible
>> in the system, so the task which initiated the fork() runs the fixup
>> function: mm_cid_fixup_tasks_to_cpus() walks the thread list and either
>> transfers each task's owned CID to the CPU the task runs on or drops it into
>> the CID pool if a task is not on a CPU at that point in time. Tasks which
>> schedule in before the task walk reaches them do the handover in
>> mm_cid_schedin(). When mm_cid_fixup_tasks_to_cpus() completes it's
>> guaranteed that no task related to that MM owns a CID anymore.
>> 
>> Switching back to task mode happens when the user count goes below the
>> threshold which was recorded on the per CPU mode switch:
>> 
>> 	pcpu_thrs = min(opt_cids - (opt_cids / 4), nr_cpu_ids / 2);
>> 
>
> AFAIU this provides a hysteresis so we don't switch back and
> forth between modes if a single thread is forked/exits repeatedly,
> right?

Yes. We could do that with a timer too, but the hysteresis worked fine
so far.
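
For illustration, the threshold formula quoted above can be written as
a small standalone helper. This is only a sketch and not the kernel
code: pcpu_threshold() and min_uint() are hypothetical names, and the
inputs are passed in as plain integers instead of being read from the
MM.

```c
#include <assert.h>

/* Hypothetical helper, not a kernel function: minimum of two values. */
static unsigned int min_uint(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/*
 * Sketch of the quoted formula:
 *
 *	pcpu_thrs = min(opt_cids - (opt_cids / 4), nr_cpu_ids / 2);
 *
 * Per the changelog, the MM switches back to task mode when the user
 * count goes below the returned value.
 */
static unsigned int pcpu_threshold(unsigned int opt_cids, unsigned int nr_cpu_ids)
{
	assert(nr_cpu_ids > 0);
	return min_uint(opt_cids - (opt_cids / 4), nr_cpu_ids / 2);
}
```

With opt_cids = 16 and nr_cpu_ids = 64 this yields 12, so under these
assumed numbers the MM stays in per CPU mode until the user count goes
below 12. That gap is the hysteresis.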

>> This transition from CPU to per task ownership happens in two phases:
>> 
>>   1) mm:mm_cid.transit contains MM_CID_TRANSIT. This is OR'ed on the task
>>      CID and denotes that the CID is only temporarily owned by the
>>      task. When it schedules out the task drops the CID back into the
>>      pool if this bit is set.
>
> OK, so the mm_drop_cid() on sched out only happens due to a transition
> from per-cpu back to per-task. This answers my question in the previous
> patch.

:)
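
As a rough sketch of that first phase: the transit marker is OR'ed on
the CID value and checked on schedule out. Everything prefixed toy_ or
TOY_ below is made up for illustration, including the bit positions
and the bitmask pool. The real MM_CID_TRANSIT value and pool
implementation are not shown in this mail.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_CID_TRANSIT	(1U << 30)	/* hypothetical bit position */
#define TOY_CID_UNSET	(1U << 31)	/* hypothetical "owns no CID" marker */

/* Toy CID pool: bit n set means CID n is free. */
struct toy_pool {
	unsigned long free_mask;
};

static bool toy_cid_in_transit(unsigned int cid)
{
	return (cid & TOY_CID_TRANSIT) != 0;
}

static unsigned int toy_cid_value(unsigned int cid)
{
	return cid & ~TOY_CID_TRANSIT;
}

/*
 * Schedule out: a CID marked as in transit is only temporarily owned,
 * so it is dropped back into the pool. Any other CID stays with the
 * task. Returns the CID the task holds afterwards.
 */
static unsigned int toy_sched_out(struct toy_pool *pool, unsigned int cid)
{
	unsigned int val;

	if (!toy_cid_in_transit(cid))
		return cid;

	val = toy_cid_value(cid);
	assert(val < sizeof(pool->free_mask) * 8);
	pool->free_mask |= 1UL << val;
	return TOY_CID_UNSET;
}
```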

>> + * Switching back to task mode happens when the user count goes below the
>> + * threshold which was recorded on the per CPU mode switch:
>> + *
>> + *	pcpu_thrs = min(opt_cids - (opt_cids / 4), num_possible_cpus() / 2);
>
> I notice that mm_update_cpus_allowed() calls __mm_update_max_cids() 
> before updating the pcpu_thrs threshold.
>
> sched_mm_cid_{add,remove}_user() only invoke mm_update_max_cids(mm)
> without updating pcpu_thrs first.
>
> Are those done on purpose ?

Yes. Update of pcpu_thrs is only possible when a resulting transition
can be handled in the calling context. max_cids update is always possible.

That's why mm_update_cpus_allowed() only updates max_cids and then
schedules work to defer a potential transition to the worker thread
context.

sched_mm_cid_{add,remove}_user() does:

    mm_update_max_cids()
      __mm_update_max_cids()    <- Updates max_cids

      update threshold and potentially switch ownership mode

As this holds the mutex, it prevents new tasks from coming in and other
tasks from exiting until the transition has been handled.

mm_cid_work_fn() does the same thing unless a
sched_mm_cid_{add,remove}_user() already handled it.
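
The ordering described above can be modelled roughly as below. This is
a single-threaded toy which leaves out the mutex and the actual mode
switch. All toy_ names are hypothetical, and the max_cids computation
is just a placeholder because the body of __mm_update_max_cids() is
not part of this mail.

```c
#include <assert.h>
#include <stdbool.h>

struct toy_mm {
	unsigned int users;
	unsigned int nr_cpus_allowed;
	unsigned int max_cids;
	unsigned int pcpu_thrs;
	bool work_pending;
};

static unsigned int toy_min(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Placeholder for __mm_update_max_cids(). Always possible. */
static void toy_update_max_cids(struct toy_mm *mm)
{
	mm->max_cids = toy_min(mm->users, mm->nr_cpus_allowed);
}

/* Threshold update. Only done where a transition could be handled. */
static void toy_update_threshold(struct toy_mm *mm, unsigned int nr_cpu_ids)
{
	unsigned int opt = mm->max_cids;	/* assumed stand-in for opt_cids */

	mm->pcpu_thrs = toy_min(opt - (opt / 4), nr_cpu_ids / 2);
	mm->work_pending = false;
}

/* Affinity change: update max_cids only and defer the rest to the worker. */
static void toy_update_cpus_allowed(struct toy_mm *mm, unsigned int nr_allowed)
{
	mm->nr_cpus_allowed = nr_allowed;
	toy_update_max_cids(mm);
	mm->work_pending = true;
}

/* Add user: update max_cids first, then the threshold. */
static void toy_add_user(struct toy_mm *mm, unsigned int nr_cpu_ids)
{
	mm->users++;
	toy_update_max_cids(mm);
	toy_update_threshold(mm, nr_cpu_ids);
}

/* Worker: same thing, unless add/remove user already handled it. */
static bool toy_work_fn(struct toy_mm *mm, unsigned int nr_cpu_ids)
{
	if (!mm->work_pending)
		return false;
	toy_update_max_cids(mm);
	toy_update_threshold(mm, nr_cpu_ids);
	return true;
}
```

If toy_add_user() runs between toy_update_cpus_allowed() and
toy_work_fn(), the worker finds nothing left to do and returns false.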

Thanks,

        tglx