Message-ID: <87wm4brp00.ffs@tglx>
Date: Fri, 31 Oct 2025 17:54:07 +0100
From: Thomas Gleixner <tglx@...utronix.de>
To: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>, LKML
<linux-kernel@...r.kernel.org>
Cc: Peter Zijlstra <peterz@...radead.org>, Gabriele Monaco
<gmonaco@...hat.com>, Michael Jeanson <mjeanson@...icios.com>, Jens Axboe
<axboe@...nel.dk>, "Paul E. McKenney" <paulmck@...nel.org>, "Gautham R.
Shenoy" <gautham.shenoy@....com>, Florian Weimer <fweimer@...hat.com>, Tim
Chen <tim.c.chen@...el.com>, Yury Norov <yury.norov@...il.com>, Shrikanth
Hegde <sshegde@...ux.ibm.com>
Subject: Re: [patch V3 17/20] sched/mmcid: Provide CID ownership mode fixup
functions
On Thu, Oct 30 2025 at 11:51, Mathieu Desnoyers wrote:
> On 2025-10-29 09:09, Thomas Gleixner wrote:
>> At the point of switching to per CPU mode the new user is not yet visible
>> in the system, so the task which initiated the fork() runs the fixup
>> function: mm_cid_fixup_tasks_to_cpu() walks the thread list and either
>> transfers each task's owned CID to the CPU the task runs on or drops it into
>> the CID pool if a task is not on a CPU at that point in time. Tasks which
>> schedule in before the task walk reaches them do the handover in
>> mm_cid_schedin(). When mm_cid_fixup_tasks_to_cpus() completes it's
>> guaranteed that no task related to that MM owns a CID anymore.
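Below is a minimal, single-threaded sketch of the fixup walk described above. Everything in it is hypothetical and heavily simplified: struct sketch_task, struct sketch_mm, a bool array standing in for the CID pool, and a fixed CPU count. It ignores the locking and the concurrent handover in mm_cid_schedin() that the real fixup function has to cope with. It only illustrates the ownership rule: a running task hands its CID to its CPU, and a task that is not on a CPU drops its CID into the pool.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NR_CPUS 4
#define SKETCH_NR_CIDS 8
#define CID_UNSET      (-1)

/* Hypothetical, heavily simplified model; not the kernel's structures. */
struct sketch_task {
	int cid;    /* CID owned by the task, or CID_UNSET */
	int on_cpu; /* CPU the task is running on, or -1 if not running */
};

struct sketch_mm {
	bool cid_in_use[SKETCH_NR_CIDS]; /* the CID pool: true = allocated */
	int  pcpu_cid[SKETCH_NR_CPUS];   /* CID owned by each CPU, or CID_UNSET */
};

/*
 * Move CID ownership from the tasks to the CPUs: a running task hands its
 * CID to its CPU, a task which is not on a CPU drops its CID into the pool.
 */
static void sketch_fixup_tasks_to_cpus(struct sketch_mm *mm,
				       struct sketch_task *tasks, int nr)
{
	for (int i = 0; i < nr; i++) {
		struct sketch_task *t = &tasks[i];

		if (t->cid == CID_UNSET)
			continue;
		if (t->on_cpu >= 0 && mm->pcpu_cid[t->on_cpu] == CID_UNSET)
			mm->pcpu_cid[t->on_cpu] = t->cid; /* transfer to the CPU */
		else
			mm->cid_in_use[t->cid] = false;   /* not running (or CPU already owns one): back to the pool */
		t->cid = CID_UNSET;
	}

	/* Post-condition from the changelog: no task owns a CID anymore. */
	for (int i = 0; i < nr; i++)
		assert(tasks[i].cid == CID_UNSET);
}
```

The final loop checks the guarantee stated in the changelog: once the walk completes, no task of that MM owns a CID.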
>>
>> Switching back to task mode happens when the user count goes below the
>> threshold which was recorded on the per CPU mode switch:
>>
>> pcpu_thrs = min(opt_cids - (opt_cids / 4), nr_cpu_ids / 2);
>>
>
> AFAIU this provides a hysteresis so we don't switch back and
> forth between modes if a single thread is forked/exits repeatedly,
> right?
Yes. We could do that with a timer too, but the hysteresis has worked fine
so far.
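The following small C model shows why the recorded threshold acts as a hysteresis. The pcpu_thrs formula is the one quoted from the changelog. Two things are assumptions of this sketch and do not come from the patch: the upward condition (switch to per CPU mode once the user count exceeds opt_cids), and treating opt_cids and the CPU count as fixed inputs.

```c
#include <stdbool.h>

/* Hypothetical model of the ownership mode hysteresis, not kernel code. */
struct sketch_mode {
	bool percpu;            /* true: CPUs own the CIDs, false: tasks do */
	unsigned int pcpu_thrs; /* threshold recorded on the per CPU switch */
};

static unsigned int sketch_min(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* pcpu_thrs = min(opt_cids - (opt_cids / 4), nr_cpus / 2) */
static unsigned int sketch_pcpu_thrs(unsigned int opt_cids, unsigned int nr_cpus)
{
	return sketch_min(opt_cids - (opt_cids / 4), nr_cpus / 2);
}

/*
 * Assumed upward condition: go per CPU once users exceed opt_cids.
 * Going back requires users to drop below the recorded threshold.
 */
static void sketch_update_mode(struct sketch_mode *m, unsigned int users,
			       unsigned int opt_cids, unsigned int nr_cpus)
{
	if (!m->percpu && users > opt_cids) {
		m->percpu = true;
		m->pcpu_thrs = sketch_pcpu_thrs(opt_cids, nr_cpus);
	} else if (m->percpu && users < m->pcpu_thrs) {
		m->percpu = false;
	}
}
```

With opt_cids = 8 and 16 CPUs, the recorded threshold is min(8 - 2, 8) = 6. After the switch at 9 users, the mode only flips back once the count drops to 5, so a single task forking and exiting around 8 or 9 users does not toggle the mode.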
>> This transition from CPU to per task ownership happens in two phases:
>>
>> 1) mm:mm_cid.transit contains MM_CID_TRANSIT. This is OR'ed on the task
>> CID and denotes that the CID is only temporarily owned by the
>> task. When it schedules out, the task drops the CID back into the
>> pool if this bit is set.
>
> OK, so the mm_drop_cid() on sched out only happens due to a transition
> from per-cpu back to per-task. This answers my question in the previous
> patch.
:)
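Here is a sketch of phase 1 as described above. While mm_cid.transit holds the transit bit, that bit is OR'ed into the task CID, and schedule-out returns such a CID to the pool. The bit value, the "no CID" encoding and all sketch_* names are hypothetical. The second phase of the transition is not covered in this excerpt and is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_NR_CIDS     64
#define SKETCH_CID_TRANSIT (1u << 30) /* hypothetical flag bit */
#define SKETCH_CID_UNSET   (~0u)      /* hypothetical "no CID" value */

struct sketch_mm {
	unsigned int transit;        /* SKETCH_CID_TRANSIT during phase 1, else 0 */
	bool in_use[SKETCH_NR_CIDS]; /* the CID pool */
};

/* The CID a task gets while the MM is in transit carries the transit bit. */
static unsigned int sketch_task_cid(const struct sketch_mm *mm, unsigned int cid)
{
	return cid | mm->transit;
}

/* Schedule out: a CID tagged as transient is dropped back into the pool. */
static unsigned int sketch_cid_schedout(struct sketch_mm *mm, unsigned int cid)
{
	unsigned int idx;

	if (cid == SKETCH_CID_UNSET || !(cid & SKETCH_CID_TRANSIT))
		return cid; /* no CID, or a stable task-owned CID: keep it */

	idx = cid & ~SKETCH_CID_TRANSIT;
	assert(idx < SKETCH_NR_CIDS);
	mm->in_use[idx] = false;
	return SKETCH_CID_UNSET;
}
```

The UNSET check comes first because the hypothetical all-ones "no CID" value also has the transit bit set.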
>> + * Switching back to task mode happens when the user count goes below the
>> + * threshold which was recorded on the per CPU mode switch:
>> + *
>> + * pcpu_thrs = min(opt_cids - (opt_cids / 4), num_possible_cpus() / 2);
>
> I notice that mm_update_cpus_allowed() calls __mm_update_max_cids()
> before updating the pcpu_thrs threshold.
>
> sched_mm_cid_{add,remove}_user() only invoke mm_update_max_cids(mm)
> without updating pcpu_thrs first.
>
> Are those done on purpose?
Yes. pcpu_thrs can only be updated in a context which can also handle the
resulting transition, whereas max_cids can always be updated.
That's why mm_update_cpus_allowed() only updates max_cids and then
schedules work to defer a potential transition to the worker thread
context.

sched_mm_cid_{add,remove}_user() does:

    mm_update_max_cids()
        __mm_update_max_cids()    <- updates max_cids
        update threshold and potentially switch ownership mode

As this holds the mutex, it prevents new tasks from coming in or other
tasks from exiting until it has completed the transition.

mm_cid_work_fn() does the same thing, unless a
sched_mm_cid_{add,remove}_user() call has already handled it.

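The split between the two update paths can be modelled as follows. This is a hypothetical, single-threaded sketch, and several of its details are assumptions: the field names, the max_cids = min(nr_cpus_allowed, users) formula, the use of max_cids in place of opt_cids for the threshold, the upward switch condition, and the work_pending flag standing in for the scheduled worker. It also ignores the memory-ordering requirements of the real code. It only shows the rule from the reply: max_cids may be updated without the mutex, the threshold update and any mode switch happen with the mutex held, and the worker skips the transition if an add/remove call already did it.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical model of the update paths; not the kernel's code. */
struct sketch_mm {
	pthread_mutex_t mutex;        /* serializes ownership mode transitions */
	unsigned int users;
	unsigned int nr_cpus_allowed;
	unsigned int nr_cpu_ids;
	unsigned int max_cids;        /* updated without the mutex in one path */
	unsigned int pcpu_thrs;       /* only updated with the mutex held */
	bool percpu;
	bool work_pending;            /* stands in for a scheduled worker */
};

static unsigned int umin(unsigned int a, unsigned int b)
{
	return a < b ? a : b;
}

/* Assumed formula; no transition involved, so always safe to do. */
static void sketch_update_max_cids_only(struct sketch_mm *mm)
{
	mm->max_cids = umin(mm->nr_cpus_allowed, mm->users);
}

/* Caller holds mm->mutex: threshold update plus a possible mode switch. */
static void sketch_update_mode_locked(struct sketch_mm *mm)
{
	if (!mm->percpu && mm->users > mm->max_cids) {
		mm->percpu = true;
		mm->pcpu_thrs = umin(mm->max_cids - mm->max_cids / 4,
				     mm->nr_cpu_ids / 2);
	} else if (mm->percpu && mm->users < mm->pcpu_thrs) {
		mm->percpu = false;
	}
	mm->work_pending = false; /* nothing left for the worker to do */
}

/* Like sched_mm_cid_{add,remove}_user(): handles the transition inline. */
static void sketch_change_users(struct sketch_mm *mm, int delta)
{
	pthread_mutex_lock(&mm->mutex);
	mm->users += delta; /* unsigned arithmetic; delta = -1 decrements */
	sketch_update_max_cids_only(mm);
	sketch_update_mode_locked(mm);
	pthread_mutex_unlock(&mm->mutex);
}

/* Like mm_update_cpus_allowed(): update max_cids, defer the transition. */
static void sketch_update_cpus_allowed(struct sketch_mm *mm, unsigned int nr)
{
	mm->nr_cpus_allowed = nr;
	sketch_update_max_cids_only(mm);
	mm->work_pending = true;
}

/* Like mm_cid_work_fn(): only acts if no add/remove call got there first. */
static void sketch_work_fn(struct sketch_mm *mm)
{
	pthread_mutex_lock(&mm->mutex);
	if (mm->work_pending)
		sketch_update_mode_locked(mm);
	pthread_mutex_unlock(&mm->mutex);
}
```

In this model, a narrowed affinity mask only changes max_cids immediately. The mode switch it implies is done later by sketch_work_fn(), or earlier by whichever add/remove call takes the mutex first.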
Thanks,
tglx