Message-ID: <c2e4fed9-b207-4d28-93f5-b09f0fe78e35@efficios.com>
Date: Thu, 30 Oct 2025 11:51:03 -0400
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Thomas Gleixner <tglx@...utronix.de>, LKML <linux-kernel@...r.kernel.org>
Cc: Peter Zijlstra <peterz@...radead.org>,
Gabriele Monaco <gmonaco@...hat.com>, Michael Jeanson
<mjeanson@...icios.com>, Jens Axboe <axboe@...nel.dk>,
"Paul E. McKenney" <paulmck@...nel.org>,
"Gautham R. Shenoy" <gautham.shenoy@....com>,
Florian Weimer <fweimer@...hat.com>, Tim Chen <tim.c.chen@...el.com>,
Yury Norov <yury.norov@...il.com>, Shrikanth Hegde <sshegde@...ux.ibm.com>
Subject: Re: [patch V3 17/20] sched/mmcid: Provide CID ownership mode fixup
functions
On 2025-10-29 09:09, Thomas Gleixner wrote:
>
> At the point of switching to per CPU mode the new user is not yet visible
> in the system, so the task which initiated the fork() runs the fixup
> function: mm_cid_fixup_tasks_to_cpu() walks the thread list and either
> transfers each tasks owned CID to the CPU the task runs on or drops it into
> the CID pool if a task is not on a CPU at that point in time. Tasks which
> schedule in before the task walk reaches them do the handover in
> mm_cid_schedin(). When mm_cid_fixup_tasks_to_cpus() completes it's
> guaranteed that no task related to that MM owns a CID anymore.
>
> Switching back to task mode happens when the user count goes below the
> threshold which was recorded on the per CPU mode switch:
>
> pcpu_thrs = min(opt_cids - (opt_cids / 4), nr_cpu_ids / 2);
>
AFAIU this provides a hysteresis, so we don't switch back and
forth between modes if a single thread is forked/exits repeatedly,
right?
> did not cover yet do the handover themself.
themselves
>
> This transition from CPU to per task ownership happens in two phases:
>
> 1) mm:mm_cid.transit contains MM_CID_TRANSIT. This is OR'ed on the task
> CID and denotes that the CID is only temporarily owned by the
> task. When it schedules out the task drops the CID back into the
> pool if this bit is set.
OK, so the mm_drop_cid() on sched out only happens due to a transition
from per-cpu back to per-task. This answers my question in the previous
patch.
>
> 2) The initiating context walks the per CPU space and after completion
> clears mm:mm_cid.transit. After that point the CIDs are strictly
> task owned again.
>
> This two phase transition is required to prevent CID space exhaustion
> during the transition as a direct transfer of ownership would fail if
> two tasks are scheduled in on the same CPU before the fixup freed per
> CPU CIDs.
Clever. :-)
> + * Switching to per CPU mode happens when the user count becomes greater
> + * than the maximum number of CIDs, which is calculated by:
> + *
> + * opt_cids = min(mm_cid::nr_cpus_allowed, mm_cid::users);
> + * max_cids = min(1.25 * opt_cids, num_possible_cpus());
[...]
> + * Switching back to task mode happens when the user count goes below the
> + * threshold which was recorded on the per CPU mode switch:
> + *
> + * pcpu_thrs = min(opt_cids - (opt_cids / 4), num_possible_cpus() / 2);
I notice that mm_update_cpus_allowed() calls __mm_update_max_cids()
before updating the pcpu_thrs threshold, whereas
sched_mm_cid_{add,remove}_user() only invoke mm_update_max_cids(mm)
without updating pcpu_thrs first.
Is that asymmetry intentional?
Thanks,
Mathieu
--
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com