Message-ID: <c2e4fed9-b207-4d28-93f5-b09f0fe78e35@efficios.com>
Date: Thu, 30 Oct 2025 11:51:03 -0400
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Thomas Gleixner <tglx@...utronix.de>, LKML <linux-kernel@...r.kernel.org>
Cc: Peter Zijlstra <peterz@...radead.org>,
 Gabriele Monaco <gmonaco@...hat.com>, Michael Jeanson
 <mjeanson@...icios.com>, Jens Axboe <axboe@...nel.dk>,
 "Paul E. McKenney" <paulmck@...nel.org>,
 "Gautham R. Shenoy" <gautham.shenoy@....com>,
 Florian Weimer <fweimer@...hat.com>, Tim Chen <tim.c.chen@...el.com>,
 Yury Norov <yury.norov@...il.com>, Shrikanth Hegde <sshegde@...ux.ibm.com>
Subject: Re: [patch V3 17/20] sched/mmcid: Provide CID ownership mode fixup
 functions
On 2025-10-29 09:09, Thomas Gleixner wrote:
> 
> At the point of switching to per CPU mode the new user is not yet visible
> in the system, so the task which initiated the fork() runs the fixup
> function: mm_cid_fixup_tasks_to_cpu() walks the thread list and either
> transfers each tasks owned CID to the CPU the task runs on or drops it into
> the CID pool if a task is not on a CPU at that point in time. Tasks which
> schedule in before the task walk reaches them do the handover in
> mm_cid_schedin(). When mm_cid_fixup_tasks_to_cpus() completes it's
> guaranteed that no task related to that MM owns a CID anymore.
> 
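To make sure I understand the walk, here is how I would model it in a
tiny userspace toy (all names, types and the pool representation below
are mine for illustration, not the actual patch):

#include <stdbool.h>
#include <stdio.h>

#define MM_CID_UNSET	(-1)
#define NR_CPUS		4

struct toy_task {
	int cid;	/* task owned CID, or MM_CID_UNSET */
	int cpu;	/* CPU the task currently runs on, or -1 */
};

static int pcpu_cid[NR_CPUS];	/* CIDs owned by CPUs */
static bool cid_pool[NR_CPUS];	/* true = CID is back in the pool */

/* Walk the threads and hand each owned CID to a CPU or to the pool. */
static void toy_fixup_tasks_to_cpus(struct toy_task *tasks, int nr)
{
	for (int i = 0; i < nr; i++) {
		struct toy_task *t = &tasks[i];

		/* Tasks which already scheduled in did the handover. */
		if (t->cid == MM_CID_UNSET)
			continue;

		if (t->cpu >= 0)
			pcpu_cid[t->cpu] = t->cid;	/* transfer to that CPU */
		else
			cid_pool[t->cid] = true;	/* drop into the pool */

		t->cid = MM_CID_UNSET;
	}
}

int main(void)
{
	struct toy_task tasks[] = {
		{ .cid = 0, .cpu = 2 },			/* running: CPU 2 takes CID 0 */
		{ .cid = 1, .cpu = -1 },		/* sleeping: CID 1 back to the pool */
		{ .cid = MM_CID_UNSET, .cpu = 1 },	/* already handed over in schedin */
	};

	for (int i = 0; i < NR_CPUS; i++)
		pcpu_cid[i] = MM_CID_UNSET;

	toy_fixup_tasks_to_cpus(tasks, 3);
	printf("CPU2 owns CID %d, CID 1 free: %d\n", pcpu_cid[2], cid_pool[1]);
	return 0;
}

If that matches the intent, the "no task owns a CID anymore" guarantee
at the end of the walk is clear to me.
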
> Switching back to task mode happens when the user count goes below the
> threshold which was recorded on the per CPU mode switch:
> 
> 	pcpu_thrs = min(opt_cids - (opt_cids / 4), nr_cpu_ids / 2);
> 
AFAIU this provides a hysteresis so we don't switch back and
forth between modes when a single thread is repeatedly forked and
exits, right?
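To check my understanding with concrete numbers (assuming
num_possible_cpus() == mm_cid::nr_cpus_allowed == 8):

	users = 9:	opt_cids  = min(8, 9) = 8
			max_cids  = min(1.25 * 8, 8) = 8	-> 9 > 8: switch to per CPU mode
			pcpu_thrs = min(8 - 8/4, 8/2) = min(6, 4) = 4

so once in per CPU mode, forking/exiting a thread around the 8/9
boundary does not flip modes; the user count has to drop below 4 before
we switch back to task mode. Please correct me if I got the formulas
wrong.
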
> did not cover yet do the handover themself.
themselves
> 
> This transition from CPU to per task ownership happens in two phases:
> 
>   1) mm:mm_cid.transit contains MM_CID_TRANSIT. This is OR'ed on the task
>      CID and denotes that the CID is only temporarily owned by the
>      task. When it schedules out the task drops the CID back into the
>      pool if this bit is set.
OK, so the mm_drop_cid() on sched out only happens due to a transition
from per-cpu back to per-task. This answers my question in the previous
patch.
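Just to spell out how I read the transit handling, continuing the toy
model from my comment above (MM_CID_TRANSIT's value and all names are
made up for illustration, not taken from the patch):

#define MM_CID_TRANSIT	0x40000000	/* toy value only */

/* On schedule out a temporarily owned CID goes straight back to the pool. */
static void toy_schedout(int *task_cid)
{
	if (*task_cid != MM_CID_UNSET && (*task_cid & MM_CID_TRANSIT)) {
		cid_pool[*task_cid & ~MM_CID_TRANSIT] = true;
		*task_cid = MM_CID_UNSET;
	}
}

i.e. a task which got its CID with MM_CID_TRANSIT OR'ed in never keeps
it across a schedule out.
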
> 
>   2) The initiating context walks the per CPU space and after completion
>      clears mm:mm_cid.transit. After that point the CIDs are strictly
>      task owned again.
> 
> This two phase transition is required to prevent CID space exhaustion
> during the transition as a direct transfer of ownership would fail if
> two tasks are scheduled in on the same CPU before the fixup freed per
> CPU CIDs.
Clever. :-)
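Continuing the same toy, my mental model of phase #2 is the following
(the exact release policy is my guess, not necessarily what the patch
does; only the "drain first, clear transit last" ordering is from the
changelog):

static int mm_transit = MM_CID_TRANSIT;	/* set while in transition */

/* Phase #2: drain the per CPU space, then end the transit phase. */
static void toy_fixup_cpus_to_tasks(void)
{
	for (int cpu = 0; cpu < NR_CPUS; cpu++) {
		if (pcpu_cid[cpu] != MM_CID_UNSET) {
			cid_pool[pcpu_cid[cpu]] = true;	/* back into the pool */
			pcpu_cid[cpu] = MM_CID_UNSET;
		}
	}
	/* Only now stop handing out transit CIDs. */
	mm_transit = 0;
}

with mm_cid.transit only cleared after the per CPU space has been
drained, so CIDs handed out in the meantime are dropped again on
schedule out instead of exhausting the CID space.
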
> + * Switching to per CPU mode happens when the user count becomes greater
> + * than the maximum number of CIDs, which is calculated by:
> + *
> + *	opt_cids = min(mm_cid::nr_cpus_allowed, mm_cid::users);
> + *	max_cids = min(1.25 * opt_cids, num_possible_cpus());
[...]
> + * Switching back to task mode happens when the user count goes below the
> + * threshold which was recorded on the per CPU mode switch:
> + *
> + *	pcpu_thrs = min(opt_cids - (opt_cids / 4), num_possible_cpus() / 2);
I notice that mm_update_cpus_allowed() calls __mm_update_max_cids()
before updating the pcpu_thrs threshold, whereas
sched_mm_cid_{add,remove}_user() only invoke mm_update_max_cids(mm)
without updating pcpu_thrs first.
Is this asymmetry intentional?
Thanks,
Mathieu
-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com