linux-kernel - Re: [patch V4 15/20] sched/mmcid: Introduce per task/CPU ownership infrastructure


Message-ID: <6fbc0d79-f2b3-447d-a173-22bb11a30561@efficios.com>
Date: Mon, 17 Nov 2025 14:05:16 -0500
From: Mathieu Desnoyers <mathieu.desnoyers@...icios.com>
To: Thomas Gleixner <tglx@...utronix.de>, LKML <linux-kernel@...r.kernel.org>
Cc: Peter Zijlstra <peterz@...radead.org>,
 Gabriele Monaco <gmonaco@...hat.com>, Michael Jeanson
 <mjeanson@...icios.com>, Jens Axboe <axboe@...nel.dk>,
 "Paul E. McKenney" <paulmck@...nel.org>,
 "Gautham R. Shenoy" <gautham.shenoy@....com>,
 Florian Weimer <fweimer@...hat.com>, Tim Chen <tim.c.chen@...el.com>,
 Yury Norov <yury.norov@...il.com>, Shrikanth Hegde <sshegde@...ux.ibm.com>
Subject: Re: [patch V4 15/20] sched/mmcid: Introduce per task/CPU ownership
 infrastructure

On 2025-11-16 15:49, Thomas Gleixner wrote:
[...]

I'm OK with the proposed change, but I'd like to clarify two
points in this commit message in case I'm misunderstanding
something.

> 
> The current upstream implementation tries to keep the CID with the task
> even in overcommit situations, which complicates task migration.

[...]

For the sake of this discussion, I will assume that your explanation here
is about the upstream implementation before the "sched/mmcid: Revert the
complex CID management" commit.

I don't agree with your statement above.

In the upstream implementation, we have the following two cases in an
overcommit scenario:

__sched_mm_cid_migrate_from_fetch_cid():

         /*
          * If the migrated task has no last cid, or if the current
          * task on src rq uses the cid, it means the source cid does not need
          * to be moved to the destination cpu.
          */
[...]
         /*
          * If we observe an active task using the mm on this rq, it means we
          * are not the last task to be migrated from this cpu for this mm, so
          * there is no need to move src_cid to the destination cpu.
          */

The above prevents moving the mm_cid away from a source CPU that is currently
running tasks that use the same mm.
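
To make the two checks above concrete, here is a minimal standalone model of
them. It only follows the two quoted comments, it is not the kernel code: the
function name, its parameters and CID_UNSET are hypothetical, and all
locking, RCU and memory-ordering concerns are left out.

```c
#include <stdbool.h>

#define CID_UNSET (-1)	/* hypothetical marker for "no cid" */

/*
 * Simplified model of the source-side decision described by the two
 * comments quoted above. Returns the cid that may be moved to the
 * destination CPU, or CID_UNSET when the source cid must stay put.
 */
static int fetch_src_cid(int task_last_cid, bool src_curr_uses_cid,
			 bool src_curr_active_on_mm)
{
	/* No last cid, or the task running on the src rq uses the cid. */
	if (task_last_cid == CID_UNSET || src_curr_uses_cid)
		return CID_UNSET;

	/* Another active task uses this mm on the src rq: not the last one. */
	if (src_curr_active_on_mm)
		return CID_UNSET;

	/* Assumption: the migrated task's last cid is the movable one. */
	return task_last_cid;
}
```

In this model, the cid only leaves the source CPU when the migrated task is
the last user of the mm there.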

sched_mm_cid_migrate_to():

         /*
          * Move the src cid if the dst cid is unset. This keeps id
          * allocation closest to 0 in cases where few threads migrate around
          * many CPUs.
          *
          * If destination cid or recent cid is already set, we may have
          * to just clear the src cid to ensure compactness in frequent
          * migrations scenarios.
          *
          * It is not useful to clear the src cid when the number of threads is
          * greater or equal to the number of allowed CPUs, because user-space
          * can expect that the number of allowed cids can reach the number of
          * allowed CPUs.
          */
[...]
         if (dst_cid_is_set && atomic_read(&mm->mm_users) >= READ_ONCE(mm->nr_cpus_allowed))
                 return;

The above is really what makes sure that we favor keeping the mm_cid
currently allocated on the destination CPU rather than bringing over
the mm_cid from the source CPU.
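
As a rough illustration of that destination-side policy, the sketch below
reduces it to a pure function. The enum, the function name and the plain int
parameters are hypothetical stand-ins for the atomics and per-CPU state, and
the sketch assumes the source cid was successfully fetched beforehand.

```c
#include <stdbool.h>

/* Hypothetical outcome labels, not kernel identifiers. */
enum cid_action {
	CID_LEAVE_SRC,		/* do nothing, src cid stays on the src CPU */
	CID_CLEAR_SRC,		/* drop the src cid to keep ids compact */
	CID_MOVE_SRC_TO_DST,	/* dst has no cid, bring the src cid over */
};

/*
 * Simplified model of the destination-side decision. dst_cid_is_set
 * stands for "destination cid or recent cid is already set".
 */
static enum cid_action migrate_to_action(bool dst_cid_is_set, int mm_users,
					 int nr_cpus_allowed)
{
	/* Overcommit: threads >= allowed CPUs, clearing the src cid is useless. */
	if (dst_cid_is_set && mm_users >= nr_cpus_allowed)
		return CID_LEAVE_SRC;

	/* Destination already has a cid: only clear the source one. */
	if (dst_cid_is_set)
		return CID_CLEAR_SRC;

	/* Destination cid is unset: move the source cid there. */
	return CID_MOVE_SRC_TO_DST;
}
```

With 8 threads and 4 allowed CPUs, a destination CPU that already holds a cid
yields CID_LEAVE_SRC, so nothing moves in the overcommit case.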

> This can be done differently by implementing a strict CID ownership
> mechanism. Either the CIDs are owned by the tasks or by the CPUs. The
> latter provides less locality when tasks are heavily migrating, but there
> is no justification to optimize for overcommit scenarios and thereby
> penalizing everyone else.

AFAIU, the new two-mode scheme (task vs. CPU) makes tradeoffs similar to
those of the upstream implementation, which is to *not* move mm_cids around
in overcommit scenarios, leaving them on their source CPUs.

The only two cases where the upstream implementation can move mm_cids around
more aggressively (when nr_tasks >= nr_allowed_cpus) arise when the load
balancer does a poor job of load balancing:

* When migrating the last task for an mm out of a given CPU.
* When the destination CPU does not currently run any tasks for that mm.

And I don't think it makes sense to optimize mm_cid compactness for oddly
balanced workloads, so I think your new approach makes sense.

If you agree with my analysis, we can simply reword the commit message so
that it does not imply any expected gain from moving the mm_cid on migration
in overcommit scenarios, with either the upstream or the new implementation.

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
https://www.efficios.com