Message-ID: <20260204094202.3917675-1-realwujing@gmail.com>
Date: Wed,  4 Feb 2026 04:41:58 -0500
From: Qiliang Yuan <realwujing@...il.com>
To: arighi@...dia.com
Cc: bsegall@...gle.com,
	changwoo@...lia.com,
	david.dai@...ux.dev,
	dietmar.eggemann@....com,
	emil@...alapatis.com,
	jake@...lion.co.uk,
	juri.lelli@...hat.com,
	linux-kernel@...r.kernel.org,
	mgorman@...e.de,
	mingo@...hat.com,
	newton@...a.com,
	peterz@...radead.org,
	realwujing@...il.com,
	rostedt@...dmis.org,
	schatzberg.dan@...il.com,
	sched-ext@...ts.linux.dev,
	suzhidao@...omi.com,
	tj@...nel.org,
	vincent.guittot@...aro.org,
	void@...ifault.com,
	vschneid@...hat.com,
	yuanql9@...natelecom.cn
Subject: Re: [PATCH] sched/ext: Add cpumask to skip unsuitable dispatch queues

Hi Andrea,

I've fixed those issues in v2:

https://lore.kernel.org/all/20260204093435.3915393-1-realwujing@gmail.com/

On Tue, Feb 03, 2026 at 09:37:14AM +0100, Andrea Righi wrote:
> Did you run some benchmarks / have some numbers?

I'm still collecting more detailed benchmark numbers. Theoretically, though,
the bitwise cpumask_or() should be much cheaper than a DSQ scan, which incurs
multiple cache misses while dereferencing task structures, even for small
queues.
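
To illustrate, the consume-side skip in v2 looks roughly like this
(simplified sketch: locking and the rest of the function are omitted, and
the exact hook point in ext.c's consume path may differ):

	static bool consume_dispatch_q(struct rq *rq, struct scx_dispatch_q *dsq)
	{
		/*
		 * Fast path: if no task currently queued on this DSQ can
		 * run on this CPU, skip the O(N) list walk entirely.
		 */
		if (dsq->cpus_allowed &&
		    !cpumask_test_cpu(cpu_of(rq), dsq->cpus_allowed))
			return false;
		...
	}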

> It's true that we save the O(N) scan when the DSQ has no eligible tasks, but we're
> adding cost on every enqueue: cpumask_or() on potentially large cpumasks can be
> expensive.
> ... for small queues or mixed workloads, the cpumask overhead probably exceeds
> the savings...

To minimize the enqueue overhead, I've optimized the dispatch_enqueue() path
in v2 (rough sketch after the list):
- Use cpumask_copy() instead of cpumask_or() when the task is the first one in the DSQ.
- Skip the cpumask_or() update if the DSQ's cpus_allowed mask is already full.
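
Sketch (only the new mask-update logic; error handling and the existing
enqueue code are elided, and dsq->nr is checked before the counter is bumped):

	static void dispatch_enqueue(struct scx_dispatch_q *dsq,
				     struct task_struct *p, u64 enq_flags)
	{
		...
		if (dsq->cpus_allowed) {
			if (!dsq->nr)
				/* first task: plain copy, no read-modify-write */
				cpumask_copy(dsq->cpus_allowed, p->cpus_ptr);
			else if (!cpumask_full(dsq->cpus_allowed))
				/* an already-full mask needs no further OR-ing */
				cpumask_or(dsq->cpus_allowed,
					   dsq->cpus_allowed, p->cpus_ptr);
		}
		...
	}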

> The cpumask is only updated during enqueue and cleared when the queue empties. If a
> task's affinity changes while it's already in the queue (i.e., sched_setaffinity()),
> the cpus_allowed mask becomes stale.

Fixed in v2. I've added a hook in set_cpus_allowed_scx() to update the DSQ's
cpus_allowed mask whenever a task's affinity changes while it is enqueued in a DSQ.
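
The hook is roughly the following (simplified; locking omitted). The mask
only widens here and is cleared when the DSQ empties, so any staleness is
bounded:

	static void set_cpus_allowed_scx(struct task_struct *p,
					 struct affinity_context *ac)
	{
		struct scx_dispatch_q *dsq = p->scx.dsq;

		...
		/* keep the DSQ's aggregate mask a superset of its tasks' */
		if (dsq && dsq->cpus_allowed)
			cpumask_or(dsq->cpus_allowed, dsq->cpus_allowed,
				   p->cpus_ptr);
		...
	}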

> I don't see the corresponding kfree() in the cleanup path.

Fixed in v2. I've added an RCU callback (free_dsq_rcu_callback) to explicitly free
dsq->cpus_allowed before freeing the DSQ structure itself.
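
Abridged, assuming the existing rcu head in struct scx_dispatch_q:

	static void free_dsq_rcu_callback(struct rcu_head *rcu)
	{
		struct scx_dispatch_q *dsq =
			container_of(rcu, struct scx_dispatch_q, rcu);

		/* release the per-DSQ mask before the DSQ itself */
		kfree(dsq->cpus_allowed);
		kfree(dsq);
	}

	...
	call_rcu(&dsq->rcu, free_dsq_rcu_callback);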

Also, I've restricted the cpumask allocation to user-defined DSQs only, as built-in
DSQs (local, global, bypass) don't need this optimization.
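
In create_dsq() this amounts to something like the following (allocation
failure handling elided; the SCX_DSQ_FLAG_BUILTIN test is the shape I used,
details may differ):

	/* built-in DSQs (local/global/bypass) never get a mask */
	if (!(dsq_id & SCX_DSQ_FLAG_BUILTIN))
		dsq->cpus_allowed = kzalloc(cpumask_size(), GFP_KERNEL);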

Thanks,
Qiliang
