linux-kernel - Re: [PATCH] sched_ext: allow scx_bpf_task

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <2ef92808-42c9-4f8b-9d58-11e5d2c89fc3@igalia.com>
Date: Wed, 17 Sep 2025 11:12:18 +0900
From: Changwoo Min <changwoo@...lia.com>
To: Tejun Heo <tj@...nel.org>
Cc: void@...ifault.com, arighi@...dia.com, kernel-dev@...lia.com,
 sched-ext@...ts.linux.dev, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] sched_ext: allow scx_bpf_task_cgroup() in ops.dispatch()

Hello Tejun,

Thanks for the feedback.

On 9/17/25 05:32, Tejun Heo wrote:
> Hello,
> 
> On Mon, Sep 15, 2025 at 03:52:36PM +0900, Changwoo Min wrote:
> ...
>> Fix this by adding the prev task to scx.kf_tasks so that task-related
>> BPF helpers such as scx_bpf_task_cgroup() can be called safely. Since
>> the SCX_CALL_OP_TASK family assumes the first argument is the task,
>> introduce a new SCX_CALL_OP_TASK_ANY macro without that restriction.
>> Also update __SCX_KF_TERMINAL to include SCX_KF_DISPATCH.
> 
> I'm not sure this is safe tho. ops.dispatch() can release the rq lock it's
> holding to migrate tasks across rq's, which means that other operations can
> nest inside - ie. there can be an irq which triggers ops.enqueue() while
> ops.dispatch() is in progress. That can in turn overwrite
> current->scx.kf_tasks[].

I thought that ops.dispatch() always holds an rq lock since there
is a lockdep_assert_rq_held(rq) check at the beginning of
balance_one(), which invokes the BPF scheduler's dispatch.
I guess I am missing an edge case?

> 
> I wonder whether a better approach would be tracking cgroup membership from
> BPF side. ops.init_task() tells you the initial cgroup it's joining and if

Currently, it is also not allowed to call scx_bpf_task_cgroup()
at ops.init_task() because the rq lock is not held at
ops.init_task(). The earliest possible moment to get a task's
cgroup ID is ops.enable().

> the task later moves to another cgroup, ops.cgroup_move() will be invoked.
> Would that work?

Checking cgroup ID at ops.enable() is not ideal because there
would be no cgroup ID changes between ops.enable() calls.
However, the additional overhead should be marginal. So tracking
cgroup ID at ops.enable() and ops.cgroup_move() will work.

Regards,
Changwoo Min