linux-kernel - Re: [PATCH 1/2] sched_ext: Track currently locked rq

Message-ID: <aAQVRhs_y-dxC4yE@gpd3>
Date: Sat, 19 Apr 2025 23:27:34 +0200
From: Andrea Righi <arighi@...dia.com>
To: Tejun Heo <tj@...nel.org>
Cc: David Vernet <void@...ifault.com>, Changwoo Min <changwoo@...lia.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH 1/2] sched_ext: Track currently locked rq

On Sat, Apr 19, 2025 at 10:30:37PM +0200, Andrea Righi wrote:
> On Sat, Apr 19, 2025 at 10:10:13PM +0200, Andrea Righi wrote:
> > On Sat, Apr 19, 2025 at 07:34:16AM -1000, Tejun Heo wrote:
> > > Hello, Andrea.
> > > 
> > > On Sat, Apr 19, 2025 at 02:24:30PM +0200, Andrea Righi wrote:
> > > > @@ -149,6 +149,7 @@ struct sched_ext_entity {
> > > >  	s32			selected_cpu;
> > > >  	u32			kf_mask;	/* see scx_kf_mask above */
> > > >  	struct task_struct	*kf_tasks[2];	/* see SCX_CALL_OP_TASK() */
> > > > +	struct rq		*locked_rq;	/* currently locked rq */
> > > 
> > > Can this be a percpu variable? While rq is locked, current can't switch out
> > > anyway and that way we don't have to increase the size of task. Note that
> > > kf_tasks[] are different in that some ops may, at least theoretically,
> > > sleep.
> > 
> > Yeah, I was debating between using a percpu variable or storing it in
> > current. I went with current just to stay consistent with kf_tasks.
> > 
> > But you're right about not increasing the size of the task, and as you
> > pointed out, we can't switch if the rq is locked, so a percpu variable
> > should work. I'll update that in v2.
> 
> Hm... actually thinking more about this, a problem with the percpu variable
> is that, if no rq is locked, we could move to a different CPU and end up
> reading the wrong locked_rq via scx_locked_rq(). I don't think we want to
> preempt_disable/enable all the callbacks just to fix this... Maybe storing
> in current is a safer choice?

And if we don't want to increase the size of sched_ext_entity, we could
store the cpu of the currently locked rq, right before "disallow", like:

struct sched_ext_entity {
	struct scx_dispatch_q *    dsq;                  /*     0     8 */
	struct scx_dsq_list_node   dsq_list;             /*     8    24 */
	struct rb_node             dsq_priq __attribute__((__aligned__(8))); /*    32    24 */
	u32                        dsq_seq;              /*    56     4 */
	u32                        dsq_flags;            /*    60     4 */
	/* --- cacheline 1 boundary (64 bytes) --- */
	u32                        flags;                /*    64     4 */
	u32                        weight;               /*    68     4 */
	s32                        sticky_cpu;           /*    72     4 */
	s32                        holding_cpu;          /*    76     4 */
	s32                        selected_cpu;         /*    80     4 */
	u32                        kf_mask;              /*    84     4 */
	struct task_struct *       kf_tasks[2];          /*    88    16 */
	atomic_long_t              ops_state;            /*   104     8 */
	struct list_head           runnable_node;        /*   112    16 */
	/* --- cacheline 2 boundary (128 bytes) --- */
	long unsigned int          runnable_at;          /*   128     8 */
	u64                        core_sched_at;        /*   136     8 */
	u64                        ddsp_dsq_id;          /*   144     8 */
	u64                        ddsp_enq_flags;       /*   152     8 */
	u64                        slice;                /*   160     8 */
	u64                        dsq_vtime;            /*   168     8 */
	int                        locked_cpu;           /*   176     4 */
	bool                       disallow;             /*   180     1 */

	/* XXX 3 bytes hole, try to pack */

	struct cgroup *            cgrp_moving_from;     /*   184     8 */
	/* --- cacheline 3 boundary (192 bytes) --- */
	struct list_head           tasks_node;           /*   192    16 */

	/* size: 208, cachelines: 4, members: 24 */
	/* sum members: 205, holes: 1, sum holes: 3 */
	/* forced alignments: 1 */
	/* last cacheline: 16 bytes */
} __attribute__((__aligned__(8)));

(Before this change, the hole after "disallow" was 7 bytes.)
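
To illustrate the padding arithmetic, here is a minimal userspace sketch. It
assumes a typical 64-bit ABI (8-byte pointers, 4-byte int, 1-byte bool). The
structs tail_before and tail_after and the helpers hole_before() and
hole_after() are hypothetical stand-ins that model only the last few fields
shown in the pahole output above; they are not the kernel definitions.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the struct tail without the new field. */
struct tail_before {
	uint64_t dsq_vtime;
	bool     disallow;
	/* compiler pads here so the pointer below is 8-byte aligned */
	void    *cgrp_moving_from;
};

/* Hypothetical model of the struct tail with locked_cpu added. */
struct tail_after {
	uint64_t dsq_vtime;
	int      locked_cpu;	/* occupies 4 bytes of the former padding */
	bool     disallow;
	void    *cgrp_moving_from;
};

/* Padding bytes between disallow and cgrp_moving_from, before the change. */
size_t hole_before(void)
{
	return offsetof(struct tail_before, cgrp_moving_from) -
	       (offsetof(struct tail_before, disallow) + sizeof(bool));
}

/* Padding bytes between disallow and cgrp_moving_from, after the change. */
size_t hole_after(void)
{
	return offsetof(struct tail_after, cgrp_moving_from) -
	       (offsetof(struct tail_after, disallow) + sizeof(bool));
}
```

Under those ABI assumptions, the two structs have the same size, so the int
fits entirely into space that was previously padding.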

Then use cpu_rq()/cpu_of() to resolve that to/from the corresponding rq.
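
Something like the following, as a rough userspace mock rather than the actual
patch. Here struct rq, runqueues[], cpu_rq() and cpu_of() are simplified
stand-ins for the kernel ones, scx_set_locked_rq() is a hypothetical helper
name, the signatures are invented for the mock, and using -1 to mean "no rq
locked" is an assumption:

```c
#include <stddef.h>

#define NR_CPUS 4	/* assumption: small fixed CPU count for the mock */

/* Mock of struct rq: only the field that cpu_of() needs. */
struct rq {
	int cpu;
};

static struct rq runqueues[NR_CPUS] = { {0}, {1}, {2}, {3} };

/* Mock stand-ins for the kernel's cpu_rq() and cpu_of(). */
static struct rq *cpu_rq(int cpu)
{
	return &runqueues[cpu];
}

static int cpu_of(const struct rq *rq)
{
	return rq->cpu;
}

/* Mock of the proposed field; -1 means "no rq locked" (assumption). */
struct sched_ext_entity {
	int locked_cpu;
};

/* Hypothetical setter: record the CPU of the rq being locked, or clear it. */
void scx_set_locked_rq(struct sched_ext_entity *scx, struct rq *rq)
{
	scx->locked_cpu = rq ? cpu_of(rq) : -1;
}

/* Resolve the stored CPU back to its rq, or NULL if none is locked. */
struct rq *scx_locked_rq(const struct sched_ext_entity *scx)
{
	return scx->locked_cpu >= 0 ? cpu_rq(scx->locked_cpu) : NULL;
}
```

The point of storing a CPU number instead of a pointer is only the size: a
4-byte int fits in the existing hole, while a struct rq pointer would not.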

-Andrea