Message-ID: <ZtNC6l9nUEPnneag@slm.duckdns.org>
Date: Sat, 31 Aug 2024 06:20:58 -1000
From: Tejun Heo <tj@...nel.org>
To: Andrea Righi <andrea.righi@...ux.dev>
Cc: void@...ifault.com, kernel-team@...a.com, linux-kernel@...r.kernel.org,
Daniel Hodges <hodges.daniel.scott@...il.com>,
Changwoo Min <multics69@...il.com>,
Dan Schatzberg <schatzberg.dan@...il.com>
Subject: Re: [PATCH 10/11] sched_ext: Implement
scx_bpf_dispatch[_vtime]_from_dsq()
Hello,
On Sat, Aug 31, 2024 at 04:30:57PM +0200, Andrea Righi wrote:
...
> > @@ -5511,7 +5516,7 @@ __bpf_kfunc void scx_bpf_dispatch(struct task_struct *p, u64 dsq_id, u64 slice,
> > * scx_bpf_dispatch_vtime - Dispatch a task into the vtime priority queue of a DSQ
> > * @p: task_struct to dispatch
> > * @dsq_id: DSQ to dispatch to
> > - * @slice: duration @p can run for in nsecs
> > + * @slice: duration @p can run for in nsecs, 0 to keep the current value
> > * @vtime: @p's ordering inside the vtime-sorted queue of the target DSQ
>
> Maybe allow keeping the current vtime if 0 is passed, similar to slice?
It's tricky because 0 is a valid vtime. It's unlikely, but depending on how
vtime is defined, it may wrap within a practical amount of time. More on this
below.
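For context, DSQ vtimes are ordered with a wrap-safe signed comparison, so
every u64 value, 0 included, is a legitimate position on the vtime circle:

  /* from kernel/sched/ext.c: ordering survives u64 wraparound */
  static bool vtime_before(u64 a, u64 b)
  {
          return (s64)(a - b) < 0;
  }

A task whose vtime has just wrapped legitimately sits at or near 0, so 0
can't double as a "keep the current value" sentinel the way it can for
@slice.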
...
> > + /*
> > + * Can be called from either ops.dispatch() locking this_rq() or any
> > + * context where no rq lock is held. If latter, lock @p's task_rq which
> > + * we'll likely need anyway.
> > + */
>
> About locking, I was wondering if we could provide a similar API
> (scx_bpf_dispatch_lock()?) that would let us use scx_bpf_dispatch() from
> any context, not just ops.select_cpu() / ops.enqueue() or
> ops.dispatch().
>
> This would be really useful for user-space schedulers, since we could
> use scx_bpf_dispatch() directly and get rid of the
> BPF_MAP_TYPE_RINGBUF complexity.
One difference between scx_bpf_dispatch() and scx_bpf_dispatch_from_dsq() is
that the former is designed to be safe to call from any context under any
locks by performing the actual dispatches asynchronously. This is primarily
to allow scx_bpf_dispatch() to be called under BPF locks, as they are used
to transfer the ownership of tasks from the BPF side to the kernel side, but
it also makes scx_bpf_dispatch() harder to extend. The way BPF locks are
currently developing, we might not have to worry about killing the system
through hard deadlocks, but a synchronous implementation would still be very
prone to soft deadlocks that kill the BPF scheduler. Maybe the solution here
is bouncing to an irq_work or something. I'll think more on it.
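To sketch the bouncing idea (completely untested, and finish_dispatch() is a
made-up stand-in for the synchronous dispatch path): record the request on a
per-CPU llist under whatever locks the caller holds, and complete it from
irq_work context, where no BPF locks can be held:

  #include <linux/irq_work.h>
  #include <linux/llist.h>
  #include <linux/percpu.h>
  #include <linux/sched/task.h>
  #include <linux/slab.h>

  /* hypothetical deferred dispatch request */
  struct scx_deferred_dispatch {
          struct llist_node       node;
          struct task_struct      *p;
          u64                     dsq_id;
          u64                     enq_flags;
  };

  static DEFINE_PER_CPU(struct llist_head, scx_deferred_head);

  static void scx_deferred_irq_workfn(struct irq_work *work)
  {
          struct llist_node *head =
                  llist_del_all(this_cpu_ptr(&scx_deferred_head));
          struct scx_deferred_dispatch *dd, *tmp;

          /* no BPF locks can be held in irq_work context, so the actual
           * dispatch can run synchronously here */
          llist_for_each_entry_safe(dd, tmp, head, node) {
                  finish_dispatch(dd->p, dd->dsq_id, dd->enq_flags);
                  put_task_struct(dd->p);
                  kfree(dd);
          }
  }

  static DEFINE_PER_CPU(struct irq_work, scx_deferred_irq_work) =
          IRQ_WORK_INIT(scx_deferred_irq_workfn);

  /* callable under BPF locks: record the request and punt */
  static void scx_dispatch_deferred(struct task_struct *p, u64 dsq_id,
                                    u64 enq_flags)
  {
          struct scx_deferred_dispatch *dd = kzalloc(sizeof(*dd), GFP_ATOMIC);

          if (!dd)
                  return;
          dd->p = p;
          get_task_struct(p);     /* pin @p until the irq_work runs */
          dd->dsq_id = dsq_id;
          dd->enq_flags = enq_flags;
          llist_add(&dd->node, this_cpu_ptr(&scx_deferred_head));
          irq_work_queue(this_cpu_ptr(&scx_deferred_irq_work));
  }

The atomic allocation and task pinning are the annoying parts; embedding the
node in task_struct would avoid both, at the cost of only allowing one
pending dispatch per task.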
...
> > +__bpf_kfunc bool scx_bpf_dispatch_from_dsq(struct bpf_iter_scx_dsq *it__iter,
> > + struct task_struct *p, u64 dsq_id,
> > + u64 slice, u64 enq_flags)
> > +{
> > + return scx_dispatch_from_dsq((struct bpf_iter_scx_dsq_kern *)it__iter,
> > + p, dsq_id, slice, 0, enq_flags);
> > +}
> > +
> > +/**
> > + * scx_bpf_dispatch_vtime_from_dsq - Move a task from DSQ iteration to a PRIQ DSQ
> > + * @it__iter: DSQ iterator in progress
> > + * @p: task to transfer
> > + * @dsq_id: DSQ to move @p to
> > + * @slice: duration @p can run for in nsecs, 0 to keep the current value
> > + * @vtime: @p's ordering inside the vtime-sorted queue of the target DSQ
> > + * @enq_flags: SCX_ENQ_*
>
> Hm... can we pass 6 arguments to a kfunc? I think we're limited to 5,
> unless I'm missing something here.
Hah, I actually don't know and didn't test the vtime variant. Maybe I should
just drop @slice and @vtime. They can be set by the caller explicitly before
calling these kfuncs anyway, although there are some concerns around
ownership (i.e. the caller can't be sure that the task hasn't already been
dispatched by someone else before scx_bpf_dispatch_from_dsq() commits). Or
maybe I should pack the optional arguments into a struct. I'll think more
about it.
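To illustrate the explicit-set variant (hypothetical 4-argument signature
with @slice/@vtime dropped; the fields themselves are already BPF-writable,
e.g. scx_simple updates p->scx.dsq_vtime directly in ops.stopping()):

  /* sketch of BPF-side usage inside a DSQ iteration; @it is the
   * in-progress iterator */
  p->scx.slice = SCX_SLICE_DFL;
  p->scx.dsq_vtime = new_vtime;
  if (!scx_bpf_dispatch_vtime_from_dsq(it, p, dsq_id, enq_flags)) {
          /*
           * @p was dispatched by someone else between the writes above
           * and the commit - exactly the ownership race noted above.
           */
  }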
Thanks.
--
tejun