linux-kernel - Re: [PATCH 14/31] sched_ext: Implement BPF extensible scheduler class

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <Y5haiM8qx1eOLjIr@hirez.programming.kicks-ass.net>
Date:   Tue, 13 Dec 2022 11:57:12 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     Tejun Heo <tj@...nel.org>
Cc:     torvalds@...ux-foundation.org, mingo@...hat.com,
        juri.lelli@...hat.com, vincent.guittot@...aro.org,
        dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
        mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
        ast@...nel.org, daniel@...earbox.net, andrii@...nel.org,
        martin.lau@...nel.org, joshdon@...gle.com, brho@...gle.com,
        pjt@...gle.com, derkling@...gle.com, haoluo@...gle.com,
        dvernet@...a.com, dschatzberg@...a.com, dskarlat@...cmu.edu,
        riel@...riel.com, linux-kernel@...r.kernel.org,
        bpf@...r.kernel.org, kernel-team@...a.com
Subject: Re: [PATCH 14/31] sched_ext: Implement BPF extensible scheduler class

On Mon, Dec 12, 2022 at 11:33:12AM -1000, Tejun Heo wrote:

> > > @@ -5800,10 +5812,13 @@ static void put_prev_task_balance(struct rq *rq, struct task_struct *prev,
> > >  	 * We can terminate the balance pass as soon as we know there is
> > >  	 * a runnable task of @class priority or higher.
> > >  	 */
> > > -	for_class_range(class, prev->sched_class, &idle_sched_class) {
> > > +	for_balance_class_range(class, prev->sched_class, &idle_sched_class) {
> > >  		if (class->balance(rq, prev, rf))
> > >  			break;
> > >  	}
> > > +#else
> > > +	/* SCX needs the balance call even in UP, call it explicitly */
> > 
> > This, *WHY* !?!
> 
> This comes from the fact that there are no strict rq boundaries and the BPF
> scheduler can share scheduling queues across any subset of CPUs however they
> see fit. So, task <-> rq association is flexible until the very last moment
> the task gets picked for execution. For the dispatch path to support this,
> it needs to be able to migrate tasks across rq's which requires unlocking
> the current rq which can only be done in ->balance(). So, sched_ext uses
> ->balance() to run the dispatch path and thus needs it called even on UP.

Fundamentally none of that makes sense, on UP there is no placement,
there is only the one CPU, very little choice to be had.

> Given that UP doesn't need to transfer tasks across, it might be possible to
> move the whole dispatch operation into ->pick_next_task() but the current
> state would be different, so it's more complicated and will likely be more
> brittle.

That sounds like something is amiss, you fundamentally hold all the
right locks, there is only one.