linux-kernel - Re: [PATCH 14/31] sched_ext: Implement BPF extensible scheduler class

Message-ID: <Y5haDh3sYUFcXkBx@hirez.programming.kicks-ass.net>
Date:   Tue, 13 Dec 2022 11:55:10 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     Tejun Heo <tj@...nel.org>
Cc:     torvalds@...ux-foundation.org, mingo@...hat.com,
        juri.lelli@...hat.com, vincent.guittot@...aro.org,
        dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
        mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
        ast@...nel.org, daniel@...earbox.net, andrii@...nel.org,
        martin.lau@...nel.org, joshdon@...gle.com, brho@...gle.com,
        pjt@...gle.com, derkling@...gle.com, haoluo@...gle.com,
        dvernet@...a.com, dschatzberg@...a.com, dskarlat@...cmu.edu,
        riel@...riel.com, linux-kernel@...r.kernel.org,
        bpf@...r.kernel.org, kernel-team@...a.com
Subject: Re: [PATCH 14/31] sched_ext: Implement BPF extensible scheduler class

On Mon, Dec 12, 2022 at 11:33:12AM -1000, Tejun Heo wrote:

> > But this.. afaict that means that:
> > 
> >  - the whole EXT thing is incompatible with SCHED_CORE
> 
> Can you expand on why this would be? I didn't test against SCHED_CORE, so am
> sure things might be broken but can't think of a reason why it'd be
> fundamentally incompatible.

For starters, SCHED_CORE doesn't use __pick_next_task() (much). But I
think you're going to have more trouble with prio_less() (which is the
3rd implementation of the scheduling function :/)

> >  - the whole EXT thing can be trivially starved by the presence of a
> >    single CFS/BATCH/IDLE task.
> 
> It's a similar situation w/ RT vs. CFS, which is resolved via RT having
> starvation avoidance.

That is a horrible situation as it is. FIFO/RR are very crap scheduling
policies for a number of reasons, but we're stuck with them due to
history and POSIX :-(. That is not something you should justify anything
with.

In fact, it should be an example of what to avoid.

Specifically, FIFO/RR fail at the fundamentals of OS
abstractions -- they provide neither resource distribution nor
isolation.

> Here, the way it's handled is a bit different: SCX has
> a watchdog mechanism implemented in "[PATCH 18/31] sched_ext: Implement
> runnable task stall watchdog", so if SCX tasks hang for whatever reason,
> including being starved by CFS, the BPF scheduler gets aborted and all
> tasks are handed back to CFS. IOW, it's treated like any other BPF
> scheduler error that can lead to stalls, and is recovered from the same way.
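
For readers unfamiliar with the stall watchdog referred to above, here is a minimal user-space sketch of the general idea, assuming a simplified model: each task records when it became runnable, and a periodic check disables the extensible class as soon as any runnable task has waited longer than a timeout. `struct sketch_task`, `struct sketch_sched` and `watchdog_check()` are hypothetical names; this is not the code from PATCH 18/31.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical task record: only the fields the watchdog looks at. */
struct sketch_task {
    bool runnable;
    unsigned long runnable_since; /* ms timestamp when it last became runnable or last ran */
};

/* Hypothetical scheduler state: enabled until the watchdog trips. */
struct sketch_sched {
    bool ext_enabled;
    unsigned long timeout_ms;
};

/*
 * Periodic check (assumes now_ms >= every runnable_since). If any runnable
 * task has waited longer than the timeout, disable the extensible scheduler
 * so its tasks fall back to the default class. Returns true if it tripped.
 */
static bool watchdog_check(struct sketch_sched *s,
                           const struct sketch_task *tasks, size_t n,
                           unsigned long now_ms)
{
    assert(tasks != NULL || n == 0);

    if (!s->ext_enabled)
        return false;

    for (size_t i = 0; i < n; i++) {
        if (tasks[i].runnable &&
            now_ms - tasks[i].runnable_since > s->timeout_ms) {
            s->ext_enabled = false; /* "abort": hand all tasks back */
            return true;
        }
    }
    return false;
}
```

Note that the check does not ask why a task stalled: a buggy BPF scheduler and starvation by a higher class look identical to it, which is exactly the property the quoted paragraph relies on.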

That all sounds quite terrible.. :/

When the scheduler isn't available, it should be an error to switch a
task to the policy; when there are tasks in the policy, it must not go
away.

The policy itself should never cause policy changes.
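
The rules stated above amount to a reference-count style invariant. A minimal sketch, assuming a hypothetical per-policy record (`struct sketch_policy`, `policy_attach_task()`, `policy_detach_task()` and `policy_unload()` are illustrative names, not kernel APIs, and the choice of `-ENODEV`/`-EBUSY` is an assumption):

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical per-policy state; not the kernel's data structures. */
struct sketch_policy {
    bool loaded;           /* is a scheduler registered for this policy? */
    unsigned int nr_tasks; /* tasks currently using the policy */
};

/* Switching a task to the policy fails up front if no scheduler is loaded. */
static int policy_attach_task(struct sketch_policy *p)
{
    if (!p->loaded)
        return -ENODEV;
    p->nr_tasks++;
    return 0;
}

/* A task leaves the policy (e.g. switched away or exited). */
static void policy_detach_task(struct sketch_policy *p)
{
    assert(p->nr_tasks > 0);
    p->nr_tasks--;
}

/* The scheduler may only go away once no task is using the policy. */
static int policy_unload(struct sketch_policy *p)
{
    if (p->nr_tasks > 0)
        return -EBUSY;
    p->loaded = false;
    return 0;
}
```

In this model nothing inside the policy ever moves tasks elsewhere; only explicit attach/detach calls from outside change a task's policy, in contrast to a watchdog that unloads the scheduler and migrates its tasks on its own.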