linux-kernel - Re: [PATCH 14/31] sched_ext: Implement BPF extensible scheduler class

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CABk29Nuyz4oT_pE9tFh8+q+ygPhnYBH3MMx7xgYOUcFUQ0qk3w@mail.gmail.com>
Date:   Tue, 13 Dec 2022 15:20:57 -0800
From:   Josh Don <joshdon@...gle.com>
To:     Rik van Riel <riel@...riel.com>
Cc:     Tejun Heo <tj@...nel.org>, Peter Zijlstra <peterz@...radead.org>,
        torvalds@...ux-foundation.org, mingo@...hat.com,
        juri.lelli@...hat.com, vincent.guittot@...aro.org,
        dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
        mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com,
        ast@...nel.org, daniel@...earbox.net, andrii@...nel.org,
        martin.lau@...nel.org, brho@...gle.com, pjt@...gle.com,
        derkling@...gle.com, haoluo@...gle.com, dvernet@...a.com,
        dschatzberg@...a.com, dskarlat@...cmu.edu,
        linux-kernel@...r.kernel.org, bpf@...r.kernel.org,
        kernel-team@...a.com
Subject: Re: [PATCH 14/31] sched_ext: Implement BPF extensible scheduler class

On Tue, Dec 13, 2022 at 10:40 AM Rik van Riel <riel@...riel.com> wrote:
>
> On Tue, 2022-12-13 at 08:12 -1000, Tejun Heo wrote:
> > Hello,
> >
> > On Tue, Dec 13, 2022 at 11:55:10AM +0100, Peter Zijlstra wrote:
> > > On Mon, Dec 12, 2022 at 11:33:12AM -1000, Tejun Heo wrote:
> > >
> > > > Here, the way it's handled is a bit different, SCX has
> > > > a watchdog mechanism implemented in "[PATCH 18/31] sched_ext:
> > > > Implement
> > > > runnable task stall watchdog", so if SCX tasks hang for whatever
> > > > reason
> > > > including being starved by CFS, it will get aborted and all tasks
> > > > will be
> > > > handed back to CFS. IOW, it's treated like any other BPF
> > > > scheduler errors
> > > > that can lead to stalls and recovered the same way.
> > >
> > > That all sounds quite terrible.. :/
> >
> > The main source of difference is that we can't implicitly trust the
> > BPF
> > scheduler and if it malfunctions or on user request, the system
> > should
> > always be recoverable, so there are some extra things which are
> > inherently
> > necessary to support that.
> >
> That makes me wonder whether loading an SCX policy
> should just have that policy take over all of the
> SCHED_OTHER tasks by default, and have a failure of
> the policy just return those tasks to CFS?
>
> Having the two be operative at the same time seems
> to be a cause of hard to resolve issues, while simply
> running all non-RT tasks under the loadable policy
> could simplify both internal kernel interfaces, as
> well as externally visible effects?

There are reasons to want to still have CFS available even when SCX is
loaded. For example, on a partitioned shared tenant machine, moving
one application to an SCX policy without needing to move everyone. Or,
wanting to avoid scheduling things like kthreads under an SCX policy,
since for example that makes an SCX policy writer need not only
consider the needs of application threads, but also those of kernel
threads.