linux-kernel - Re: [PATCHSET v4] sched: Implement BPF extensible scheduler class

Message-ID: <20230926092020.3alsvg6vwnc4g3td@suse.de>
Date:   Tue, 26 Sep 2023 10:20:20 +0100
From:   Mel Gorman <mgorman@...e.de>
To:     Tejun Heo <tj@...nel.org>
Cc:     Peter Zijlstra <peterz@...radead.org>,
        torvalds@...ux-foundation.org, mingo@...hat.com,
        juri.lelli@...hat.com, vincent.guittot@...aro.org,
        dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
        bristot@...hat.com, vschneid@...hat.com, ast@...nel.org,
        daniel@...earbox.net, andrii@...nel.org, martin.lau@...nel.org,
        joshdon@...gle.com, brho@...gle.com, pjt@...gle.com,
        derkling@...gle.com, haoluo@...gle.com, dvernet@...a.com,
        dschatzberg@...a.com, dskarlat@...cmu.edu, riel@...riel.com,
        linux-kernel@...r.kernel.org, bpf@...r.kernel.org,
        kernel-team@...a.com, Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCHSET v4] sched: Implement BPF extensible scheduler class

On Tue, Sep 19, 2023 at 07:56:01AM -1000, Tejun Heo wrote:
> Hello, Mel.
> 
> I don't think the discussion has reached a point where the points of
> disagreements are sufficiently laid out from both sides. Do you have any
> further thoughts?
> 

Plenty, but I'm not sure how to reconcile this. I view pluggable schedulers
as something that would be a future maintenance nightmare and our "lived
experience" or "exposure bias" with respect to the expertise of users differs
drastically. Some developers will be mostly dealing with users that have
extensive relevant expertise, a strong incentive to maximise performance
and full control of their stack; others do not and the time cost of
supporting such users is high. While I can see advantages to having specific
schedulers targeting either a specific workload or hardware configuration,
the proliferation of such schedulers and the inevitable need to avoid
introducing any new regressions in deployed schedulers will be cumbersome.

I generally worry that certain things may not have existed in the shipped
scheduler if plugging had been an option, including EAS, throttling control,
schedutil integration, big.LITTLE, adapting to chiplets and picking preferred
SMT siblings for turbo boost. In each case, integrating support was time
consuming and painful, and a pluggable scheduler would have been a relatively
easy out that would ultimately cost us if it was never properly integrated.
While no one wants the pain, a few of us also want to avoid the problem
of vendors publishing a hacky scheduler for their specific hardware and
discontinuing the work at that point.

I see that some friction with the current state is due to tuning knobs
moving to debugfs. FWIW, I didn't 100% agree with that move either and
carried an out-of-tree revert that displayed warnings for a time but I
understood the logic behind it. However, if the tuning parameters are
insufficient, and there is good reason to change them then the answer
is to add tuning knobs with defined semantics and document them -- not
pluggable schedulers. We've seen something along those lines recently
with latency_nice even if it turned into EEVDF instead of a new interface,
so I guess we'll see how that pans out.

I get most of your points. Maybe most users will not care about a pluggable
scheduler but *some will* and they will be the maintenance burden. I get your
point as well that if there is a bug and a pluggable scheduler is loaded then
the first step would be "reproduce without the pluggable scheduler" and again,
you'd be right, that is a great first step *except* sometimes they can't or
sometimes they simply won't without significant proof and that incurs a
maintenance burden. Even if the pluggable schedulers are GPL, there still
is a burden to understand any scheduler that is loaded to see if it's the
source of a problem. That means that instead of understanding a defined
number of schedulers that are developed over time with the history in
changelogs, we may have to understand N schedulers that may be popular and
that also is painful. That's leaving aside the difficulty of what happens when
more than 1 can be loaded and interacting once containers are involved
assuming that such support would exist in the future. It's already known
that interacting IO schedulers are a nightmare so presumably interacting
CPU schedulers within the same host would also be zero fun.

Pluggable schedulers are effectively a change that we cannot walk back
from if it turns out to be a bad idea because it potentially comes under
the "you cannot break userspace" rule if a particular pluggable scheduler
becomes popular. As I strongly believe it will be a nightmare to support
within distributions where there is almost no control over the software
stack or over managing user expectations, I'm opposed to crossing that line
with pluggable schedulers. While my nightmare scenarios may never be realised
and could be overblown, it'll be hard to convince me it'll not kick me in
the face eventually.

-- 
Mel Gorman
SUSE Labs