linux-kernel - Re: [PATCHSET v6] sched: Implement BPF extensible scheduler class

Message-ID: <87zfrcx81u.ffs@tglx>
Date: Sun, 23 Jun 2024 12:33:33 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: Linus Torvalds <torvalds@...ux-foundation.org>
Cc: Tejun Heo <tj@...nel.org>, mingo@...hat.com, peterz@...radead.org,
 juri.lelli@...hat.com, vincent.guittot@...aro.org,
 dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
 mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com, ast@...nel.org,
 daniel@...earbox.net, andrii@...nel.org, martin.lau@...nel.org,
 joshdon@...gle.com, brho@...gle.com, pjt@...gle.com, derkling@...gle.com,
 haoluo@...gle.com, dvernet@...a.com, dschatzberg@...a.com,
 dskarlat@...cmu.edu, riel@...riel.com, changwoo@...lia.com,
 himadrics@...ia.fr, memxor@...il.com, andrea.righi@...onical.com,
 joel@...lfernandes.org, linux-kernel@...r.kernel.org, bpf@...r.kernel.org,
 kernel-team@...a.com
Subject: Re: [PATCHSET v6] sched: Implement BPF extensible scheduler class

Linus!

On Fri, Jun 21 2024 at 09:34, Linus Torvalds wrote:
> On Fri, 21 Jun 2024 at 02:35, Thomas Gleixner <tglx@...utronix.de> wrote:
> And don't get me wrong - I'm not complaining about the RT patches. I
> think they improved things enormously in the end. They've been great.

Thanks!

> I'm just saying that they are _not_ the norm to compare against.

I'm not comparing them.

I was just pointing out that you repeatedly asked me whether the nasty
parts could stay out of tree forever, which I always found odd. But now in
the context of your out of tree lecture this question struck me as even
stranger. Understandably, no?

> Anyway, what I'm saying is that you trying to equate this with the RT
> patches is absolutely laughable and intellectually dishonest.

See how communication between people fails?

I might have misinterpreted your question about keeping RT out of tree and you
interpreted my answer as a comparison, which was not my intention at all.

If I want to compare another out of tree project with sched ext, then I
surely do not pick RT but DPDK. The network people rejected the DPDK
approach as they wanted to have things fixed and done in tree instead of
letting everyone create their own sand pit. It worked out as it made
people think and come up with XDP and other things which give the
dataplane people a proper tool while having the general stuff work
nicely in the same context.

In other words, that forced people to really collaborate and sort it out
for the benefit of everyone. I might be missing something crucial, but I
fail to see the same benefit coming from sched ext.

Coming back to what you said in an earlier mail:

> And the "I detest pluggable schedulers" has been long superseded by
> "I detest people who complain about our one scheduler because they
> have special loads that only they care about".

I agree with that sentiment. I don't agree with the "solution".

The sad truth is that everyone involved admired the problem for a decade
and kept complaining in one way or another.

Google dropping out of scheduler development was not because of scheduler
people being hard to work with. Peter and Paul worked perfectly fine
together and the hierarchical cgroup scheduling muck was merged under the
premise "We work it out in tree". It just never happened because the people
who added it vanished in a black hole for reasons which have nothing to do
with the kernel scheduler community.

At last year's OSPM everyone in the room, including the sched ext folks,
agreed that the main problem is that the scheduler does not have enough
information about the requirements and properties of applications, which is
not a Facebook/Google specific thing. That applies to all sorts of problems
including power, thermal and capacity constraints.

That's nothing new. The academic scheduler research identified that in the
late 90s already and came up with specific solutions to prove their
point. That effort fell short of being generalized.

So sched_ext does exactly this by putting requirements and properties of
workloads into the BPF scheduler and the related user space portion.

I completely agree that this is a nice tool for doing research to identify
what needs to be done to make this a generalized approach.

I disagree that providing it as an official workaround will result in more
collaboration and a better result for everyone in the very end. Quite the
contrary, it is going to foster fragmentation way beyond the Google/Facebook
space.

The whole notion of 'my workload is so special and therefore we need
special sauce' is a strawman. We've debunked a lot of 'my thing is so
special' claims over the years by making people sit down and come up with
generalized solutions for the benefit of everyone.

I'm not saying we debunked all. Some of them failed because people refused
to work it out and opted for keeping their stuff out of tree forever. But
in the vast majority of cases it worked out pretty well.

I recently watched a talk about sched ext which explained how to model an
execution pipeline for a specific workload to optimize the scheduling of
the involved threads and how innovative that is. I really had a good laugh
because that's called explicit plan scheduling and has been described and
implemented in the early 2000s by academics already.

Innovative or not, that's not the point. The point is that none of this
resulted in the promised feedback to the scheduler proper. As this runs in
production already, it would have been a great talk at OSPM24 to follow up
on the 'requirements and properties' discussion to at least provide the
insights of this in the form of data to work from.

That's one of the reasons why I said:

> I'm still not seeing the general mainline people benefit of all this, so
> I have to trust you that there is one which is beyond my comprehension
> skills.

I can see your benefit that the detesting complaining will stop, but I fail
to map that into a general benefit for everyone else. Some enlightenment
would be appreciated.

Thanks,

	tglx