Message-ID: <878qywyt1c.ffs@tglx>
Date: Sun, 23 Jun 2024 10:14:55 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: Chris Mason <clm@...a.com>, Tejun Heo <tj@...nel.org>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>, mingo@...hat.com,
peterz@...radead.org, juri.lelli@...hat.com, vincent.guittot@...aro.org,
dietmar.eggemann@....com, rostedt@...dmis.org, bsegall@...gle.com,
mgorman@...e.de, bristot@...hat.com, vschneid@...hat.com, ast@...nel.org,
daniel@...earbox.net, andrii@...nel.org, martin.lau@...nel.org,
joshdon@...gle.com, brho@...gle.com, pjt@...gle.com, derkling@...gle.com,
haoluo@...gle.com, dvernet@...a.com, dschatzberg@...a.com,
dskarlat@...cmu.edu, riel@...riel.com, changwoo@...lia.com,
himadrics@...ia.fr, memxor@...il.com, andrea.righi@...onical.com,
joel@...lfernandes.org, linux-kernel@...r.kernel.org, bpf@...r.kernel.org,
kernel-team@...a.com
Subject: Re: [PATCHSET v6] sched: Implement BPF extensible scheduler class
Chris!
On Fri, Jun 21 2024 at 17:14, Chris Mason wrote:
> On 6/21/24 6:46 AM, Thomas Gleixner wrote:
> I'll be honest, the only clear and consistent communication we've gotten
> about sched_ext was "no, please go away". You certainly did engage with
> face to face discussions, but at the end of the day/week/month the
> overall message didn't change.
The only time _I_ really told you "go away" was at OSPM 2023 when you
approached everyone in the worst possible way. I surely did not even say
"please" back then.
The message people (not only me) perceived was:
"The scheduler sucks, sched_ext solves the problems, saves us millions
and Google is happy to work with us [after dropping upstream scheduler
development a decade ago and leaving the opens for others to mop up]."
followed by:
"You should take it, as it will bring in fresh people to work on the
scheduler due to the lower entry barrier [because kernel hacking sucks].
This will result in great new ideas which will be contributed back to
the scheduler proper."
That was a really brilliant marketing stunt and I told you so very bluntly.
It was presumably not your intention, but that's the problem of
communication between people. Though I haven't seen a useful attempt to
cure that.
After that clash, the room got into a lively technical discussion about the
real underlying problem, i.e. that a big part of scheduling issues comes
from the fact that there is not enough information about the requirements
and properties of an application available. Even you agreed with that, if I
remember correctly.
sched_ext does not solve that problem. It just works around it by putting
the requirements and properties of an application into the BPF scheduler
and the user space portion of it. That works well in a controlled
environment like yours, but it does not even remotely help to solve the
underlying general problems. You acknowledged that and said: "But we don't
have it today, though sched_ext is ready and will help with that."
The concern that sched_ext will reduce the incentive to work on the
scheduler proper is not completely unfounded and I've yet to see the
slightest evidence which proves the contrary.
Don't tell me that this is impossible because sched_ext is not yet
upstream. It's used in production successfully as you said, so there
clearly must be something to learn from which could be shared at least in
form of data. OSPM24 would have been a great place for that especially as
the requirements and properties discussion was continued there with a plan.
On all other occasions, I sat down with people and discussed at a technical
level, but also clearly asked to resolve the social rift which all of this
created.
I thereby surely said several times: "I wish it would just go away and stay
out of tree", but that's a very different message, no?
Quite a few of the questions and concerns I voiced, which were also voiced
by others on the list, have not been sorted out to this day. Just to name a
few off the top of my head:
- How is this supposed to work with different applications requiring
different sched_ext schedulers?
- How are distros/users supposed to handle this especially when
applications start to come with their own optimized schedulers?
- What's the documented rule for dealing with bugs and regressions on a
system where sched_ext is active?
"We'll work it out in tree" is not an answer to that. Ignoring it and letting
the rest of the world deal with the fallout is not a really good answer
either.
I'm not saying that this is all your and the sched_ext people's fault, the
other side was not always constructive either. Neither did it help that I
had to drop the ball.
For me, Linus saying that he will merge it no matter what was a wake-up
call to all involved parties. One side reached out with a clear message to
sort this out amicably and not make the situation worse.
> At any rate, I think sched_ext has a good path forward, and I know we'll
> keep working together however we can.
Carefully avoiding the perception trap, may I politely ask what this is
supposed to tell me?
Thanks,
tglx