Message-ID: <20240801131735.rihobmnwszsqrdxw@airbuntu>
Date: Thu, 1 Aug 2024 14:17:35 +0100
From: Qais Yousef <qyousef@...alina.io>
To: Tejun Heo <tj@...nel.org>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, David Vernet <void@...ifault.com>,
	Ingo Molnar <mingo@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Alexei Starovoitov <ast@...nel.org>,
	Vincent Guittot <vincent.guittot@...aro.org>
Subject: Re: [GIT PULL] sched_ext: Initial pull request for v6.11

On 07/30/24 15:22, Tejun Heo wrote:
> Hello,
> 
> On Thu, Jul 25, 2024 at 02:19:07AM +0100, Qais Yousef wrote:
> > We really shouldn't change how schedutil works. The governor is supposed to
> > behave in a certain way, and we need to ensure consistency. I think you should
> > look on how you make your scheduler compatible with it. Adding hooks to say
> > apply this perf value that I want is a recipe for randomness.
> 
> You made the same point in another thread, so let's discuss it there but

Don't you think it's a bit rushed to include this part in the pull request?

> it's not changing the relationship between schedutil and sched class.
> schedutil collects utility signals from sched classes and then translates
> that to cpufreq operations. For SCX scheds, the only way to get such util
> signals is asking the BPF scheduler. Nobody else knows. It's loading a
> completely new scheduler after all.

But you're effectively making schedutil a userspace governor. If SCX wants to
define its own util signal, wouldn't it be more appropriate to pair it with the
userspace governor instead? It makes more sense to pair a userspace scheduler
with the userspace governor than to alter schedutil's behavior.

> 
> > Generally I do have big concerns about sched_ext being loaded causing spurious
> > bug report as it changes the behavior of the scheduler and the kernel is not
> > trusted when sched_ext scheduler is loaded. Like out-of-tree modules, it should
> > cause the kernel to be tainted. Something I asked for few years back when
> > Gushchin sent the first proposal
> >
> > How can we trust bug and regression report when out-of-tree code was loaded
> > that intrusively changes the way the kernel behaves? This must be marked as
> > a kernel TAINT otherwise we're doomed trying to fix out of tree code.
> 
> You raised in the other thread too but I don't think taint fits the bill
> here. Taints are useful when the impact is persistent so that we can know

That's not how I read it. It is supposed to be for things that alter the kernel
spec/functionality and make it not trustworthy. We already have a taint flag
for overriding ACPI tables. Out-of-tree modules can have lots of power to alter
things in a way that makes the kernel generally not trustworthy. Given how
intrusively the scheduler behavior can be altered with no control, I think
a taint flag to indicate it is important. Not only for us, but also for app
developers, as you don't know what people will decide to do that can end up
causing apps to misbehave weirdly on some systems that load specific scheduler
extensions. I think both of us (kernel and app developers) want to know that
something in the kernel that can contribute to this misbehavior was loaded.

> that a later failure may have been caused by an earlier thing which might
> not be around anymore. A SCX scheduler is not supposed to leave any
> persistent impact on the system. If it's loaded, we can see it's loaded in
> oops dumps and other places. If it's not, it shouldn't really be factor.
> 
> > And there's another general problem of regression reports due to failure to
> > load code due to changes to how the scheduler evolves. We need to continue to
> > be able to change our code freely without worrying about breaking out-of-tree
> > code. What is the regression rule? We don't want to be limited to be able to
> > make in-kernel changes because out-of-tree code will fail now; either to load
> > or to run as intended. How is the current code designed to handle failsafe when
> > the external scheduler is no longer compatible with existing kernel and *they*
> > need to rewrite their code, pretty much the way it goes for out-of-tree modules
> > now?
> 
> It's the same as other BPF hooks. We don't want to break willy-nilly but we
> can definitely break backward compatibility if necessary. This has been
> discussed to death and I don't think we can add much by litigating the case
> again.

Was this discussion on the list? I haven't seen it. Assuming the details were
discussed with the maintainers and Linus and there's an agreement in place,
that's good to know. If not, then clarity before the fact is better than
after the fact. I think the boundaries are very hazy, and regressions are one
of the major things that hold up the speed of scheduler development. It is very
easy to break some configuration/workload/system unwittingly. Adding more
constraints that are actually harder to deal with to the mix will make our life
exponentially more difficult.


Thanks

--
Qais Yousef
