Message-ID: <Zqu5fgU73-tDMk1d@slm.duckdns.org>
Date: Thu, 1 Aug 2024 06:36:14 -1000
From: Tejun Heo <tj@...nel.org>
To: Qais Yousef <qyousef@...alina.io>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
linux-kernel@...r.kernel.org, David Vernet <void@...ifault.com>,
Ingo Molnar <mingo@...hat.com>,
Peter Zijlstra <peterz@...radead.org>,
Alexei Starovoitov <ast@...nel.org>,
Vincent Guittot <vincent.guittot@...aro.org>
Subject: Re: [GIT PULL] sched_ext: Initial pull request for v6.11

Hello,

On Thu, Aug 01, 2024 at 02:17:35PM +0100, Qais Yousef wrote:
> > You made the same point in another thread, so let's discuss it there but
>
> Don't you think it's a bit rushed to include this part in the pull request?

Not really. It seems pretty straightforward to me.

> > it's not changing the relationship between schedutil and sched class.
> > schedutil collects utility signals from sched classes and then translates
> > that to cpufreq operations. For SCX scheds, the only way to get such util
> > signals is asking the BPF scheduler. Nobody else knows. It's loading a
> > completely new scheduler after all.
>
> But you're effectively making schedutil a userspace governor. If SCX wants to
> define its own util signal, wouldn't it be more appropriate to pair it with a
> userspace governor instead? It makes more sense to pair a userspace scheduler
> with a userspace governor than to alter schedutil behavior.

The *scheduler* itself is defined from userspace. I have a hard time
following why a utilization signal coming from that scheduler is all that
surprising. If the user or the scheduler implementation wants to pair it with
a userspace governor, they can do that. I don't want to make that decision for
developers who are implementing their own schedulers.
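
To illustrate, here's a minimal sketch of a BPF scheduler driving the
cpufreq signal. It assumes the scx common headers and the
scx_bpf_cpuperf_set() kfunc from the sched_ext tree; treat it as
illustrative rather than the final interface:

/* Illustrative only: assumes scx common headers and cpuperf kfuncs. */
#include <scx/common.bpf.h>

char _license[] SEC("license") = "GPL";

void BPF_STRUCT_OPS(util_demo_tick, struct task_struct *p)
{
        s32 cpu = scx_bpf_task_cpu(p);

        /*
         * Ask cpufreq (e.g. schedutil) to run this CPU at ~75% of its
         * maximum performance; SCX_CPUPERF_ONE is full capacity.
         */
        scx_bpf_cpuperf_set(cpu, SCX_CPUPERF_ONE * 3 / 4);
}

SEC(".struct_ops.link")
struct sched_ext_ops util_demo_ops = {
        .tick   = (void *)util_demo_tick,
        .name   = "util_demo",
};

All other callbacks are left at their defaults; the only thing this
scheduler does is set a performance target for cpufreq to consume.
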
...

> That's not how I read it. It's supposed to be for things that alter the
> kernel's spec/functionality and make it untrustworthy. We already have a
> taint flag for overriding ACPI tables. Out-of-tree modules can have lots of
> power to alter things in a way that makes the kernel generally untrustworthy.
> Given how intrusively the scheduler behavior can be altered with no control,
> I think a taint flag to call it out is important. Not only for us, but also
> for app developers: you don't know what people will decide to do, which can
> end up causing apps to misbehave weirdly on systems that load specific
> scheduler extensions. I think both of us (kernel and app developers) want to
> know when something that can cause such misbehavior was loaded.

We of course want to make sure that developers and users can tell what
they're running on. However, this doesn't really align with why taint flags
were added and how they are usually used, and it's unclear how the use of a
taint flag would improve the situation on top of the existing visibility
mechanisms (in sysfs and oops messages). Does this mean loading any BPF
program should taint the kernel? How about changing sysctls?
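
For what it's worth, the sysfs side is already trivial to query. A minimal
sketch, assuming the /sys/kernel/sched_ext layout from the sched_ext tree
(the exact paths may change):

/* Illustrative only: prints SCX state and the loaded scheduler's name. */
#include <stdio.h>

static void print_file(const char *path)
{
        char buf[64];
        FILE *f = fopen(path, "r");

        if (f && fgets(buf, sizeof(buf), f))
                printf("%s: %s", path, buf);
        if (f)
                fclose(f);
}

int main(void)
{
        print_file("/sys/kernel/sched_ext/state");    /* enabled/disabled */
        print_file("/sys/kernel/sched_ext/root/ops"); /* scheduler name */
        return 0;
}
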
> > It's the same as other BPF hooks. We don't want to break willy-nilly but we
> > can definitely break backward compatibility if necessary. This has been
> > discussed to death and I don't think we can add much by litigating the case
> > again.
>
> Was this discussion on the list? I haven't seen it. Assuming the details were
> discussed with the maintainers and Linus and there's an agreement in place,
> that's good to know. If not, then clarity before the fact is better than
> after the fact. I think the boundaries are very hazy, and regressions are one
> of the major reasons that hold up the speed of scheduler development. It is
> very easy to break some configuration/workload/system unwittingly. Adding
> more constraints to the mix, ones that are actually harder to deal with, will
> make our life exponentially more difficult.

I wasn't a first party in the discussions and don't have good pointers.
However, I know that the subject has been discussed to the moon and back a
few times, and the conclusion is pretty clear at this point - after all,
multiple ecosystems around BPF have been operating this way for quite a while
now. Maybe the BPF folks have better pointers?

Thanks.

--
tejun