linux-kernel - Re: [GIT PULL] sched_ext: Initial pull request for v6.11

Message-ID: <20240725011907.3f5ropfai3xoy3l3@airbuntu>
Date: Thu, 25 Jul 2024 02:19:07 +0100
From: Qais Yousef <qyousef@...alina.io>
To: Tejun Heo <tj@...nel.org>
Cc: Linus Torvalds <torvalds@...ux-foundation.org>,
	linux-kernel@...r.kernel.org, David Vernet <void@...ifault.com>,
	Ingo Molnar <mingo@...hat.com>,
	Peter Zijlstra <peterz@...radead.org>,
	Alexei Starovoitov <ast@...nel.org>,
	Vincent Guittot <vincent.guittot@...aro.org>
Subject: Re: [GIT PULL] sched_ext: Initial pull request for v6.11

On 07/15/24 12:32, Tejun Heo wrote:
> NOTE: I couldn't get git-request-pull to generate the correct diffstat. The
>       diffstat at the end is generated against the merged base and should
>       match the diff when sched_ext-for-6.11 is pulled after tip/sched/core
>       and bpf/for-next.
> 
> The following changes since commit d329605287020c3d1c3b0dadc63d8208e7251382:
> 
>   sched/fair: set_load_weight() must also call reweight_task() for SCHED_IDLE tasks (2024-07-04 15:59:52 +0200)
> 
> are available in the Git repository at:
> 
>   git://git.kernel.org/pub/scm/linux/kernel/git/tj/sched_ext.git/ tags/sched_ext-for-6.11
> 
> for you to fetch changes up to 8bb30798fd6ee79e4041a32ca85b9f70345d8671:
> 
>   sched_ext: Fixes incorrect type in bpf_scx_init() (2024-07-14 18:10:10 -1000)
> 
> ----------------------------------------------------------------
> sched_ext: Initial pull request for v6.11
> 
> This is the initial pull request of sched_ext. The v7 patchset
> (https://lkml.kernel.org/r/20240618212056.2833381-1-tj@kernel.org) is
> applied on top of tip/sched/core + bpf/for-next as of Jun 18th.
> 
>   tip/sched/core 793a62823d1c ("sched/core: Drop spinlocks on contention iff kernel is preemptible")
>   bpf/for-next   f6afdaf72af7 ("Merge branch 'bpf-support-resilient-split-btf'")
> 
> Since then, the following changes:
> 
> - cpuperf support which was a part of the v6 patchset was posted separately
>   and then applied after reviews.

I just reviewed this and I think you're going in the wrong direction here.
I don't think the current level of review was sufficient and we're rushing
things to get them into 6.11.

	https://lore.kernel.org/lkml/20240724234527.6m43t36puktdwn2g@airbuntu/

We really shouldn't change how schedutil works. The governor is supposed to
behave in a certain way, and we need to ensure consistency. I think you should
look at how to make your scheduler compatible with it. Adding hooks that say
"apply this perf value that I want" is a recipe for randomness.

Generally, I do have big concerns that loading sched_ext will cause spurious
bug reports, as it changes the behavior of the scheduler and the kernel cannot
be trusted when a sched_ext scheduler is loaded. Like out-of-tree modules, it
should cause the kernel to be tainted. This is something I asked for a few
years back when Gushchin sent the first proposal:

	https://lwn.net/Articles/873244/
	https://lore.kernel.org/lkml/20211011163852.s4pq45rs2j3qhdwl@e107158-lin.cambridge.arm.com/

How can we trust bug and regression reports when out-of-tree code was loaded
that intrusively changes the way the kernel behaves? This must be marked as
a kernel TAINT; otherwise we'll be doomed trying to fix out-of-tree code.

And there's another general problem: regression reports caused by code failing
to load because of changes made as the scheduler evolves. We need to continue
to be able to change our code freely without worrying about breaking
out-of-tree code. What is the regression rule? We don't want to be restricted
in making in-kernel changes because out-of-tree code will now fail, either to
load or to run as intended. How is the current code designed to fail safe when
the external scheduler is no longer compatible with the existing kernel and
*they* need to rewrite their code, pretty much the way it goes for out-of-tree
modules now?

Thanks

--
Qais Yousef