Message-ID: <e6KW_ypfbIVbenvwbBwGgnxX700e-A68oVmCn1pdJ0834U4wtIWXhh5zfHrQF2dvSL_Vc_heC4KZ0XRzNZ-w7QfF70W0epxCzpph55reOls=@pm.me>
Date: Mon, 16 Sep 2024 19:23:19 +0000
From: Michael Pratt <mcpratt@...me>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...hat.com>, Vincent Guittot <vincent.guittot@...aro.org>, Dietmar Eggemann <dietmar.eggemann@....com>, linux-kernel@...r.kernel.org, stable@...r.kernel.org
Subject: Re: [RESEND PATCH] sched/syscalls: Allow setting niceness using sched_param struct

Hi Peter,

On Monday, September 16th, 2024 at 07:13, Peter Zijlstra <peterz@...radead.org> wrote:
> 
> On Mon, Sep 16, 2024 at 05:08:49AM +0000, Michael Pratt wrote:
> 
> > From userspace, spawning a new process with, for example,
> > posix_spawn(), only allows the user to work with
> > the scheduling priority value defined by POSIX
> > in the sched_param struct.
> > 
> > However, sched_setparam() and similar syscalls lead to
> > __sched_setscheduler() which rejects any new value
> > for the priority other than 0 for non-RT schedule classes,
> > a behavior kept since Linux 2.6 or earlier.
> 
> 
> Right, and the current behaviour is entirely in-line with the POSIX
> specs.

I'm just mentioning this for context.
In this case, "in-line with POSIX specs" has nothing to do with
whether or not the feature works. POSIX says nothing about which policies
should accept which values, or about how those values are processed.
Like many things, it is simply implementation-specific.

The current behavior is that it doesn't work, and I would like it to work.

> I realize this might be a pain, but why should we change this spec
> conforming and very long standing behavior?

The fact that the overall behavior is "very long standing" is a coincidence.
The code here conforms to the specs both before and after the patch,
and the only difference is functionality.

In fact, I am not aiming to change
the exact behavior of "reject every priority value other than 0",
but rather to work around it by translating the priority to niceness,
so long as the value passed by the user is within the valid range.
This approach is not only about keeping the priority at 0. I also found it necessary,
because if the syscall were allowed to change the static priority,
then a later change in the "niceness" value could theoretically push the priority
into the RT range for non-RT policies.
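
A minimal sketch of the translation described above, not the patch itself.
The struct prio_request and the helper translate_prio_to_nice() are
hypothetical names used only for illustration; the constants mirror
include/linux/sched/prio.h. The point it shows is that the priority field
still ends up as 0 for non-RT policies, and only the niceness changes.

```c
#include <stdbool.h>

/* Constants with the same values as include/linux/sched/prio.h. */
#define MAX_RT_PRIO   100
#define NICE_WIDTH    40
#define MAX_PRIO      (MAX_RT_PRIO + NICE_WIDTH)      /* 140 */
#define DEFAULT_PRIO  (MAX_RT_PRIO + NICE_WIDTH / 2)  /* 120 */
#define PRIO_TO_NICE(prio) ((prio) - DEFAULT_PRIO)

/* Hypothetical stand-in for the two relevant fields of a request. */
struct prio_request {
    int sched_priority;  /* value supplied through sched_param */
    int sched_nice;      /* niceness that the kernel would apply */
};

/*
 * Hypothetical helper: if the requested priority falls in the non-RT
 * range [MAX_RT_PRIO, MAX_PRIO - 1], convert it to a niceness value
 * and reset the priority to 0, so the existing "priority must be 0"
 * check for non-RT policies still passes. Returns false and leaves
 * the request untouched when the value is outside that range.
 */
static bool translate_prio_to_nice(struct prio_request *req)
{
    if (req->sched_priority < MAX_RT_PRIO ||
        req->sched_priority >= MAX_PRIO)
        return false;

    req->sched_nice = PRIO_TO_NICE(req->sched_priority);
    req->sched_priority = 0;
    return true;
}
```

Under these assumptions, a request with priority 139 becomes priority 0
with niceness 19, while a request with priority 50 is left unchanged and
would still be rejected by the existing check.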

> Worse, you're proposing a nice ABI that is entirely different from the
> normal [-20,19] range.

Please take a closer look... The resulting niceness value is exactly that range.
  PRIO_TO_NICE([MAX_RT_PRIO,MAX_PRIO-1]) = [-20,19]
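
For reference, the arithmetic behind that range can be checked in
isolation. The macros below are copied in spirit from
include/linux/sched/prio.h (parentheses around MIN_NICE are added here);
the compile-time assertions are only an illustration and are not part of
the kernel.

```c
/* Definitions with the same values as include/linux/sched/prio.h. */
#define MAX_NICE      19
#define MIN_NICE      (-20)
#define NICE_WIDTH    (MAX_NICE - MIN_NICE + 1)       /* 40 */

#define MAX_RT_PRIO   100
#define MAX_PRIO      (MAX_RT_PRIO + NICE_WIDTH)      /* 140 */
#define DEFAULT_PRIO  (MAX_RT_PRIO + NICE_WIDTH / 2)  /* 120 */

#define NICE_TO_PRIO(nice) ((nice) + DEFAULT_PRIO)
#define PRIO_TO_NICE(prio) ((prio) - DEFAULT_PRIO)

/* The ends of the non-RT priority range map to the ends of the nice range. */
_Static_assert(PRIO_TO_NICE(MAX_RT_PRIO) == MIN_NICE,
               "lowest non-RT priority maps to nice -20");
_Static_assert(PRIO_TO_NICE(MAX_PRIO - 1) == MAX_NICE,
               "highest non-RT priority maps to nice 19");
```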

I am not suggesting that the user should treat the value passed as "priority"
as if it were the "niceness" value. Rather, the user should
pass the "priority" value that they actually want as the result,
and the kernel reaches that priority by adjusting the "niceness" instead,
as that is the user-specific method to effectively do the same thing.

The "niceness" value has no meaning in the world of POSIX. It only means something
in the world of Linux, and just like the translation from sched_param to sched_attr structs,
this is the place where we would translate priority to niceness.
Everything outside the internals of the kernel should be understood as the "actual" priority,
because POSIX is a userspace interface that doesn't acknowledge or understand the kernel's ABIs,
not the other way around.

Otherwise, we have a confusing conflation between the meaning of the two values,
where a value of 19 makes sense for niceness, but is obviously invalid for priority
for SCHED_NORMAL, and a negative value makes sense for niceness, but is obviously invalid
for priority in any policy.

Implementations of the posix_spawn functions ask for the "priority".
POSIX states that the value passed in the sched_param struct is the "priority"
and that its usage is implementation-specific. It is not the other way around, where
the meaning of the value would be implementation-specific and the usage of the value
would be clearly defined. I'm trying to stay in line with those semantics as well.

> Why do you feel this is the best way forward? Would not adding
> POSIX_SPAWN_SETSCHEDATTR be a more future proof mechanism?

New flags don't change the fact that the value will be rejected in the kernel,
unless I am misunderstanding what you mean...

I believe this is the simplest and smallest possible change
that conforms both to POSIX and to the kernel's style
in order to make posix_spawnattr_setschedparam()
work instead of _just_ being "conforming and compliant",
which, like I said, is a low bar that "just reject all values" already meets.

Flags like POSIX_SPAWN_SETSCHEDATTR would be used at the library level
and we have no problems at the library level, except for Linux-only libraries
that have not implemented posix_spawnattr_setschedparam() because it currently fails.
Notably, the musl C library is an example of this, but that might change
if we finally add support for this.

It would be nice if POSIX added a flag specifically to cater to Linux.
However, that would likely require them to add the sched_attr struct definition
or replace the sched_param struct, and as we know, things usually work the other way around.

Thanks for your time.

--
MCP
