linux-kernel - Re: [PATCH 26/30] sched: handle preempt=voluntary under PREEMPT

Message-ID: <36eef8c5-8ecd-4c90-8851-1c2ab342e2bb@paulmck-laptop>
Date: Mon, 11 Mar 2024 12:26:36 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Ankur Arora <ankur.a.arora@...cle.com>
Cc: Joel Fernandes <joel@...lfernandes.org>, linux-kernel@...r.kernel.org,
	tglx@...utronix.de, peterz@...radead.org,
	torvalds@...ux-foundation.org, akpm@...ux-foundation.org,
	luto@...nel.org, bp@...en8.de, dave.hansen@...ux.intel.com,
	hpa@...or.com, mingo@...hat.com, juri.lelli@...hat.com,
	vincent.guittot@...aro.org, willy@...radead.org, mgorman@...e.de,
	jpoimboe@...nel.org, mark.rutland@....com, jgross@...e.com,
	andrew.cooper3@...rix.com, bristot@...nel.org,
	mathieu.desnoyers@...icios.com, geert@...ux-m68k.org,
	glaubitz@...sik.fu-berlin.de, anton.ivanov@...bridgegreys.com,
	mattst88@...il.com, krypton@...ich-teichert.org,
	rostedt@...dmis.org, David.Laight@...lab.com, richard@....at,
	mjguzik@...il.com, jon.grimm@....com, bharata@....com,
	raghavendra.kt@....com, boris.ostrovsky@...cle.com,
	konrad.wilk@...cle.com
Subject: Re: [PATCH 26/30] sched: handle preempt=voluntary under PREEMPT_AUTO

On Sun, Mar 10, 2024 at 09:50:33PM -0700, Ankur Arora wrote:
> 
> Paul E. McKenney <paulmck@...nel.org> writes:
> 
> > On Thu, Mar 07, 2024 at 08:22:30PM -0800, Ankur Arora wrote:
> >>
> >> Paul E. McKenney <paulmck@...nel.org> writes:
> >>
> >> > On Thu, Mar 07, 2024 at 07:15:35PM -0500, Joel Fernandes wrote:
> >> >>
> >> >>
> >> >> On 3/7/2024 2:01 PM, Paul E. McKenney wrote:
> >> >> > On Wed, Mar 06, 2024 at 03:42:10PM -0500, Joel Fernandes wrote:
> >> >> >> Hi Ankur,
> >> >> >>
> >> >> >> On 3/5/2024 3:11 AM, Ankur Arora wrote:
> >> >> >>>
> >> >> >>> Joel Fernandes <joel@...lfernandes.org> writes:
> >> >> >>>
> >> >> >> [..]
> >> >> >>>> IMO, just kill 'voluntary' if PREEMPT_AUTO is enabled. There is no
> >> >> >>>> 'voluntary' business because
> >> >> >>>> 1. The behavior vs =none is to allow higher scheduling class to preempt, it
> >> >> >>>> is not about the old voluntary.
> >> >> >>>
> >> >> >>> What do you think about folding the higher scheduling class preemption logic
> >> >> >>> into preempt=none? As Juri pointed out, prioritization of at least the leftmost
> >> >> >>> deadline task needs to be done for correctness.
> >> >> >>>
> >> >> >>> (That'll get rid of the current preempt=voluntary model, at least until
> >> >> >>> there's a separate use for it.)
> >> >> >>
> >> >> >> Yes I am all in support for that. It's less confusing for the user as well, and
> >> >> >> scheduling higher priority class at the next tick for preempt=none sounds good
> >> >> >> to me. That is still an improvement for folks using SCHED_DEADLINE for whatever
> >> >> >> reason, with a vanilla CONFIG_PREEMPT_NONE=y kernel. :-P. If we want a new mode
> >> >> >> that is more aggressive, it could be added in the future.
> >> >> >
> >> >> > This would be something that happens only after removing the
> >> >> > cond_resched() functionality from might_sleep(), correct?
> >> >>
> >> >> Firstly, maybe I misunderstood Ankur completely. Re-reading his comments above,
> >> >> he seems to be suggesting preempting instantly for higher scheduling CLASSES
> >> >> even for preempt=none mode, without having to wait till the next
> >> >> scheduling-clock interrupt. Not sure if that makes sense to me, I was asking not
> >> >> to treat "higher class" any differently than "higher priority" for preempt=none.
> >> >>
> >> >> And if SCHED_DEADLINE has a problem with that, then that problem already exists
> >> >> with CONFIG_PREEMPT_NONE=y kernels, so a higher class needs no special treatment
> >> >> beyond that given to a higher priority within the same class. Ankur/Juri?
> >> >>
> >> >> Re: cond_resched(), I did not follow you Paul, why does removing the proposed
> >> >> preempt=voluntary mode (i.e. dropping this patch) have to happen only after
> >> >> cond_resched()/might_sleep() modifications?
> >> >
> >> > Because right now, one large difference between CONFIG_PREEMPT_NONE
> >> > and CONFIG_PREEMPT_VOLUNTARY is that for the latter might_sleep() is a
> >> > preemption point, but not for the former.
> >>
> >> True. But, there is no difference between either of those with
> >> PREEMPT_AUTO=y (at least right now).
> >>
> >> For (PREEMPT_AUTO=y, PREEMPT_VOLUNTARY=y, DEBUG_ATOMIC_SLEEP=y),
> >> might_sleep() is:
> >>
> >> # define might_resched() do { } while (0)
> >> # define might_sleep() \
> >>         do { __might_sleep(__FILE__, __LINE__); might_resched(); } while (0)
> >>
> >> And, cond_resched() for (PREEMPT_AUTO=y, PREEMPT_VOLUNTARY=y,
> >> DEBUG_ATOMIC_SLEEP=y):
> >>
> >> static inline int _cond_resched(void)
> >> {
> >>         klp_sched_try_switch();
> >>         return 0;
> >> }
> >> #define cond_resched() ({                       \
> >>         __might_resched(__FILE__, __LINE__, 0); \
> >>         _cond_resched();                        \
> >> })
> >>
> >> And, no change for (PREEMPT_AUTO=y, PREEMPT_NONE=y, DEBUG_ATOMIC_SLEEP=y).
> >
> > As long as it is easy to restore the prior cond_resched() functionality
> > for testing in the meantime, I should be OK.  For example, it would
> > be great to have the commit removing the old functionality from
> > cond_resched() at the end of the series,
> 
> I would, of course, be happy to make any changes that help testing,
> but I think I'm missing something that you are saying wrt
> cond_resched()/might_sleep().
> 
> There's no commit explicitly removing the core cond_resched()
> functionality: PREEMPT_AUTO explicitly selects PREEMPT_BUILD and selects
> out PREEMPTION_{NONE,VOLUNTARY}_BUILD.
> (That's patch-1 "preempt: introduce CONFIG_PREEMPT_AUTO".)
> 
> For the rest, it just piggybacks on the CONFIG_PREEMPT_DYNAMIC work
> and ends up in the (!CONFIG_PREEMPT_DYNAMIC && CONFIG_PREEMPTION) branch:
> 
> #if !defined(CONFIG_PREEMPTION) || defined(CONFIG_PREEMPT_DYNAMIC)
> 	/* ... */
> #if defined(CONFIG_PREEMPT_DYNAMIC) && defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL)
> 	/* ... */
> #elif defined(CONFIG_PREEMPT_DYNAMIC) && defined(CONFIG_HAVE_PREEMPT_DYNAMIC_KEY)
> 	/* ... */
> #else /* !CONFIG_PREEMPTION */
> 	/* ... */
> #endif /* PREEMPT_DYNAMIC && CONFIG_HAVE_PREEMPT_DYNAMIC_CALL */
> 
> #else /* CONFIG_PREEMPTION && !CONFIG_PREEMPT_DYNAMIC */
> static inline int _cond_resched(void)
> {
> 	klp_sched_try_switch();
> 	return 0;
> }
> #endif /* !CONFIG_PREEMPTION || CONFIG_PREEMPT_DYNAMIC */
> 
> Same for might_sleep() (which really amounts to might_resched()):
> 
> #ifdef CONFIG_PREEMPT_VOLUNTARY_BUILD
>        /* ... */
> #elif defined(CONFIG_PREEMPT_DYNAMIC) && defined(CONFIG_HAVE_PREEMPT_DYNAMIC_CALL)
>       /* ... */
> #elif defined(CONFIG_PREEMPT_DYNAMIC) && defined(CONFIG_HAVE_PREEMPT_DYNAMIC_KEY)
>       /* ... */
> #else
> # define might_resched() do { } while (0)
> #endif /* CONFIG_PREEMPT_* */
> 
> But, I doubt that I'm telling you anything new. So, what am I missing?
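
For reference, the selection logic quoted above can be mimicked in user
space.  This is only a minimal sketch: the CONFIG_* macros are set by hand
rather than by Kconfig, fake_preemption_point() and count_resched_points()
are hypothetical helpers that exist only here, the PREEMPT_DYNAMIC branches
are folded away, and the klp_sched_try_switch() hook is left out.  It is
meant to show that, with PREEMPTION on and both the VOLUNTARY build and
PREEMPT_DYNAMIC off, neither might_resched() nor _cond_resched() reaches a
preemption point.

```c
/* User-space sketch only, not kernel code. */

/* Hand-picked configuration mimicking PREEMPT_AUTO=y as described above:
 * PREEMPTION is on; the VOLUNTARY build and PREEMPT_DYNAMIC are off. */
#define CONFIG_PREEMPTION 1
/* CONFIG_PREEMPT_VOLUNTARY_BUILD deliberately left undefined */
/* CONFIG_PREEMPT_DYNAMIC deliberately left undefined */

static int resched_points_hit;	/* counts calls that could reschedule */

/* Hypothetical stand-in for a real preemption point. */
int fake_preemption_point(void)
{
	resched_points_hit++;
	return 1;
}

#ifdef CONFIG_PREEMPT_VOLUNTARY_BUILD
# define might_resched() fake_preemption_point()
#else
# define might_resched() do { } while (0)
#endif

#if !defined(CONFIG_PREEMPTION) || defined(CONFIG_PREEMPT_DYNAMIC)
static inline int _cond_resched(void)
{
	return fake_preemption_point();
}
#else /* CONFIG_PREEMPTION && !CONFIG_PREEMPT_DYNAMIC */
static inline int _cond_resched(void)
{
	return 0;	/* livepatch hook omitted in this sketch */
}
#endif

/* Calls both helpers once and reports how many preemption points ran. */
int count_resched_points(void)
{
	resched_points_hit = 0;
	might_resched();
	(void)_cond_resched();
	return resched_points_hit;
}
```

Under the configuration above, count_resched_points() returns 0.  Defining
CONFIG_PREEMPT_VOLUNTARY_BUILD, or dropping CONFIG_PREEMPTION, turns one
of the two helpers back into a (simulated) preemption point.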

It is really a choice at your end.

Suppose we enable CONFIG_PREEMPT_AUTO on our fleet, and find that there
was some small set of cond_resched() calls that provided sub-jiffy
preemption that matters to some of our workloads.  At that point, what
are our options?

1.	Revert CONFIG_PREEMPT_AUTO.

2.	Revert only the part that disables the voluntary preemption
	semantics of cond_resched().  Which, as you point out, ends up
	being the same as #1 above.

3.	Hotwire a voluntary preemption into the required locations.
	Which we would avoid doing due to upstream-acceptance concerns.

So, how easy would you like to make it for us to keep using as much of
CONFIG_PREEMPT_AUTO=y as possible under various possible problem scenarios?
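
To make the sub-jiffy concern concrete, here is a toy user-space model,
not kernel code.  Everything in it is an assumption made for illustration:
time is counted in loop iterations, TICK_PERIOD stands in for one jiffy,
and preempt_latency() is a hypothetical helper.  It measures how long a
reschedule request waits when the loop's cond_resched() call sites are
honored versus when only the scheduling-clock tick can preempt.

```c
/* Toy model only: one loop iteration is the unit of time. */
#define TICK_PERIOD 1000	/* iterations per simulated jiffy (assumed) */

/*
 * Return the number of iterations between a reschedule request made at
 * iteration request_at and the first point where the loop yields.
 * resched_every must be positive; it is the spacing of the simulated
 * cond_resched() call sites.
 */
int preempt_latency(int request_at, int resched_every, int honor_cond_resched)
{
	for (int i = request_at; ; i++) {
		/* Scheduling-clock tick: always an opportunity to preempt. */
		if (i % TICK_PERIOD == 0)
			return i - request_at;
		/* Explicit cond_resched() call site in the loop body. */
		if (honor_cond_resched && i % resched_every == 0)
			return i - request_at;
	}
}
```

With a request at iteration 1 and a call site every 10 iterations,
preempt_latency(1, 10, 1) is 9, while preempt_latency(1, 10, 0) is 999,
that is, the request waits for the next simulated tick.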

Yes, in a perfect world, we would have tested this already, but I
am still chasing down problems induced by simple rcutorture testing.
Cowardly of us, isn't it?  ;-)

							Thanx, Paul