linux-kernel - Re: [PATCH 15/30] rcu: handle quiescent states for PREEMPT_RCU=n, PREEMPT

Message-ID: <e762474b-a3fa-46bd-9816-7663fbba7271@paulmck-laptop>
Date: Mon, 11 Mar 2024 12:53:47 -0700
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Joel Fernandes <joel@...lfernandes.org>,
	Ankur Arora <ankur.a.arora@...cle.com>,
	linux-kernel@...r.kernel.org, peterz@...radead.org,
	torvalds@...ux-foundation.org, akpm@...ux-foundation.org,
	luto@...nel.org, bp@...en8.de, dave.hansen@...ux.intel.com,
	hpa@...or.com, mingo@...hat.com, juri.lelli@...hat.com,
	vincent.guittot@...aro.org, willy@...radead.org, mgorman@...e.de,
	jpoimboe@...nel.org, mark.rutland@....com, jgross@...e.com,
	andrew.cooper3@...rix.com, bristot@...nel.org,
	mathieu.desnoyers@...icios.com, geert@...ux-m68k.org,
	glaubitz@...sik.fu-berlin.de, anton.ivanov@...bridgegreys.com,
	mattst88@...il.com, krypton@...ich-teichert.org,
	rostedt@...dmis.org, David.Laight@...lab.com, richard@....at,
	mjguzik@...il.com, jon.grimm@....com, bharata@....com,
	raghavendra.kt@....com, boris.ostrovsky@...cle.com,
	konrad.wilk@...cle.com, rcu@...r.kernel.org
Subject: Re: [PATCH 15/30] rcu: handle quiescent states for PREEMPT_RCU=n,
 PREEMPT_COUNT=y

On Mon, Mar 11, 2024 at 08:12:58PM +0100, Thomas Gleixner wrote:
> On Mon, Mar 11 2024 at 11:25, Joel Fernandes wrote:
> > On 3/11/2024 1:18 AM, Ankur Arora wrote:
> >>> Yes, I mentioned this 'disabling preemption' aspect in my last email. My point
> >>> being, unlike CONFIG_PREEMPT_NONE, CONFIG_PREEMPT_AUTO allows for kernel
> >>> preemption in preempt=none. So the "Don't preempt the kernel" behavior has
> >>> changed. That is, preempt=none under CONFIG_PREEMPT_AUTO is different from
> >>> CONFIG_PREEMPT_NONE=y already. Here we *are* preempting. And RCU is getting on
> >> 
> >> I think that's a view from too close to the implementation. Someone
> >> using the kernel is not necessarily concerned with whether tasks are
> >> preempted or not. They are concerned with throughput and latency.
> >
> > No, we are not only talking about that (throughput/latency). We are also talking
> > about the issue related to RCU reader-preemption causing OOM (well and that
> > could hurt both throughput and latency as well).
> 
> That happens only when PREEMPT_RCU=y. For PREEMPT_RCU=n the read side
> critical sections still have preemption disabled.
> 
> > With CONFIG_PREEMPT_AUTO=y, you now preempt in the preempt=none mode. Something
> > very different from the classical CONFIG_PREEMPT_NONE=y.
> 
> In PREEMPT_RCU=y and preempt=none mode this happens only when really
> required, i.e. when the task does not schedule out or return to user
> space on time, or when a higher scheduling class task gets runnable. For
> the latter the jury is still out whether this should be done or just
> lazily deferred like the SCHED_OTHER preemption requests.
> 
> In any case for that to matter this forced preemption would need to
> preempt a RCU read side critical section and then keep the preempted
> task away from the CPU for a long time.
> 
> That's very different from the unconditional kernel preemption model which
> preempt=full provides and only marginally different from the existing
> PREEMPT_NONE model. I know there might be dragons, but I'm not convinced
> yet that this is an actual problem.
> 
> OTOH, doesn't PREEMPT_RCU=y have a mechanism to mitigate that already?

You are right, it does, CONFIG_RCU_BOOST=y.

> > Essentially this means preemption is now more aggressive from the point of view
> > of a preempt=none user. I was suggesting that, a point of view could be RCU
> > should always support preemptibility (don't give a PREEMPT_RCU=n option) because
> > AUTO *does preempt* unlike classic CONFIG_PREEMPT_NONE. Otherwise it is
> > inconsistent -- say with CONFIG_PREEMPT=y (another *preemption mode*) which
> > forces CONFIG_PREEMPT_RCU. However to Paul's point, we need to worry about those
> > users who are concerned with running out of memory due to reader
> > preemption.
> 
> What's wrong with the combination of PREEMPT_AUTO=y and PREEMPT_RCU=n?
> Paul and I agreed long ago that this needs to be supported.
> 
> > In that vein, maybe we should also support CONFIG_PREEMPT_RCU=n for
> > CONFIG_PREEMPT=y as well. There are plenty of popular systems with relatively
> > low memory that need low latency (like some low-end devices / laptops
> > :-)).
> 
> I'm not sure whether that's useful as the goal is to get rid of all the
> CONFIG_PREEMPT_FOO options, no?
> 
> I'd rather spend brain cycles on figuring out whether RCU can be flipped
> over between PREEMPT_RCU=n/y at boot or obviously run-time.

Well, it is just software, so anything is possible.  But there can
be a wide gap between "possible" and "sensible".  ;-)

In theory, one boot-time approach would be to build preemptible RCU,
and then to boot-time binary-rewrite calls to __rcu_read_lock()
and __rcu_read_unlock() to preempt_disable() and preempt_enable(),
respectively.  Because preemptible RCU has to treat preemption-disabled
regions of code as RCU readers, this Should Just Work.  However, there
would then be a lot of needless branches in the grace-period code.
Only the ones on fastpaths (for example, context switch) would need
to be static-branchified, but there would likely need to be other
restructuring, given the need for current preemptible RCU to do a better
job of emulating non-preemptible RCU.  (Emulating non-preemptible RCU
is of course currently a complete non-goal for preemptible RCU.)
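
The boot-time idea can be illustrated with a minimal userspace sketch.
This is a toy model, not kernel code: every toy_* name is hypothetical,
and the function pointers merely stand in for binary-rewriting the
__rcu_read_lock()/__rcu_read_unlock() call sites.  The point it shows is
that the grace-period side can stay unchanged as long as it treats
preemption-disabled regions as readers.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy per-task state; all names are hypothetical, not kernel APIs. */
struct toy_task {
	int rcu_nesting;   /* depth of preemptible-RCU read-side sections */
	int preempt_count; /* depth of preemption-disabled regions */
};

/* Preemptible flavor: track reader nesting, leave preemption enabled. */
static void toy_lock_preempt(struct toy_task *t)
{
	t->rcu_nesting++;
}

static void toy_unlock_preempt(struct toy_task *t)
{
	assert(t->rcu_nesting > 0); /* unbalanced unlock */
	t->rcu_nesting--;
}

/* Non-preemptible flavor: a reader is just a preemption-disabled region. */
static void toy_lock_nopreempt(struct toy_task *t)
{
	t->preempt_count++;
}

static void toy_unlock_nopreempt(struct toy_task *t)
{
	assert(t->preempt_count > 0); /* unbalanced unlock */
	t->preempt_count--;
}

/* Function pointers stand in for boot-time rewriting of the call sites. */
static void (*toy_rcu_read_lock)(struct toy_task *) = toy_lock_preempt;
static void (*toy_rcu_read_unlock)(struct toy_task *) = toy_unlock_preempt;

/* Called once "at boot", before any reader runs. */
static void toy_boot_select(bool preemptible)
{
	toy_rcu_read_lock = preemptible ? toy_lock_preempt : toy_lock_nopreempt;
	toy_rcu_read_unlock = preemptible ? toy_unlock_preempt : toy_unlock_nopreempt;
}

/*
 * Grace-period side: a task holds up the grace period in either case,
 * because preemption-disabled regions count as readers too.
 */
static bool toy_task_is_reader(const struct toy_task *t)
{
	return t->rcu_nesting > 0 || t->preempt_count > 0;
}
```

Calling toy_boot_select(false) before any reader runs makes every
reader show up only in preempt_count, yet toy_task_is_reader() still
reports it.  The sketch says nothing about the extra branches or
restructuring the real grace-period code would need.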

So maybe?

But this one needs careful design and review up front, as in step through
all the code and check assumptions and changes in behavior.  After all,
this stuff is way easier to break than to debug and fix.  ;-)


On the other hand, making RCU switch at runtime is...  Tricky.

For example, if the system was in non-preemptible mode at rcu_read_lock()
time, the corresponding rcu_read_unlock() needs to be aware that it needs
to act as if the system was still in non-preemptible mode, and vice versa.
Grace period processing during the switch needs to be aware that different
CPUs will be switching at different times.  Also, it will be common for a
given CPU's switch to span more than one grace period.  So any approach
based on either binary rewrite or static branches will need to be set
up in a multi-phase multi-grace-period state machine.  Sort of like
Frederic's runtime-switched callback offloading, but rather more complex,
and way more performance sensitive.
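
The first of these constraints, that an unlock must behave according to
the mode in effect at the matching lock, can be sketched in userspace C.
Again this is only a toy under stated assumptions: the toy_* names and
the per-task mode bitmask are hypothetical, and the sketch ignores
per-CPU switching, grace-period interaction, and performance entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAX_NEST 32

/* Hypothetical global mode flag that may flip at run time. */
static bool toy_preemptible_mode;

struct toy_task {
	int depth;          /* current read-side nesting depth */
	uint32_t mode_bits; /* bit i set: level i entered in preemptible mode */
	int rcu_nesting;    /* readers entered in preemptible mode */
	int preempt_count;  /* readers entered in non-preemptible mode */
};

static void toy_rcu_read_lock(struct toy_task *t)
{
	assert(t->depth < TOY_MAX_NEST);
	if (toy_preemptible_mode) {
		/* Remember that this level was entered in preemptible mode. */
		t->mode_bits |= UINT32_C(1) << t->depth;
		t->rcu_nesting++;
	} else {
		t->mode_bits &= ~(UINT32_C(1) << t->depth);
		t->preempt_count++;
	}
	t->depth++;
}

static void toy_rcu_read_unlock(struct toy_task *t)
{
	assert(t->depth > 0);
	t->depth--;
	/* Act on the mode recorded at lock time, not the current mode. */
	if (t->mode_bits & (UINT32_C(1) << t->depth))
		t->rcu_nesting--;
	else
		t->preempt_count--;
}
```

If toy_preemptible_mode flips while a task is inside a reader, the
eventual unlock still undoes exactly what the matching lock did.  The
extra state and branch on every lock and unlock hint at why a real
implementation would be so performance sensitive.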

But do we even need to switch RCU at runtime, other than to say that
we did it?  What is the use case?  Or is this just a case of "it would
be cool!"?  Don't get me wrong, I am a sucker for "it would be cool",
as you well know, but even for me there are limits.  ;-)

At the moment, I would prioritize improving quiescent-state forcing for
existing RCU over this, especially perhaps given the concerns from the
MM folks.

But what is motivating the desire to boot-time/run-time switch RCU
between preemptible and non-preemptible?

							Thanx, Paul