linux-kernel - Re: [PATCH 00/30] PREEMPT_AUTO: support lazy rescheduling

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite for Android: free password hash cracker in your pocket
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20240221131901.69c80c47@gandalf.local.home>
Date: Wed, 21 Feb 2024 13:19:01 -0500
From: Steven Rostedt <rostedt@...dmis.org>
To: "Paul E. McKenney" <paulmck@...nel.org>
Cc: Ankur Arora <ankur.a.arora@...cle.com>, linux-kernel@...r.kernel.org,
 tglx@...utronix.de, peterz@...radead.org, torvalds@...ux-foundation.org,
 akpm@...ux-foundation.org, luto@...nel.org, bp@...en8.de,
 dave.hansen@...ux.intel.com, hpa@...or.com, mingo@...hat.com,
 juri.lelli@...hat.com, vincent.guittot@...aro.org, willy@...radead.org,
 mgorman@...e.de, jpoimboe@...nel.org, mark.rutland@....com,
 jgross@...e.com, andrew.cooper3@...rix.com, bristot@...nel.org,
 mathieu.desnoyers@...icios.com, glaubitz@...sik.fu-berlin.de,
 anton.ivanov@...bridgegreys.com, mattst88@...il.com,
 krypton@...ich-teichert.org, David.Laight@...lab.com, richard@....at,
 jon.grimm@....com, bharata@....com, boris.ostrovsky@...cle.com,
 konrad.wilk@...cle.com
Subject: Re: [PATCH 00/30] PREEMPT_AUTO: support lazy rescheduling

On Mon, 19 Feb 2024 08:48:20 -0800
"Paul E. McKenney" <paulmck@...nel.org> wrote:

> > I will look again -- it is quite possible that I was confused by earlier
> > in-fleet setups that had Tasks RCU enabled even when preemption was
> > disabled.  (We don't do that anymore, and, had I been paying sufficient
> > attention, would not have been doing it to start with.  Back in the day,
> > enabling rcutorture, even as a module, had the side effect of enabling
> > Tasks RCU.  How else to test it, right?  Well...)  
> 
> OK, I got my head straight on this one...
> 
> And the problem is in fact that Tasks RCU isn't normally present
> in non-preemptible kernels.  This is because normal RCU will wait
> for preemption-disabled regions of code, and in PREMPT_NONE and
> PREEMPT_VOLUNTARY kernels, that includes pretty much any region of code
> lacking an explicit schedule() or similar.  And as I understand it,
> tracing trampolines rely on this implicit lack of preemption.
> 
> So, with lazy preemption, we could preempt in the middle of a
> trampoline, and synchronize_rcu() won't save us.
> 
> Steve and Mathieu will correct me if I am wrong.
> 
> If I do understand this correctly, one workaround is to remove the
> "if PREEMPTIBLE" on all occurrences of "select TASKS_RCU".  That way,
> all kernels would use synchronize_rcu_tasks(), which would wait for
> a voluntary context switch.
> 
> This workaround does increase the overhead and tracepoint-removal
> latency on non-preemptible kernels, so it might be time to revisit the
> synchronization of trampolines.  Unfortunately, the things I have come
> up with thus far have disadvantages:
> 
> o	Keep a set of permanent trampolines that enter and exit
> 	some sort of explicit RCU read-side critical section.
> 	If the address for this trampoline to call is in a register,
> 	then these permanent trampolines remain constant so that
> 	no synchronization of them is required.  The selected
> 	flavor of RCU can then be used to deal with the non-permanent
> 	trampolines.
> 
> 	The disadvantage here is a significant increase in the complexity
> 	and overhead of trampoline code and the code that invokes the
> 	trampolines.  This overhead limits where tracing may be used
> 	in the kernel, which is of course undesirable.

I wonder if we can just see if the instruction pointer at preemption is at
something that was allocated? That is, if it __is_kernel(addr) returns
false, then we need to do more work. Of course that means modules will also
trigger this. We could check __is_module_text() but that does a bit more
work and may cause too much overhead. But who knows, if the module check is
only done if the __is_kernel() check fails, maybe it's not that bad.

-- Steve

> 
> o	Check for being preempted within a trampoline, and track this
> 	within the tasks structure.  The disadvantage here is that this
> 	requires keeping track of all of the trampolines and adding a
> 	check for being in one on a scheduler fast path.
> 
> o	Have a variant of Tasks RCU which checks the stack of preempted
> 	tasks, waiting until all have been seen without being preempted
> 	in a trampoline.  This still requires keeping track of all the
> 	trampolines in an easy-to-search manner, but gets the overhead
> 	of searching off of the scheduler fastpaths.
> 
> 	It is also necessary to check running tasks, which might have
> 	been interrupted from within a trampoline.
> 
> 	I would have a hard time convincing myself that these return
> 	addresses were unconditionally reliable.  But maybe they are?
> 
> o	Your idea here!
> 
> Again, the short-term workaround is to remove the "if PREEMPTIBLE" from
> all of the "select TASKS_RCU" clauses.
> 
> > > > My next step is to try this on bare metal on a system configured as
> > > > is the fleet.  But good progress for a week!!!  
> > > 
> > > Yeah this is great. Fingers crossed for the wider set of tests.  
> > 
> > I got what might be a one-off when hitting rcutorture and KASAN harder.
> > I am running 320*TRACE01 to see if it reproduces.  
> 
> [ . . . ]
> 
> > So, first see if it is reproducible, second enable more diagnostics,
> > third make more grace-period sequence numbers available to rcutorture,
> > fourth recheck the diagnostics code, and then see where we go from there.
> > It might be that lazy preemption needs adjustment, or it might be that
> > it just tickled latent diagnostic issues in rcutorture.
> > 
> > (I rarely hit this WARN_ON() except in early development, when the
> > problem is usually glaringly obvious, hence all the uncertainty.)  
> 
> And it is eminently reproducible.  Digging into it...