Date: Fri, 1 Mar 2024 17:16:33 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Mark Rutland <mark.rutland@....com>
Cc: Steven Rostedt <rostedt@...dmis.org>,
	Ankur Arora <ankur.a.arora@...cle.com>,
	linux-kernel@...r.kernel.org, tglx@...utronix.de,
	peterz@...radead.org, torvalds@...ux-foundation.org,
	akpm@...ux-foundation.org, luto@...nel.org, bp@...en8.de,
	dave.hansen@...ux.intel.com, hpa@...or.com, mingo@...hat.com,
	juri.lelli@...hat.com, vincent.guittot@...aro.org,
	willy@...radead.org, mgorman@...e.de, jpoimboe@...nel.org,
	jgross@...e.com, andrew.cooper3@...rix.com, bristot@...nel.org,
	mathieu.desnoyers@...icios.com, glaubitz@...sik.fu-berlin.de,
	anton.ivanov@...bridgegreys.com, mattst88@...il.com,
	krypton@...ich-teichert.org, David.Laight@...lab.com,
	richard@....at, jon.grimm@....com, bharata@....com,
	boris.ostrovsky@...cle.com, konrad.wilk@...cle.com
Subject: Re: [PATCH 00/30] PREEMPT_AUTO: support lazy rescheduling

On Fri, Feb 23, 2024 at 07:31:50AM -0800, Paul E. McKenney wrote:
> On Fri, Feb 23, 2024 at 11:05:45AM +0000, Mark Rutland wrote:
> > On Thu, Feb 22, 2024 at 11:11:34AM -0800, Paul E. McKenney wrote:
> > > On Thu, Feb 22, 2024 at 03:50:02PM +0000, Mark Rutland wrote:
> > > > On Wed, Feb 21, 2024 at 12:22:35PM -0800, Paul E. McKenney wrote:
> > > > > On Wed, Feb 21, 2024 at 03:11:57PM -0500, Steven Rostedt wrote:
> > > > > > On Wed, 21 Feb 2024 11:41:47 -0800
> > > > > > "Paul E. McKenney" <paulmck@...nel.org> wrote:
> > > > > > 
> > > > > > > > I wonder if we can just see if the instruction pointer at preemption
> > > > > > > > points into something that was dynamically allocated? That is, if
> > > > > > > > __is_kernel(addr) returns false, then we need to do more work. Of course
> > > > > > > > that means modules will also trigger this. We could check
> > > > > > > > __is_module_text(), but that does a bit more work and may cause too much
> > > > > > > > overhead. But who knows, if the module check is only done when the
> > > > > > > > __is_kernel() check fails, maybe it's not that bad.
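
A minimal sketch of the check order proposed above, for concreteness:
__is_kernel() is the existing helper from asm-generic/sections.h,
is_module_text_address() is the existing helper closest to the
__is_module_text() named above, and pc_needs_more_work() is a
hypothetical name used purely for illustration.

/* Hypothetical: classify a preempted PC, cheapest check first. */
static bool pc_needs_more_work(unsigned long addr)
{
        if (__is_kernel(addr))
                return false;   /* Core kernel text: the common, cheap case. */
        if (is_module_text_address(addr))
                return false;   /* Module text: checked only on a miss. */
        return true;            /* Neither: possibly a trampoline. */
}
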
> > > > > > > 
> > > > > > > I do like very much that idea, but it requires that we be able to identify
> > > > > > > this instruction pointer perfectly, no matter what.  It might also require
> > > > > > > that we be able to perfectly identify any IRQ return addresses as well,
> > > > > > > for example, if the preemption was triggered within an interrupt handler.
> > > > > > > And interrupts from softirq environments might require identifying an
> > > > > > > additional level of IRQ return address.  The original IRQ might have
> > > > > > > interrupted a trampoline, and then after transitioning into softirq,
> > > > > > > another IRQ might also interrupt a trampoline, and this last IRQ handler
> > > > > > > might have instigated a preemption.
> > > > > > 
> > > > > > Note, softirqs still require a real interrupt to happen in order to preempt
> > > > > > executing code. Otherwise it should never be running from a trampoline.
> > > > > 
> > > > > Yes, the first interrupt interrupted a trampoline.  Then, on return,
> > > > > that interrupt transitioned to softirq (as opposed to ksoftirqd).
> > > > > While a softirq handler was executing within a trampoline, we got
> > > > > another interrupt.  We thus have two interrupted trampolines.
> > > > > 
> > > > > Or am I missing something that prevents this?
> > > > 
> > > > Surely the problematic case is where the first interrupt is taken from a
> > > > trampoline, but the inner interrupt is taken from not-a-trampoline? If the
> > > > innermost interrupt context is a trampoline, that's the same as the case
> > > > without any nesting.
> > > 
> > > It depends.  If we wait for each task to not have a trampoline in effect
> > > then yes, we only need to know whether or not a given task has at least
> > > one trampoline in use.  One concern with this approach is that a given
> > > task might have at least one trampoline in effect every time it is
> > > checked, unlikely though that might seem.
> > > 
> > > If this is a problem, one way around it is to instead ask whether the
> > > current task still has a reference to one of a set of trampolines that
> > > have recently been removed.  This avoids the problem of a task always
> > > being on some trampoline or another, but requires exact identification
> > > of any and all trampolines a given task is currently using.
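
As a hypothetical sketch of that second approach, the check might
compare a task's interrupted PC against the set of recently-removed
trampolines; none of these names are existing kernel APIs.

struct tramp_range {
        unsigned long start;
        unsigned long end;      /* Trampoline text occupies [start, end). */
};

/* Does this task's saved PC still reference removed trampoline text? */
static bool pc_in_removed_set(unsigned long pc,
                              const struct tramp_range *set, int n)
{
        int i;

        for (i = 0; i < n; i++)
                if (pc >= set[i].start && pc < set[i].end)
                        return true;    /* Still on a removed trampoline. */
        return false;                   /* Clear of this set. */
}
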
> > >
> > > Either way, we need some way of determining whether or not a given
> > > PC value resides in a trampoline.  This likely requires some data
> > > structure (hash table?  tree?  something else?) that must be traversed
> > > in order to carry out that determination.  Depending on the traversal
> > > overhead, it might (or might not) be necessary to make sure that the
> > > traversal is not on the entry/exit/scheduler fast paths.  It is also
> > > necessary to keep the trampoline-use overhead low and the trampoline
> > > call points small.
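
One hypothetical shape for that data structure: keep the trampoline
ranges (the tramp_range structure from the sketch above) in a sorted,
non-overlapping array and binary-search it, for O(log n) per check if
the traversal cannot be kept off the fast paths.

/* Binary-search sorted, non-overlapping trampoline ranges for pc. */
static bool pc_in_trampoline(unsigned long pc,
                             const struct tramp_range *ranges, int n)
{
        int lo = 0, hi = n;

        while (lo < hi) {
                int mid = lo + (hi - lo) / 2;

                if (pc < ranges[mid].start)
                        hi = mid;
                else if (pc >= ranges[mid].end)
                        lo = mid + 1;
                else
                        return true;    /* pc lies inside ranges[mid]. */
        }
        return false;
}
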
> > 
> > Thanks; I hadn't thought about that shape of livelock problem; with that in
> > mind my suggestion using flags was inadequate.
> > 
> > I'm definitely in favour of just using Tasks RCU! That's what arm64 does today,
> > anyhow!
> 
> Full speed ahead, then!!!  But if you come up with a nicer solution,
> please do not keep it a secret!

The networking NAPI code ends up needing special help to avoid starving
Tasks RCU grace periods [1].  I am therefore revisiting making Tasks
RCU directly detect trampoline usage, but without quite as much need
to identify specific trampolines...
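
For reference, a sketch of the Tasks RCU pattern that trampoline
teardown (for example, in ftrace) relies on today.  Here
synchronize_rcu_tasks() is the real API; struct tramp and the unhook
and free steps are hypothetical stand-ins.

/* Sketch: tear down a trampoline under Tasks RCU protection. */
static void tramp_teardown(struct tramp *tp)
{
        tramp_unhook(tp);       /* Hypothetical: no new entries into tp. */

        /*
         * Wait for every task to pass through a voluntary context
         * switch or a return to userspace, after which no task can
         * still be running in, or preempted within, tp's text.
         */
        synchronize_rcu_tasks();

        tramp_free_text(tp);    /* Hypothetical: now safe to free. */
}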

I am putting this information in a Google document for future
reference [2].

Thoughts?

								Thanx, Paul

[1] https://lore.kernel.org/all/Zd4DXTyCf17lcTfq@debian.debian/
[2] https://docs.google.com/document/d/1kZY6AX-AHRIyYQsvUX6WJxS1LsDK4JA2CHuBnpkrR_U/edit?usp=sharing
