linux-kernel - Re: [PATCH 00/30] PREEMPT_AUTO: support lazy rescheduling


Message-ID: <53020731-e9a9-4561-97db-8848c78172c7@paulmck-laptop>
Date: Wed, 21 Feb 2024 12:22:35 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Steven Rostedt <rostedt@...dmis.org>
Cc: Ankur Arora <ankur.a.arora@...cle.com>, linux-kernel@...r.kernel.org,
	tglx@...utronix.de, peterz@...radead.org,
	torvalds@...ux-foundation.org, akpm@...ux-foundation.org,
	luto@...nel.org, bp@...en8.de, dave.hansen@...ux.intel.com,
	hpa@...or.com, mingo@...hat.com, juri.lelli@...hat.com,
	vincent.guittot@...aro.org, willy@...radead.org, mgorman@...e.de,
	jpoimboe@...nel.org, mark.rutland@....com, jgross@...e.com,
	andrew.cooper3@...rix.com, bristot@...nel.org,
	mathieu.desnoyers@...icios.com, glaubitz@...sik.fu-berlin.de,
	anton.ivanov@...bridgegreys.com, mattst88@...il.com,
	krypton@...ich-teichert.org, David.Laight@...lab.com,
	richard@....at, jon.grimm@....com, bharata@....com,
	boris.ostrovsky@...cle.com, konrad.wilk@...cle.com
Subject: Re: [PATCH 00/30] PREEMPT_AUTO: support lazy rescheduling

On Wed, Feb 21, 2024 at 03:11:57PM -0500, Steven Rostedt wrote:
> On Wed, 21 Feb 2024 11:41:47 -0800
> "Paul E. McKenney" <paulmck@...nel.org> wrote:
> 
> > > I wonder if we can just see if the instruction pointer at preemption is at
> > > something that was allocated? That is, if __is_kernel(addr) returns
> > > false, then we need to do more work. Of course that means modules will also
> > > trigger this. We could check __is_module_text() but that does a bit more
> > > work and may cause too much overhead. But who knows, if the module check is
> > > only done if the __is_kernel() check fails, maybe it's not that bad.  
> > 
> > I do like very much that idea, but it requires that we be able to identify
> > this instruction pointer perfectly, no matter what.  It might also require
> > that we be able to perfectly identify any IRQ return addresses as well,
> > for example, if the preemption was triggered within an interrupt handler.
> > And interrupts from softirq environments might require identifying an
> > additional level of IRQ return address.  The original IRQ might have
> > interrupted a trampoline, and then after transitioning into softirq,
> > another IRQ might also interrupt a trampoline, and this last IRQ handler
> > might have instigated a preemption.
> 
> Note, softirqs still require a real interrupt to happen in order to preempt
> executing code. Otherwise it should never be running from a trampoline.

Yes, the first interrupt interrupted a trampoline.  Then, on return,
that interrupt transitioned to softirq (as opposed to ksoftirqd).
While a softirq handler was executing within a trampoline, we got
another interrupt.  We thus have two interrupted trampolines.

Or am I missing something that prevents this?
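
To make the nested case concrete, here is a minimal userspace sketch, not
kernel code. It assumes a hypothetical per-CPU stack of interrupted
instruction pointers (one slot per IRQ nesting level) and made-up address
ranges for core kernel text and a single module. The helper names
(in_core_text(), in_module_text(), maybe_trampoline(), and so on) are
illustrative only and do not correspond to existing kernel functions. It
also folds in the ordering suggested earlier in the thread: the cheap
core-text check first, the module check only when that fails.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical address ranges standing in for core kernel and module text. */
#define CORE_TEXT_START 0x1000u
#define CORE_TEXT_END   0x2000u
#define MOD_TEXT_START  0x8000u
#define MOD_TEXT_END    0x9000u

/* Assumed bound on IRQ nesting for this sketch only. */
#define MAX_IRQ_NESTING 4

/* One saved instruction pointer per level of interrupt nesting. */
struct saved_ips {
	uintptr_t ip[MAX_IRQ_NESTING];
	size_t depth;
};

static bool in_core_text(uintptr_t addr)
{
	return addr >= CORE_TEXT_START && addr < CORE_TEXT_END;
}

static bool in_module_text(uintptr_t addr)
{
	return addr >= MOD_TEXT_START && addr < MOD_TEXT_END;
}

/*
 * Cheap check first; fall back to the module lookup only on a miss.
 * Anything in neither range is treated as possibly being a trampoline.
 */
static bool maybe_trampoline(uintptr_t addr)
{
	if (in_core_text(addr))
		return false;
	return !in_module_text(addr);
}

/* Record where execution was when an interrupt arrived. */
static void irq_entry_record(struct saved_ips *s, uintptr_t interrupted_ip)
{
	assert(s->depth < MAX_IRQ_NESTING);
	s->ip[s->depth++] = interrupted_ip;
}

/* Drop the innermost record when that interrupt returns. */
static void irq_exit_forget(struct saved_ips *s)
{
	assert(s->depth > 0);
	s->depth--;
}

/* Count the nesting levels whose interrupted code might be a trampoline. */
static size_t count_interrupted_trampolines(const struct saved_ips *s)
{
	size_t i, n = 0;

	for (i = 0; i < s->depth; i++)
		if (maybe_trampoline(s->ip[i]))
			n++;
	return n;
}
```

Recording an address outside both ranges twice, once for the original
interrupt and once for the interrupt taken while the softirq handler runs,
makes count_interrupted_trampolines() report two levels. That is why a
single saved instruction pointer would not be enough in this scenario.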

> > Are there additional levels or mechanisms requiring identifying
> > return addresses?
> 
> Hmm, could we add to irq_enter_rcu()
> 
> 	__this_cpu_write(__rcu_ip, instruction_pointer(get_irq_regs()));
> 
> That is, to save off where the ip was when it was interrupted.
> 
> Hmm, but it looks like the get_irq_regs() is set up outside of
> irq_enter_rcu() :-(
> 
> I wonder how hard it would be to change all the architectures to pass in
> pt_regs to irq_enter_rcu()? All the places it is called, the regs should be
> available.
> 
> Either way, it looks like it will be a bit of work around the trampoline or
> around RCU to get this efficiently done.

One approach would be to make Tasks RCU be present for PREEMPT_AUTO
kernels as well as PREEMPTIBLE kernels, and then, as architectures provide
the needed return-address infrastructure, transition those architectures
to something more precise.
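
As a rough illustration of that staged transition, the selection could look
something like the sketch below. Everything here is hypothetical: the enum,
the arch_has_precise_ip flag, and pick_tramp_sync() are names invented for
this example, not existing kernel symbols or Kconfig options.

```c
#include <stdbool.h>

/* Hypothetical ways of waiting for tasks to leave a trampoline. */
enum tramp_sync {
	TRAMP_SYNC_TASKS_RCU,	/* conservative default: rely on Tasks RCU */
	TRAMP_SYNC_PRECISE_IP,	/* arch can identify interrupted return addresses */
};

/*
 * Default to Tasks RCU, and switch to the more precise scheme only on
 * architectures that provide the needed return-address infrastructure.
 */
static enum tramp_sync pick_tramp_sync(bool arch_has_precise_ip)
{
	return arch_has_precise_ip ? TRAMP_SYNC_PRECISE_IP
				   : TRAMP_SYNC_TASKS_RCU;
}
```

The point of the default is that an architecture that has not yet been
converted keeps working, just with the less precise mechanism.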

Or maybe the workaround will prove to be good enough.  We did
inadvertently test it for a year or so at scale, so I am at least
confident that it works.  ;-)

							Thanx, Paul