Message-ID: <Zfl6y-NLuwbmyyL8@FVFF77S0Q05N>
Date: Tue, 19 Mar 2024 11:45:15 +0000
From: Mark Rutland <mark.rutland@....com>
To: "Paul E. McKenney" <paulmck@...nel.org>
Cc: Steven Rostedt <rostedt@...dmis.org>,
	Ankur Arora <ankur.a.arora@...cle.com>,
	linux-kernel@...r.kernel.org, tglx@...utronix.de,
	peterz@...radead.org, torvalds@...ux-foundation.org,
	akpm@...ux-foundation.org, luto@...nel.org, bp@...en8.de,
	dave.hansen@...ux.intel.com, hpa@...or.com, mingo@...hat.com,
	juri.lelli@...hat.com, vincent.guittot@...aro.org,
	willy@...radead.org, mgorman@...e.de, jpoimboe@...nel.org,
	jgross@...e.com, andrew.cooper3@...rix.com, bristot@...nel.org,
	mathieu.desnoyers@...icios.com, glaubitz@...sik.fu-berlin.de,
	anton.ivanov@...bridgegreys.com, mattst88@...il.com,
	krypton@...ich-teichert.org, David.Laight@...lab.com,
	richard@....at, jon.grimm@....com, bharata@....com,
	boris.ostrovsky@...cle.com, konrad.wilk@...cle.com
Subject: Tasks RCU, ftrace, and trampolines (was: Re: [PATCH 00/30]
 PREEMPT_AUTO: support lazy rescheduling)

Hi Paul,

On Fri, Mar 01, 2024 at 05:16:33PM -0800, Paul E. McKenney wrote:
> The networking NAPI code ends up needing special help to avoid starving
> Tasks RCU grace periods [1].  I am therefore revisiting trying to make
> Tasks RCU directly detect trampoline usage, but without quite as much
> need to identify specific trampolines...
> 
> I am putting this information in a Google document for future
> reference [2].
> 
> Thoughts?

Sorry for the long delay! I've been looking into this general area over the
last couple of weeks due to the latent bugs I mentioned in:

  https://lore.kernel.org/lkml/Zenx_Q0UiwMbSAdP@FVFF77S0Q05N/

I was somewhat hoping that staring at the code for long enough would result in
an epiphany (and a nice simple-to-backport solution for the latent issues), but
so far that has eluded me.

I believe some of those cases will need to use synchronize_rcu_tasks(). We
might be able to make some structural changes to minimize the number of times
we'd need to synchronize (e.g. having static ftrace call ops->func from the ops
pointer, so we can switch ops+func atomically), but those look pretty invasive
so far.
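
To illustrate the ops+func idea, here is a minimal userspace sketch. It is not
kernel code: every demo_* name is hypothetical, C11 atomics stand in for
whatever the kernel would actually use, and struct demo_ops is only a stand-in
for struct ftrace_ops. The point is that the callsite holds one pointer and
derives func from it, so one store switches both.

```c
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical stand-in for struct ftrace_ops: the callback lives in the ops. */
struct demo_ops {
	void (*func)(struct demo_ops *ops, unsigned long ip);
	unsigned long hits;	/* private data that func expects to be passed */
};

/* One pointer per callsite; this is the only thing an updater has to swap. */
static _Atomic(struct demo_ops *) demo_callsite_ops;

/* What a common trampoline would do: load ops once, then take func from it. */
static void demo_trampoline(unsigned long ip)
{
	struct demo_ops *ops = atomic_load_explicit(&demo_callsite_ops,
						    memory_order_acquire);

	if (ops)
		ops->func(ops, ip);	/* func and ops can never be mismatched */
}

/* Two example callbacks that differ only in how much they count. */
static void demo_count(struct demo_ops *ops, unsigned long ip)
{
	(void)ip;
	ops->hits += 1;
}

static void demo_count_double(struct demo_ops *ops, unsigned long ip)
{
	(void)ip;
	ops->hits += 2;
}

/* Updater: a single store switches ops and func together. */
static void demo_set_ops(struct demo_ops *ops)
{
	atomic_store_explicit(&demo_callsite_ops, ops, memory_order_release);
}
```

This only shows the switch itself. The old ops still could not be freed until
every task that might have loaded the old pointer has finished with it, so
some synchronization would remain.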

I haven't been able to come up with "a precise and completely reliable way to
determine whether the current preemption occurred within a trampoline". Since
preemption might occur within a trampoline's callee that eventually returns
back to the trampoline, I believe that'll depend on either having a reliable
stacktrace or requiring the trampoline to dynamically register/unregister
somewhere around calling other functions. On top of that, we also care about
those callees themselves; it's not just about the trampolines...
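
For the register/unregister option, a rough userspace sketch might look like
the following. All demo_* names are hypothetical, and a thread-local counter
stands in for per-task state. It glosses over the instructions that run before
the increment and after the decrement, which is one reason it is not a
complete answer.

```c
#include <stdbool.h>

/* Hypothetical per-task state; thread-local here purely for illustration. */
struct demo_task {
	unsigned int tramp_depth;	/* > 0 while in a trampoline or its callees */
};

static _Thread_local struct demo_task demo_current;

/* Register/unregister around the trampoline's calls to other functions. */
static void demo_tramp_enter(void)
{
	demo_current.tramp_depth++;
}

static void demo_tramp_exit(void)
{
	demo_current.tramp_depth--;
}

/* What a preemption point could ask instead of walking the stack. */
static bool demo_preempted_in_tramp(void)
{
	return demo_current.tramp_depth != 0;
}

/*
 * A trampoline brackets the call to its callee, so a preemption inside the
 * callee (which will return to the trampoline) is still attributed to it.
 */
static bool demo_trampoline_call(bool (*callee)(void))
{
	bool ret;

	demo_tramp_enter();
	ret = callee();
	demo_tramp_exit();
	return ret;
}
```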

On arm64, we kinda have "permanent trampolines", as our
DYNAMIC_FTRACE_WITH_CALL_OPS implementation uses a common trampoline. However,
that will tail-call direct functions (and those could also be directly called
from ftrace callsites), so we don't have a good way of handling those without a
change to the direct func calling convention.

I assume that permanent trampolines wouldn't be an option on architectures
where trampolines are a Spectre mitigation.

Mark.

> 								Thanx, Paul
> 
> [1] https://lore.kernel.org/all/Zd4DXTyCf17lcTfq@debian.debian/
> [2] https://docs.google.com/document/d/1kZY6AX-AHRIyYQsvUX6WJxS1LsDK4JA2CHuBnpkrR_U/edit?usp=sharing
