Date:   Wed, 20 Apr 2022 11:37:26 -0700
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     rcu@...r.kernel.org, linux-kernel@...r.kernel.org,
        kernel-team@...com
Subject: Re: [PATCH rcu 1/2] docs: Add documentation for rude and trace RCU
 flavors

On Wed, Apr 20, 2022 at 02:02:26PM -0400, Steven Rostedt wrote:
> On Wed, 20 Apr 2022 09:48:47 -0700
> "Paul E. McKenney" <paulmck@...nel.org> wrote:
> 
> > > > > If NOHZ_FULL is enabled, is there a way to also be able to have this full
> > > > > mb on RT removed as well?  
> > 
> > Ah, I did miss this question, apologies.  The tradeoff is IPIs during
> > each Tasks Trace RCU grace period on the one hand or the read-side
> > memory barrier on the other.
> > 
> > CONFIG_TASKS_TRACE_RCU_READ_MB=y gets you the read-side memory barriers.
> > 
> > CONFIG_TASKS_TRACE_RCU_READ_MB=n gets you the IPIs.
> > 
> > Choose wisely!  ;-)
> 
> Yes, I figured this part.

Very good!  ;-)

> > More seriously, I could easily imagine an RT system being set up so that
> > Tasks Trace RCU grace periods never happen while the real-time application
> > is running.  This requires the system administrator to be careful about
> > which tracing facilities are used while that application is running, but
> > it seems doable to me.
> 
> Not something I would want to put onto the system administrator.

Indeed, I could imagine cases where this restriction would not be welcome.

> > Such an RT system could build with CONFIG_TASKS_TRACE_RCU_READ_MB=n to
> > avoid the read-side memory barriers, but also avoiding the IPIs while
> > the application was running.
> > 
> > Even more seriously, if the real-time application runs in nohz_full mode,
> > Tasks Trace RCU will avoid IPIing it.  In that case, the kernel can be
> > built with CONFIG_TASKS_TRACE_RCU_READ_MB=n and avoid both the read-side
> > memory barriers and the IPIs.
> 
> Is this currently the case?

Yes.

There are of course races in which an IPI is sent because some task other
than the application is running, but the IPI arrives only after that task
is done and the application is once again running in nohz_full userspace
mode.  But once the application is going, no IPIs will be sent.
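
For concreteness, a setup along these lines might look roughly like the
fragment below.  This is only a sketch: the CPU list is a made-up example,
and I am assuming that CONFIG_RCU_EXPERT=y is needed to make the READ_MB
option selectable, so please check kernel/rcu/Kconfig for your tree.

```
# .config fragment (sketch, not a tested configuration)
CONFIG_NO_HZ_FULL=y
CONFIG_RCU_EXPERT=y                        # assumed prerequisite
# CONFIG_TASKS_TRACE_RCU_READ_MB is not set

# Kernel boot parameter; CPUs 2-7 are a hypothetical choice for the
# CPUs running the real-time application.
nohz_full=2-7
```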

> > And the final bit of seriousness for this email, if your real-time
> > application didn't have a time-critical CPU-bound component, it might
> > be possible to avoid both read-side memory barriers and IPIs by
> > adjusting the rcu_tasks_trace_qs() code in the context-switch hook.
> > 
> > > Hmm, if we no longer need the rude version due to noinstr, but we then
> > > need to use something that adds full memory barriers at *every function
> > > call*, then I would rather keep the rude version.  
> > 
> > A full memory barrier at every function call does sound more heavyweight
> > than would be good.  ;-)
> 
> Hmm, I just realized that the function tracer can not use any "reader side"
> tracing. Thus, I wonder if we can modify the rude side to be a bit less
> rude? That is, what can be changed if the reader always happens inside an
> "RCU is watching" location but still has the requirement that it can not
> tell RCU it started "reading" and allows preemption?

This was the original motivation for Tasks RCU, in which voluntary context
switches (but not preemptions) are quiescent states.  But this does not
send IPIs because it instead relies on the memory ordering provided by
the scheduler.

This flavor of RCU does not care about whether or not RCU is watching,
but it does ignore idle tasks, which do not necessarily ever do voluntary
context switches, for example, when a given CPU remains idle for a long
time.
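
To illustrate the quiescent-state rule just described, here is a toy
userspace model.  It is emphatically not the kernel's implementation:
struct toy_task, toy_gp_start(), and toy_gp_scan() are names made up for
this sketch, and the real code has more to worry about (memory ordering and
tasks that are not runnable, among other things).  The only point is that
a grace period waits for a voluntary context switch from each non-idle
task, and that preemptions do not count.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a task; NOT the kernel's task_struct. */
struct toy_task {
	bool is_idle;              /* idle tasks are ignored by this flavor */
	unsigned long nvcsw;       /* voluntary context switches so far */
	unsigned long nivcsw;      /* preemptions so far (never consulted) */
	unsigned long nvcsw_snap;  /* nvcsw as of grace-period start */
	bool holdout;              /* still blocking the grace period? */
};

/* Start a grace period: every non-idle task becomes a holdout. */
void toy_gp_start(struct toy_task *t, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		t[i].nvcsw_snap = t[i].nvcsw;
		t[i].holdout = !t[i].is_idle;
	}
}

/* Scan once; return the number of tasks still holding out. */
size_t toy_gp_scan(struct toy_task *t, size_t n)
{
	size_t remaining = 0;

	for (size_t i = 0; i < n; i++) {
		/* Only a voluntary context switch releases a holdout. */
		if (t[i].holdout && t[i].nvcsw != t[i].nvcsw_snap)
			t[i].holdout = false;
		if (t[i].holdout)
			remaining++;
	}
	return remaining;
}
```

The grace period in this model ends when toy_gp_scan() returns zero.  A
task that is only ever preempted keeps it from ending, and an idle task
never holds it up in the first place.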

But the comments on schedule_idle() look like they were set up to
try to do something about this.  I don't see how they do without
that CPU coming out of idle, though.  So what am I missing?

> The issue with function tracing is that the "read side" starts at the
> location that calls the trampoline (aka fentry or mcount call). Where it's
> either a nop or a call to the trampoline. To free the trampoline, we would
> still need to wait for all locations watched by RCU to schedule. Would it
> still be rude to do so? That is, we do not need to worry about idle tasks
> nor NOHZ_FULL tasks.

The original purpose of RCU Tasks Rude was to deal with the idle tasks,
given that RCU Tasks dealt only with the non-idle tasks.

Or is there a trick that I missed?

							Thanx, Paul
