Message-ID: <20220311172844.GJ4285@paulmck-ThinkPad-P17-Gen-1>
Date:   Fri, 11 Mar 2022 09:28:44 -0800
From:   "Paul E. McKenney" <paulmck@...nel.org>
To:     Frederic Weisbecker <frederic@...nel.org>
Cc:     LKML <linux-kernel@...r.kernel.org>,
        Peter Zijlstra <peterz@...radead.org>,
        Phil Auld <pauld@...hat.com>,
        Alex Belits <abelits@...vell.com>,
        Nicolas Saenz Julienne <nsaenz@...nel.org>,
        Xiongfeng Wang <wangxiongfeng2@...wei.com>,
        Neeraj Upadhyay <quic_neeraju@...cinc.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Yu Liao <liaoyu15@...wei.com>,
        Boqun Feng <boqun.feng@...il.com>,
        Marcelo Tosatti <mtosatti@...hat.com>,
        Paul Gortmaker <paul.gortmaker@...driver.com>,
        Uladzislau Rezki <uladzislau.rezki@...y.com>,
        Joel Fernandes <joel@...lfernandes.org>
Subject: Re: [PATCH 18/19] rcu/context_tracking: Merge dynticks counter and
 context tracking states

On Fri, Mar 11, 2022 at 05:35:25PM +0100, Frederic Weisbecker wrote:
> On Thu, Mar 10, 2022 at 12:32:22PM -0800, Paul E. McKenney wrote:
> > On Wed, Mar 02, 2022 at 04:48:09PM +0100, Frederic Weisbecker wrote:
> > > Updating the context tracking state and the RCU dynticks counter
> > > atomically in a single operation is a first step towards improving CPU
> > > isolation. This makes the context tracking state updates fully ordered
> > > and therefore allows for later enhancements such as postponing some work
> > > while a task is running isolated in userspace until it comes back
> > > to the kernel.
> > > 
> > > The state field is divided into two parts:
> > > 
> > > 1) Lower bits for context tracking state:
> > > 
> > > 	CONTEXT_IDLE = 1,
> > > 	CONTEXT_USER = 2,
> > > 	CONTEXT_GUEST = 4,
> > 
> > And the CONTEXT_DISABLED value of -1 works because you can have only
> > one of the above three bits set at a time?
> > 
> > Except that RCU needs this to unconditionally at least distinguish
> > between kernel and idle, given the prevalence of CONFIG_NO_HZ_IDLE=y.
> > So does the CONTEXT_DISABLED really happen anymore?
> > 
> > A few more questions interspersed below.
> 
> The value of CONTEXT_DISABLED is never stored in the ct->state. It is just
> returned as is when CONTEXT_TRACKING is disabled. So this shouldn't conflict
> with RCU.

Whew!  ;-)
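
For anyone following along, here is a minimal userspace sketch of the
layout as I understand it from the above: context bits at the bottom,
dynticks counter above them, and CONTEXT_DISABLED handed back without ever
being stored.  The shift value of 3, CONTEXT_KERNEL = 0, and every
sketch_/SKETCH_ name are assumptions made for illustration, not taken from
the patch.

```c
#include <stdatomic.h>

/* Values from the patch description; CONTEXT_KERNEL = 0 is assumed. */
enum {
	CONTEXT_DISABLED = -1,	/* returned only, never stored */
	CONTEXT_KERNEL   = 0,
	CONTEXT_IDLE     = 1,
	CONTEXT_USER     = 2,
	CONTEXT_GUEST    = 4,
};

/* Assumption: the counter starts just above the three context bits. */
#define RCU_DYNTICKS_SHIFT	3
#define RCU_DYNTICKS_IDX	(1 << RCU_DYNTICKS_SHIFT)
#define SKETCH_CT_MASK		(RCU_DYNTICKS_IDX - 1)	/* hypothetical name */

struct sketch_ct {
	atomic_int state;	/* [ dynticks counter | context bits ] */
};

/* Context part of the word, or CONTEXT_DISABLED when tracking is off. */
static int sketch_ct_state(struct sketch_ct *ct, int tracking_enabled)
{
	if (!tracking_enabled)
		return CONTEXT_DISABLED;	/* not read from ct->state */
	return atomic_load(&ct->state) & SKETCH_CT_MASK;
}

/* Counter part of the word, with the context bits cleared. */
static int sketch_ct_dynticks(struct sketch_ct *ct)
{
	return atomic_load(&ct->state) & ~SKETCH_CT_MASK;
}
```

Since -1 only ever comes out of the disabled branch, it cannot collide
with whatever RCU keeps in the upper bits.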

> > > @@ -452,15 +453,16 @@ void noinstr __ct_user_exit(enum ctx_state state)
> > >  			 * Exit RCU idle mode while entering the kernel because it can
> > >  			 * run a RCU read side critical section anytime.
> > >  			 */
> > > -			rcu_eqs_exit(true);
> > > +			ct_kernel_enter(true, RCU_DYNTICKS_IDX - state);
> > >  			if (state == CONTEXT_USER) {
> > >  				instrumentation_begin();
> > >  				vtime_user_exit(current);
> > >  				trace_user_exit(0);
> > >  				instrumentation_end();
> > >  			}
> > > +		} else {
> > > +			atomic_sub(state, &ct->state);
> > 
> > OK, atomic_sub() got my attention.  What is going on here?  ;-)
> 
> Right :-)
> 
> So that's when context tracking user is running but RCU doesn't
> track user. This is for example when NO_HZ_FULL=n but VIRT_CPU_ACCOUNTING_GEN=y.
> 
> I might remove that standalone VIRT_CPU_ACCOUNTING_GEN=y one day but for now
> it's there.
> 
> Anyway so in this case we only want to track KERNEL <-> USER from context
> tracking POV, but we don't need the DYNTICKS_RCU_IDX part, hence the spared
> ordering.
> 
> But it still needs to be atomic because NMIs may increase DYNTICKS_RCU_IDX on
> the same field.

OK, so the idea is that because NO_HZ_FULL=n, RCU doesn't care about user
space execution?

How about looking at it the other way?  Is there some reason that RCU
shouldn't take advantage of the userspace-execution information when it
exists?  For example, in the NO_HZ_FULL=n but VIRT_CPU_ACCOUNTING_GEN=y
case, is there some chance that RCU would be ignoring a non-noinstr
function?
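
To make the atomic_sub() case concrete, here is a rough userspace sketch
of why the update can skip the ordering but still has to be atomic.  It
uses C11 relaxed atomics as a stand-in for the kernel's atomic_sub().  The
value of RCU_DYNTICKS_IDX, the enter-side helper, and all sketch_ names
are assumptions for illustration only.

```c
#include <stdatomic.h>

#define CONTEXT_USER		2
#define RCU_DYNTICKS_IDX	8	/* assumed: first bit above the context bits */

/* Kernel context, with some nonzero counter value to start from. */
static atomic_int sketch_state = RCU_DYNTICKS_IDX;

/* Assumed counterpart of the atomic_sub() in the hunk: set the context bit. */
static void sketch_user_enter_untracked(int state)
{
	atomic_fetch_add_explicit(&sketch_state, state, memory_order_relaxed);
}

/* Mirrors atomic_sub(state, &ct->state): clear the context bit only. */
static void sketch_user_exit_untracked(int state)
{
	atomic_fetch_sub_explicit(&sketch_state, state, memory_order_relaxed);
}

/* Stand-in for an NMI bumping the counter part of the same word. */
static void sketch_nmi(void)
{
	atomic_fetch_add(&sketch_state, RCU_DYNTICKS_IDX);
}
```

If sketch_nmi() lands between the load and the store of a plain
"state -= CONTEXT_USER", the counter increment is lost.  With the atomic
read-modify-write, both updates survive in either order.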

> > > @@ -548,7 +550,7 @@ EXPORT_SYMBOL_GPL(context_tracking);
> > >  void ct_idle_enter(void)
> > >  {
> > >  	lockdep_assert_irqs_disabled();
> > > -	rcu_eqs_enter(false);
> > > +	ct_kernel_exit(false, RCU_DYNTICKS_IDX + CONTEXT_IDLE);
> > >  }
> > >  EXPORT_SYMBOL_GPL(ct_idle_enter);
> > >  
> > > @@ -566,7 +568,7 @@ void ct_idle_exit(void)
> > >  	unsigned long flags;
> > >  
> > >  	local_irq_save(flags);
> > > -	rcu_eqs_exit(false);
> > > +	ct_kernel_enter(false, RCU_DYNTICKS_IDX - CONTEXT_IDLE);
> > 
> > Nice!  This works because all transitions must be either from or
> > to kernel context, correct?
> 
> Exactly. There is no such thing as IDLE -> USER -> GUEST, etc...
> There has to be KERNEL in the middle of each. Because we never
> call rcu_idle_enter() -> rcu_user_enter() for example. There has to be
> rcu_idle_exit() in the middle.
> 
> (famous last words).

Works for me, for the moment, anyway.  ;-)
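
The arithmetic behind the "+ CONTEXT_IDLE" and "- CONTEXT_IDLE" offsets can
be sketched in userspace as below.  A single atomic add both bumps the
counter and flips the context bits, which only works because one side of
every transition is kernel context (low bits all zero).  The value of
RCU_DYNTICKS_IDX, the starting state, and the sketch_ helpers are
assumptions; the real ct_kernel_enter()/ct_kernel_exit() take an extra
argument and do considerably more.

```c
#include <stdatomic.h>

enum { CONTEXT_IDLE = 1, CONTEXT_USER = 2, CONTEXT_GUEST = 4 };

#define RCU_DYNTICKS_IDX	8	/* assumed: first bit above the context bits */

/* Assumed start: kernel context with RCU watching (IDX bit set). */
static atomic_int sketch_state = RCU_DYNTICKS_IDX;

/* Leave the kernel: one add sets the context bit and bumps the counter. */
static int sketch_ct_kernel_exit(int offset)
{
	return atomic_fetch_add(&sketch_state, offset) + offset;
}

/* Return to the kernel: one add clears the context bit and bumps again. */
static int sketch_ct_kernel_enter(int offset)
{
	return atomic_fetch_add(&sketch_state, offset) + offset;
}
```

Starting from 8, sketch_ct_kernel_exit(RCU_DYNTICKS_IDX + CONTEXT_IDLE)
gives 17 (idle bit set, IDX bit clear), and
sketch_ct_kernel_enter(RCU_DYNTICKS_IDX - CONTEXT_IDLE) then gives 24
(context bits clear, IDX bit set again).  A direct IDLE -> USER step would
need a different offset, hence the requirement that KERNEL sit in the
middle.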

> > >  /* Return true if the specified CPU is currently idle from an RCU viewpoint.  */
> > > @@ -321,8 +321,7 @@ bool rcu_dynticks_zero_in_eqs(int cpu, int *vp)
> > >  	int snap;
> > >  
> > >  	// If not quiescent, force back to earlier extended quiescent state.
> > > -	snap = ct_dynticks_cpu(cpu) & ~0x1;
> > > -
> > > +	snap = ct_dynticks_cpu(cpu) & ~RCU_DYNTICKS_IDX;
> > 
> > Do we also need to get rid of the low-order bits?  Or is that happening
> > elsewhere?  Or is there some reason that they can stick around?
> 
> Yep, ct_dynticks_cpu() clears the low order CONTEXT_* bits.

Whew!  ;-)
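
Put as a userspace sketch, the two-step masking looks like this: the
accessor drops the CONTEXT_* bits first, and the snapshot then clears the
RCU_DYNTICKS_IDX bit to force the value back to the earlier extended
quiescent state.  The shift of 3 and the sketch_/SKETCH_ names are
assumptions, and the real ct_dynticks_cpu() reads a per-CPU variable
rather than taking the raw word as an argument.

```c
/* Assumption: three context bits, counter starting at bit 3. */
#define RCU_DYNTICKS_SHIFT	3
#define RCU_DYNTICKS_IDX	(1 << RCU_DYNTICKS_SHIFT)
#define SKETCH_CT_MASK		(RCU_DYNTICKS_IDX - 1)	/* hypothetical name */

/* Stand-in for ct_dynticks_cpu(): drop the low-order CONTEXT_* bits. */
static int sketch_ct_dynticks(int state)
{
	return state & ~SKETCH_CT_MASK;
}

/* Mirrors "snap = ct_dynticks_cpu(cpu) & ~RCU_DYNTICKS_IDX". */
static int sketch_eqs_snap(int state)
{
	return sketch_ct_dynticks(state) & ~RCU_DYNTICKS_IDX;
}
```

With these values, a raw word of 24 (kernel, not quiescent) and a raw word
of 17 (idle, quiescent) both yield a snapshot of 16.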

> > > diff --git a/kernel/rcu/tree_stall.h b/kernel/rcu/tree_stall.h
> > > index 9bf5cc79d5eb..1ac48c804006 100644
> > > --- a/kernel/rcu/tree_stall.h
> > > +++ b/kernel/rcu/tree_stall.h
> > > @@ -459,7 +459,7 @@ static void print_cpu_stall_info(int cpu)
> > >  			rdp->rcu_iw_pending ? (int)min(delta, 9UL) + '0' :
> > >  				"!."[!delta],
> > >  	       ticks_value, ticks_title,
> > > -	       rcu_dynticks_snap(cpu) & 0xfff,
> > > +	       (rcu_dynticks_snap(cpu) >> RCU_DYNTICKS_SHIFT) & 0xfff ,
> > 
> > Actually, the low-order several bits are useful when debugging, so
> > could you please not shift them away?  Maybe also go to 0xffff to allow
> > for more bits taken?
> 
> Yeah that makes sense, I'll change that.
> 
> Thanks a lot for the reviews!

Thank you for the series!

							Thanx, Paul
