linux-kernel - Re: [PATCH tip/core/rcu 2/2] rcu: Check for wakeup-safe conditions in rcu_read_unlock_special()

Message-ID: <20190402070953.GG12232@hirez.programming.kicks-ass.net>
Date:   Tue, 2 Apr 2019 09:09:53 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     "Paul E. McKenney" <paulmck@...ux.ibm.com>
Cc:     rcu@...r.kernel.org, linux-kernel@...r.kernel.org,
        mingo@...nel.org, jiangshanlai@...il.com, dipankar@...ibm.com,
        akpm@...ux-foundation.org, mathieu.desnoyers@...icios.com,
        josh@...htriplett.org, tglx@...utronix.de, rostedt@...dmis.org,
        dhowells@...hat.com, edumazet@...gle.com, fweisbec@...il.com,
        oleg@...hat.com, joel@...lfernandes.org,
        Sebastian Andrzej Siewior <bigeasy@...utronix.de>
Subject: Re: [PATCH tip/core/rcu 2/2] rcu: Check for wakeup-safe conditions
 in rcu_read_unlock_special()

On Mon, Apr 01, 2019 at 10:22:57AM -0700, Paul E. McKenney wrote:
> > > The initial solution to this problem was to use set_tsk_need_resched() and
> > > set_preempt_need_resched() to force a future context switch, which allows
> > > rcu_preempt_note_context_switch() to report the deferred quiescent state
> > > to RCU's core processing.  Unfortunately for expedited grace periods,
> > > there can be a significant delay between the call for a context switch
> > > and the actual context switch.
> > 
> > This is all PREEMPT=y kernels, right? Where is the latency coming from?
> > Because PREEMPT=y _should_ react quite quickly.
> 
> Yes, PREEMPT=y.  It happens like this:
> 
> 1.	rcu_read_lock() with everything enabled.
> 
> 2.	Preemption then resumption.
> 
> 3.	local_irq_disable().
> 
> 4.	rcu_read_unlock().
> 
> 5.	local_irq_enable().
> 
> From what I know, the scheduler doesn't see anything until the next
> interrupt, local_bh_enable(), return to userspace, etc.  Because this
> is PREEMPT=y, preempt_enable() and cond_resched() do nothing.  So
> it could be some time (milliseconds, depending on HZ, NO_HZ_FULL, and
> so on) until the scheduler responds.  With NO_HZ_FULL, last I knew,
> the delay can be extremely long.
> 
> Or am I missing something that gets the scheduler on the job faster?

Oh urgh, yah. So normally we only twiddle with the need_resched state:

 - while preempt_disable(), such that preempt_enable() will reschedule
 - from interrupt context, such that interrupt return will reschedule

But the usage here 'violates' those rules and then there is an
unspecified latency between setting the state and it getting observed,
but no longer than 1 tick I would think.

I don't think we can go NOHZ with need_resched set, because the moment
we hit the idle loop with that set, we _will_ reschedule.

So in that respect the irq_work suggestion I made would fix things
properly.
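
To make the latency argument concrete, here is a minimal userspace sketch of steps 4 and 5 from the quoted sequence. It is a toy model, not kernel code: every name in it (toy_cpu, toy_unlock_special(), toy_irq_enable(), toy_resched_latency()) is hypothetical, time is in arbitrary units, and the "self-interrupt" only stands in for the irq_work idea under the assumption that it fires as soon as interrupts are re-enabled.

```c
#include <stdbool.h>

/* Toy model of a single CPU. Every name here is hypothetical and exists
 * only for illustration; none of this is kernel code. */
struct toy_cpu {
	bool need_resched;      /* stands in for the need_resched state */
	bool self_irq_pending;  /* stands in for a queued irq_work */
	int now;                /* current time, arbitrary units */
	int next_tick;          /* time of the next timer interrupt */
};

/* Step 4: the unlock path runs with interrupts disabled. It sets the
 * flag and, optionally, queues a self-interrupt. */
void toy_unlock_special(struct toy_cpu *c, bool use_self_irq)
{
	c->need_resched = true;
	if (use_self_irq)
		c->self_irq_pending = true;
}

/* Step 5: interrupts are re-enabled. Returns the time at which an
 * interrupt-return path first observes need_resched. */
int toy_irq_enable(struct toy_cpu *c)
{
	if (c->self_irq_pending)
		c->self_irq_pending = false;  /* assumed to fire right away */
	else
		c->now = c->next_tick;        /* nothing looks until the tick */
	c->need_resched = false;              /* observed: reschedule happens */
	return c->now;
}

/* Delay between setting the flag and the flag being observed. */
int toy_resched_latency(int now, int next_tick, bool use_self_irq)
{
	struct toy_cpu c = { false, false, now, next_tick };

	toy_unlock_special(&c, use_self_irq);
	return toy_irq_enable(&c) - now;
}
```

In this model, toy_resched_latency(3, 10, false) returns 7, the wait for the next tick, while toy_resched_latency(3, 10, true) returns 0, because the interrupt return that follows the self-interrupt sees the flag immediately.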