linux-kernel - Re: [PATCH RFC tip/core/rcu 14/14] rcu/nohz: Make multi_cpu_stop() enable tick on all online CPUs

Message-ID: <20190805080531.GH2349@hirez.programming.kicks-ass.net>
Date:   Mon, 5 Aug 2019 10:05:31 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     "Paul E. McKenney" <paulmck@...ux.ibm.com>
Cc:     rcu@...r.kernel.org, linux-kernel@...r.kernel.org,
        mingo@...nel.org, jiangshanlai@...il.com, dipankar@...ibm.com,
        akpm@...ux-foundation.org, mathieu.desnoyers@...icios.com,
        josh@...htriplett.org, tglx@...utronix.de, rostedt@...dmis.org,
        dhowells@...hat.com, edumazet@...gle.com, fweisbec@...il.com,
        oleg@...hat.com, joel@...lfernandes.org
Subject: Re: [PATCH RFC tip/core/rcu 14/14] rcu/nohz: Make multi_cpu_stop()
 enable tick on all online CPUs

On Sun, Aug 04, 2019 at 11:41:59AM -0700, Paul E. McKenney wrote:
> On Sun, Aug 04, 2019 at 04:48:35PM +0200, Peter Zijlstra wrote:
> > On Sun, Aug 04, 2019 at 04:43:17PM +0200, Peter Zijlstra wrote:
> > > On Fri, Aug 02, 2019 at 08:15:01AM -0700, Paul E. McKenney wrote:
> > > > The multi_cpu_stop() function relies on the scheduler to gain control from
> > > > whatever is running on the various online CPUs, including any nohz_full
> > > > CPUs running long loops in kernel-mode code.  Lack of the scheduler-clock
> > > > interrupt on such CPUs can delay multi_cpu_stop() for several minutes
> > > > and can also result in RCU CPU stall warnings.  This commit therefore
> > > > causes multi_cpu_stop() to enable the scheduler-clock interrupt on all
> > > > online CPUs.
> > > 
> > > This sounds wrong; should we be fixing sched_can_stop_tick() instead to
> > > return false when the stop task is runnable?
> 
> Agreed.  However, it is proving surprisingly hard to come up with a
> code sequence that has the effect of rcu_nocb without nohz_full.
> And rcu_nocb works just fine.  With nohz_full also in place, I am
> decreasing the failure rate, but it still fails, perhaps a few times
> per hour of TREE04 rcutorture on an eight-CPU system.  (My 12-CPU
> system stubbornly refuses to fail.  Good thing I kept the eight-CPU
> system around, I guess.)
> 
> When I arrive at some sequence of actions that actually work reliably,
> then by all means let's put it somewhere in the NO_HZ_FULL machinery!

I'm confused; what are you arguing? The patch as proposed is just wrong;
it needs to go.

> > And even without that, I don't understand how we're not instantly
> > preempted the moment we enqueue the stop task.
> 
> There is no preemption because CONFIG_PREEMPT=n for the scenarios still

That doesn't make sense; even with CONFIG_PREEMPT=n we set
TIF_NEED_RESCHED. We just won't react to it as promptly (only at explicit
rescheduling points and on return to userspace). Enabling the tick will
not make any difference whatsoever.

Tick-based preemption will not 'fix' the lack of wakeup preemption. If
the stop task wakeup didn't set TIF_NEED_RESCHED, the OTHER/CFS tick
will not either.

> having trouble.  Yes, there are cond_resched() calls, but they don't do
> anything unless the appropriate flags are set, which won't always happen
> without the tick, apparently.  Or without -something- that isn't always
> happening as it should.

Right; so clearly we don't understand what's happening. Understanding
that seems like a requirement for actually doing a patch.

> > Any enqueue should go through check_preempt_curr(), which will be an
> > instant resched_curr() when we just woke the stop class.
> 
> I did try hitting all of the CPUs with resched_cpu().  Ten times on each
> CPU with a ten-jiffy wait between each.  This might have decreased the
> probability of excessively long CPU-stopper waits by a factor of two or
> three, but it did not eliminate the excessively long waits.
> 
> What else should I try?
> 
> For example, are there any diagnostics I could collect, say from within
> the CPU stopper when things are taking too long?  I see CPU-stopper
> delays in excess of five -minutes-, so this is anything but subtle.

Catch the whole thing in a function trace?

The chain that should instantly set TIF_NEED_RESCHED:

  stop_machine()
    stop_machine_cpuslocked()
      stop_cpus()
        __stop_cpus()
          queue_stop_cpus_work()
            cpu_stop_queue_work()
              wake_up_q()
                wake_up_process()

  wake_up_process()
    try_to_wake_up()
      ttwu_queue()
        ttwu_queue_remote()
          <- scheduler_ipi()
            sched_ttwu_pending()
              ttwu_do_activate()

        ttwu_do_activate()
          activate_task()
          ttwu_do_wakeup()
            check_preempt_curr()
              resched_curr()

You could frob some tracing into __stop_cpus() just before
wait_for_completion(); at that point, all the CPUs in @cpumask should
either be running the stop task or have TIF_NEED_RESCHED set.