linux-kernel - Re: [PATCH] rcu: Make jiffies_till_sched

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190715173924.GA2494@google.com>
Date:   Mon, 15 Jul 2019 13:39:24 -0400
From:   Joel Fernandes <joel@...lfernandes.org>
To:     "Paul E. McKenney" <paulmck@...ux.ibm.com>
Cc:     Byungchul Park <max.byungchul.park@...il.com>,
        Byungchul Park <byungchul.park@....com>,
        Josh Triplett <josh@...htriplett.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Lai Jiangshan <jiangshanlai@...il.com>,
        rcu <rcu@...r.kernel.org>, LKML <linux-kernel@...r.kernel.org>,
        kernel-team@....com
Subject: Re: [PATCH] rcu: Make jiffies_till_sched_qs writable

On Sun, Jul 14, 2019 at 06:56:10AM -0700, Paul E. McKenney wrote:
[snip]
> > > > > > > > >
> > > > > > > > >       commit ae91aa0adb14dc33114d566feca2f7cb7a96b8b7
> > > > > > > > >       rcu: Remove debugfs tracing
> > > > > > > > >
> > > > > > > > > removed all debugfs tracing, gp_max also included.
> > > > > > > > >
> > > > > > > > > And you sounds great. And even looks not that hard to add it like,
> > > > > > > > >
> > > > > > > > > :)
> > > > > > > > >
> > > > > > > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > > > > > > index ad9dc86..86095ff 100644
> > > > > > > > > --- a/kernel/rcu/tree.c
> > > > > > > > > +++ b/kernel/rcu/tree.c
> > > > > > > > > @@ -1658,8 +1658,10 @@ static void rcu_gp_cleanup(void)
> > > > > > > > >       raw_spin_lock_irq_rcu_node(rnp);
> > > > > > > > >       rcu_state.gp_end = jiffies;
> > > > > > > > >       gp_duration = rcu_state.gp_end - rcu_state.gp_start;
> > > > > > > > > -     if (gp_duration > rcu_state.gp_max)
> > > > > > > > > +     if (gp_duration > rcu_state.gp_max) {
> > > > > > > > >               rcu_state.gp_max = gp_duration;
> > > > > > > > > +             trace_rcu_grace_period(something something);
> > > > > > > > > +     }
> > > > > > > >
> > > > > > > > Yes, that makes sense. But I think it is much better off as a readable value
> > > > > > > > from a virtual fs. The drawback of tracing for this sort of thing are:
> > > > > > > >  - Tracing will only catch it if tracing is on
> > > > > > > >  - Tracing data can be lost if too many events, then no one has a clue what
> > > > > > > >    the max gp time is.
> > > > > > > >  - The data is already available in rcu_state::gp_max so copying it into the
> > > > > > > >    trace buffer seems a bit pointless IMHO
> > > > > > > >  - It is a lot easier on ones eyes to process a single counter than process
> > > > > > > >    heaps of traces.
> > > > > > > >
> > > > > > > > I think a minimal set of RCU counters exposed to /proc or /sys should not
> > > > > > > > hurt and could do more good than not. The scheduler already does this for
> > > > > > > > scheduler statistics. I have seen Peter complain a lot about new tracepoints
> > > > > > > > but not much (or never) about new statistics.
> > > > > > > >
> > > > > > > > Tracing has its strengths but may not apply well here IMO. I think a counter
> > > > > > > > like this could be useful for tuning of things like the jiffies_*_sched_qs,
> > > > > > > > the stall timeouts and also any other RCU knobs. What do you think?
> > > > > > >
> > > > > > > I prefer proc/sys knob for it to tracepoint. Why I've considered it is just it
> > > > > > > looks like undoing what Paul did at ae91aa0ad.
> > > > > > >
> > > > > > > I think you're rational enough. I just wondered how Paul think of it.
> > > > > >
> > > > > > I believe at least initially, a set of statistics can be made
> > > > > > available only when rcutorture or rcuperf module is loaded. That way
> > > > > > they are purely only for debugging and nothing needs to be exposed to
> > > > > > normal kernels distributed thus reducing testability concerns.
> > > > > >
> > > > > > rcu_state::gp_max would be trivial to expose through this, but for
> > > > > > other statistics that are more complicated - perhaps
> > > > > > tracepoint_probe_register can be used to add hooks on to the
> > > > > > tracepoints and generate statistics from them. Again the registration
> > > > > > of the probe and the probe handler itself would all be in
> > > > > > rcutorture/rcuperf test code and not a part of the kernel proper.
> > > > > > Thoughts?
> > > > >
> > > > > It still feels like you guys are hyperfocusing on this one particular
> > > > > knob.  I instead need you to look at the interrelating knobs as a group.
> > > >
> > > > Thanks for the hints, we'll do that.
> > > >
> > > > > On the debugging side, suppose someone gives you an RCU bug report.
> > > > > What information will you need?  How can you best get that information
> > > > > without excessive numbers of over-and-back interactions with the guy
> > > > > reporting the bug?  As part of this last question, what information is
> > > > > normally supplied with the bug?  Alternatively, what information are
> > > > > bug reporters normally expected to provide when asked?
> > > >
> > > > I suppose I could dig out some of our Android bug reports of the past where
> > > > there were RCU issues but if there's any fires you are currently fighting do
> > > > send it our way as debugging homework ;-)
> > >
> > > Evading the questions, are we?

Sorry if I sounded like evading, I was just saying that it would be nice to
work on some specific issue or bug report and then figure out what is needed
from there, instead of trying to predict what data is useful. Of course, I
think predicting and dumping data is also useful, but I was just wondering if
you had some specific issue in mind that we could work off of (you did
mention the CPU stopper below so thank you!)

> > > OK, I can be flexible.  Suppose that you were getting RCU CPU stall
> > > warnings featuring multi_cpu_stop() called from cpu_stopper_thread().
> > > Of course, this really means that some other CPU/task is holding up
> > > multi_cpu_stop() without also blocking the current grace period.
> > >
> > > What is the best way to work out what is really holding things up?
> > 
> > Either the stopper preempted another being in a critical section and
> > has something wrong itself in case of PREEMPT or mechanisms for
> > urgent control doesn't work correctly.
> > 
> > I don't know what exactly you intended but I would check things like
> > (1) irq disable / eqs / tick / scheduler events and (2) whether special
> > handling for each level of qs urgency has started correctly. For that
> > purpose all the history of those events would be more useful.

Agreed, these are all good.

> > And with thinking it more, we could come up with a good way to
> > make use of those data to identify what the problem is. Do I catch
> > the point correctly? If so, me and Joel can start to work on it.
> > Otherwise, please correct me.
> 
> I believe you are on the right track.  In short, it would be great if
> the kernel would automatically dump out the needed information when
> cpu_stopper gets stalled, sort of like RCU does (much of the time,
> anyway) in its CPU stall warnings.  Given a patch that did this, I would
> be quite happy to help advocate for it!

In case you have a LKML link to a thread or bug report to this specific
cpu_stopper issue, please do pass it along.

Happy to work on this with Byungchul, thanks,

- Joel