linux-kernel - Re: [PATCH] rcu: Make jiffies_till_sched

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-Id: <20190712145338.GR26519@linux.ibm.com>
Date:   Fri, 12 Jul 2019 07:53:38 -0700
From:   "Paul E. McKenney" <paulmck@...ux.ibm.com>
To:     Joel Fernandes <joel@...lfernandes.org>
Cc:     Byungchul Park <byungchul.park@....com>, josh@...htriplett.org,
        rostedt@...dmis.org, mathieu.desnoyers@...icios.com,
        jiangshanlai@...il.com, rcu@...r.kernel.org,
        linux-kernel@...r.kernel.org, kernel-team@....com
Subject: Re: [PATCH] rcu: Make jiffies_till_sched_qs writable

On Fri, Jul 12, 2019 at 09:43:11AM -0400, Joel Fernandes wrote:
> On Fri, Jul 12, 2019 at 06:02:42AM -0700, Paul E. McKenney wrote:
> > On Fri, Jul 12, 2019 at 08:51:16AM -0400, Joel Fernandes wrote:
> > > On Fri, Jul 12, 2019 at 03:32:40PM +0900, Byungchul Park wrote:
> > > > On Thu, Jul 11, 2019 at 03:58:39PM -0400, Joel Fernandes wrote:
> > > > > Hmm, speaking of grace period durations, it seems to me the maximum grace
> > > > > period ever is recorded in rcu_state.gp_max. However it is not read from
> > > > > anywhere.
> > > > > 
> > > > > Any idea why it was added but not used?
> > > > > 
> > > > > I am interested in dumping this value just for fun, and seeing what I get.
> > > > > 
> > > > > I wonder also it is useful to dump it in rcutorture/rcuperf to find any
> > > > > issues, or even expose it in sys/proc fs to see what worst case grace periods
> > > > > look like.
> > > > 
> > > > Hi,
> > > > 
> > > > 	commit ae91aa0adb14dc33114d566feca2f7cb7a96b8b7
> > > > 	rcu: Remove debugfs tracing
> > > > 
> > > > removed all debugfs tracing, gp_max also included.
> > > > 
> > > > And you sounds great. And even looks not that hard to add it like,
> > > > 
> > > > :)
> > > > 
> > > > diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c
> > > > index ad9dc86..86095ff 100644
> > > > --- a/kernel/rcu/tree.c
> > > > +++ b/kernel/rcu/tree.c
> > > > @@ -1658,8 +1658,10 @@ static void rcu_gp_cleanup(void)
> > > >  	raw_spin_lock_irq_rcu_node(rnp);
> > > >  	rcu_state.gp_end = jiffies;
> > > >  	gp_duration = rcu_state.gp_end - rcu_state.gp_start;
> > > > -	if (gp_duration > rcu_state.gp_max)
> > > > +	if (gp_duration > rcu_state.gp_max) {
> > > >  		rcu_state.gp_max = gp_duration;
> > > > +		trace_rcu_grace_period(something something);
> > > > +	}
> > > 
> > > Yes, that makes sense. But I think it is much better off as a readable value
> > > from a virtual fs. The drawback of tracing for this sort of thing are:
> > >  - Tracing will only catch it if tracing is on
> > >  - Tracing data can be lost if too many events, then no one has a clue what
> > >    the max gp time is.
> > >  - The data is already available in rcu_state::gp_max so copying it into the
> > >    trace buffer seems a bit pointless IMHO
> > >  - It is a lot easier on ones eyes to process a single counter than process
> > >    heaps of traces.
> > > 
> > > I think a minimal set of RCU counters exposed to /proc or /sys should not
> > > hurt and could do more good than not. The scheduler already does this for
> > > scheduler statistics. I have seen Peter complain a lot about new tracepoints
> > > but not much (or never) about new statistics.
> > > 
> > > Tracing has its strengths but may not apply well here IMO. I think a counter
> > > like this could be useful for tuning of things like the jiffies_*_sched_qs,
> > > the stall timeouts and also any other RCU knobs. What do you think?
> > 
> > Is this one of those cases where eBPF is the answer, regardless of
> > the question?  ;-)
> 
> It could be. Except that people who fancy busybox still could benefit from
> the counter ;-)

;-)

Another approach might be for RCU to dump statistics, including this
one, to the console every so often, for example, maybe every minute for
the first hour, every hour for the first day, and every day after that.
What else might work?

(I am not a huge fan of bringing back RCU's debugfs due to testability
concerns and due to the fact that very few people ever used it.)

Plus the busybox people could always add a trace_printk() or similar.

> And also, eBPF uses RCU itself heavily, so it would help if the debug related
> counter itself didn't depend on it. ;-)

I would think that eBPF printing a statistical counter from RCU wouldn't
pose danger even given that eBPF uses RCU, so I suspect that your point
about busybox carries more force in this argument.  ;-)

							Thanx, Paul