linux-kernel - Re: [PATCH v6 2/2] tracing: Add support for preempt and irq enable/disable events

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20170925061528.09082957@vmware.local.home>
Date:   Mon, 25 Sep 2017 06:15:28 -0400
From:   Steven Rostedt <rostedt@...dmis.org>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Joel Fernandes <joelaf@...gle.com>,
        LKML <linux-kernel@...r.kernel.org>, kernel-team@...roid.com
Subject: Re: [PATCH v6 2/2] tracing: Add support for preempt and irq
 enable/disable events

On Mon, 25 Sep 2017 10:34:32 +0200
Peter Zijlstra <peterz@...radead.org> wrote:

> > 
> >             grep-1041  [002] d..1    80.363455: preempt_disable:
> > caller=_raw_spin_lock_irqsave+0x1d/0x40
> > parent=add_wait_queue+0x15/0x50  
> 
> I suppose that sort-of makes sense for the preempt-tracer, but its a
> weird thing for a generic tracepoint.

I still find it very useful, even as a tracepoint.

> 
> > >>  void start_critical_timings(void)
> > >>  {
> > >> +     if (this_cpu_read(tracing_preempt_cpu))
> > >> +             trace_preempt_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
> > >> +
> > >> +     if (this_cpu_read(tracing_irq_cpu))
> > >> +             trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
> > >> +
> > >>       start_critical_timings_tracer();
> > >>  }
> > >>  EXPORT_SYMBOL_GPL(start_critical_timings);
> > >>
> > >>  void stop_critical_timings(void)
> > >>  {
> > >> +     if (this_cpu_read(tracing_preempt_cpu))
> > >> +             trace_preempt_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
> > >> +
> > >> +     if (this_cpu_read(tracing_irq_cpu))
> > >> +             trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
> > >> +
> > >>       stop_critical_timings_tracer();
> > >>  }
> > >>  EXPORT_SYMBOL_GPL(stop_critical_timings);  
> > >
> > > And I feel these yield somewhat odd semantics, does that need explaining
> > > somewhere?  
> > 
> > Maybe I can add a comment here, if you prefer that. When you meant
> > semantics, do you mean 'critical' vs 'atomic' thing or do you mean the
> > semantics/context of how this function is supposed to be used?  
> 
> I would add the comment to the tracepoint definition.
> 
> On semantics, the whole stop/start excludes a fair bunch of walltime
> from our measurement, I feel that needs to be called out and enumerated
> (when time goes missing and why).
> 
> Given that the idle thread runs with preempt-off I understand its
> purpose from the POV from the preempt-tracer, but its 'weird' behaviour
> if you're looking at it from a pure tracepoint pov.

You mean you want to trace all calls to preempt and irq off even if
preempt and irqs are already off?


> 
> > >>  void trace_preempt_off(unsigned long a0, unsigned long a1)
> > >>  {
> > >> +     if (this_cpu_read(tracing_preempt_cpu))
> > >> +             return;
> > >> +
> > >> +     this_cpu_write(tracing_preempt_cpu, 1);
> > >> +
> > >> +     trace_preempt_disable_rcuidle(a0, a1);
> > >>       tracer_preempt_off(a0, a1);
> > >>  }
> > >>  #endif  
> > >
> > > And here you assume things like double on / double off don't happen,
> > > which might well be so, but does seem somewhat fragile.
> > >  
> > 
> > We are handling the cases where these functions might be called twice,
> > but we are only interested in the first time they're called. I caught
> > a dead lock happen when I didn't add such protection to
> > trace_hardirqs_off so I added to these to the trace_hardirqs* and
> > trace_preempt* ones as well to just to be extra safe and keep it
> > consistent. Hope I understood your concern correctly, if not please
> > let me know, thanks.  
> 
> Under what conditions where they called twice? That seems like something
> that should not happen to begin with. Esp the one I left quoted above,
> disabling when its already disabled sounds like fail. So please provide
> more details on the scenario you're working around.

I'm a little confused by this too.

-- Steve