Message-ID: <CAJWu+ooMB7pDu0xAkqwZBgBO67CFW6A4AXa3hbKo-oSh93t5=Q@mail.gmail.com>
Date: Mon, 25 Sep 2017 17:23:00 -0700
From: Joel Fernandes <joelaf@...gle.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: LKML <linux-kernel@...r.kernel.org>, kernel-team@...roid.com,
Steven Rostedt <rostedt@...dmis.org>
Subject: Re: [PATCH v6 2/2] tracing: Add support for preempt and irq
enable/disable events
Hi Peter,
On Mon, Sep 25, 2017 at 1:34 AM, Peter Zijlstra <peterz@...radead.org> wrote:
> On Fri, Sep 22, 2017 at 11:28:30AM -0700, Joel Fernandes wrote:
[...]
>> >> void start_critical_timings(void)
>> >> {
>> >> + if (this_cpu_read(tracing_preempt_cpu))
>> >> + trace_preempt_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
>> >> +
>> >> + if (this_cpu_read(tracing_irq_cpu))
>> >> + trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
>> >> +
>> >> start_critical_timings_tracer();
>> >> }
>> >> EXPORT_SYMBOL_GPL(start_critical_timings);
>> >>
>> >> void stop_critical_timings(void)
>> >> {
>> >> + if (this_cpu_read(tracing_preempt_cpu))
>> >> + trace_preempt_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
>> >> +
>> >> + if (this_cpu_read(tracing_irq_cpu))
>> >> + trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
>> >> +
>> >> stop_critical_timings_tracer();
>> >> }
>> >> EXPORT_SYMBOL_GPL(stop_critical_timings);
>> >
>> > And I feel these yield somewhat odd semantics, does that need explaining
>> > somewhere?
>>
>> Maybe I can add a comment here, if you prefer that. By 'semantics',
>> do you mean the 'critical' vs 'atomic' thing, or the semantics/context
>> of how this function is supposed to be used?
>
> I would add the comment to the tracepoint definition.
>
> On semantics, the whole stop/start excludes a fair bunch of walltime
> from our measurement, I feel that needs to be called out and enumerated
> (when time goes missing and why).
>
> Given that the idle thread runs with preempt-off I understand its
> purpose from the POV of the preempt-tracer, but it's 'weird' behaviour
> if you're looking at it from a pure tracepoint POV.
I see your point: basically you have a problem with the "fake preempt
on" event that I'm generating when start_critical_timings is called
from the idle code path, which isn't really a preempt-on per se but a
side effect of entering the idle path.
Since we already have cpuidle events, one thing I could do is just
drop this part of the patch set and not generate any trace events
during start_critical_timings; one could then use the preempt_disable
trace event (introduced in this patch) together with the cpuidle trace
events to figure out that part of the preempt-disabled section is
actually the CPU idle path. Would that alleviate your concern here?
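(For illustration, a sketch of what start_critical_timings would then
reduce to, with only the tracer hook from the quoted hunk remaining:)

void start_critical_timings(void)
{
        /*
         * No preempt/irq enable events emitted here; walltime spent in
         * idle would instead be identified from the existing cpu_idle
         * trace events.
         */
        start_critical_timings_tracer();
}
EXPORT_SYMBOL_GPL(start_critical_timings);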
As a next step, Steven and I discussed in the hallway at LPC that we
could possibly add a start_critical_timings event as well, so as a
follow-up to this patch those new events could be added, giving more
visibility into walltime that shouldn't be counted. Or I could add it
in this series as a separate patch; let me know if that makes sense
and what you'd like to do, thanks.
>> >> void trace_preempt_off(unsigned long a0, unsigned long a1)
>> >> {
>> >> + if (this_cpu_read(tracing_preempt_cpu))
>> >> + return;
>> >> +
>> >> + this_cpu_write(tracing_preempt_cpu, 1);
>> >> +
>> >> + trace_preempt_disable_rcuidle(a0, a1);
>> >> tracer_preempt_off(a0, a1);
>> >> }
>> >> #endif
>> >
>> > And here you assume things like double on / double off don't happen,
>> > which might well be so, but does seem somewhat fragile.
>> >
>>
>> We are handling the cases where these functions might be called twice,
>> but we are only interested in the first time they're called. I caught
>> a deadlock when I didn't add such protection to trace_hardirqs_off,
>> so I added it to the trace_hardirqs* and trace_preempt* functions as
>> well, just to be extra safe and to keep things consistent. Hope I
>> understood your concern correctly; if not, please let me know, thanks.
>
> Under what conditions were they called twice? That seems like something
> that should not happen to begin with. Esp. the one I left quoted above,
Ok, so AIUI the double call cannot happen with trace_preempt_on and
trace_preempt_off; I agree, and I'll drop the unnecessary per-CPU
variable check there, but...
> disabling when its already disabled sounds like fail. So please provide
> more details on the scenario you're working around.
It can still happen with local_irq_save called while interrupts are
already disabled: the trace_hardirqs_off API can be called even when
IRQs are already off. This is unlike trace_hardirqs_on, which checks
whether IRQs are off (at least from some call sites); here are the
definitions just for reference [1]. I guess we could modify
local_irq_disable and local_irq_save to check what the HW flags were
before calling raw_local_irq_save, and only call trace_hardirqs_off if
they were indeed on and are now being turned off, but that adds
complexity - we would also have to modify all the call sites in
assembly code to conditionally call trace_hardirqs_on/off :(.
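(Just to illustrate that complexity, not something I'm proposing: a
rough sketch of such a conditional local_irq_save, using the flags
that raw_local_irq_save hands back to decide whether IRQs were on
beforehand; the assembly call sites would still be untouched by this:)

#define local_irq_save(flags)                                   \
        do {                                                    \
                raw_local_irq_save(flags);                      \
                /* flags now hold the pre-disable IRQ state */  \
                if (!raw_irqs_disabled_flags(flags))            \
                        trace_hardirqs_off();                   \
        } while (0)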
Instead, my patch uses a per-CPU variable in the trace_hardirqs_*
functions to track that interrupts were already off when they were
called, thus avoiding uselessly emitting a trace event:
void trace_hardirqs_off(void)
{
        if (this_cpu_read(tracing_irq_cpu))
                return;

        this_cpu_write(tracing_irq_cpu, 1);

        trace_irq_disable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
        tracer_hardirqs_off();
}
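The enable side then clears the flag symmetrically, so an enable event
is only emitted when a matching disable event was emitted earlier (a
sketch of how I understand the pairing in the patch):

void trace_hardirqs_on(void)
{
        /* Only emit the enable event if we emitted the disable. */
        if (!this_cpu_read(tracing_irq_cpu))
                return;

        trace_irq_enable_rcuidle(CALLER_ADDR0, CALLER_ADDR1);
        tracer_hardirqs_on();

        this_cpu_write(tracing_irq_cpu, 0);
}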
I feel this is the cleanest path forward for the IRQ disable/enable
trace events.
Are you OK with this?
thanks,
- Joel
[1] http://elixir.free-electrons.com/linux/latest/source/include/linux/irqflags.h#L89