linux-kernel - Re: [PATCH v12 3/3] tracing: Centralize preemptirq tracepoints and unify their usage

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Date:   Tue, 7 Aug 2018 20:44:25 -0700
From:   Joel Fernandes <joelaf@...gle.com>
To:     Steven Rostedt <rostedt@...dmis.org>
Cc:     Joel Fernandes <joel@...lfernandes.org>,
        LKML <linux-kernel@...r.kernel.org>,
        "Cc: Android Kernel" <kernel-team@...roid.com>,
        Boqun Feng <boqun.feng@...il.com>,
        Byungchul Park <byungchul.park@....com>,
        Ingo Molnar <mingo@...hat.com>,
        Masami Hiramatsu <mhiramat@...nel.org>,
        Mathieu Desnoyers <mathieu.desnoyers@...icios.com>,
        Namhyung Kim <namhyung@...nel.org>,
        Paul McKenney <paulmck@...ux.vnet.ibm.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Thomas Glexiner <tglx@...utronix.de>,
        Tom Zanussi <tom.zanussi@...ux.intel.com>
Subject: Re: [PATCH v12 3/3] tracing: Centralize preemptirq tracepoints and
 unify their usage

Hi Steve,

On Tue, Aug 7, 2018 at 7:28 PM, Steven Rostedt <rostedt@...dmis.org> wrote:
> On Tue, 7 Aug 2018 19:13:32 -0700
> Joel Fernandes <joelaf@...gle.com> wrote:
>> >
>> >> From 6986af946ceb04fc9ddc6d5b45fc559b6807e465 Mon Sep 17 00:00:00 2001
>> >> From: "Joel Fernandes (Google)" <joel@...lfernandes.org>
>> >> Date: Sun, 5 Aug 2018 20:17:41 -0700
>> >> Subject: [PATCH] tracepoint: Run tracepoints even after CPU is offline
>> >>
>> >> Commit f37755490fe9 ("tracepoints: Do not trace when cpu is offline")
>> >> causes a problem for lockdep using tracepoint code. Once a CPU is
>> >> offline, tracepoints donot get called, however this causes a big problem
>> >> for lockdep probes that need to run so that IRQ annotations are marked
>> >> correctly.
>> >>
>> >> A race is possible where while the CPU is going offline, an interrupt
>> >> can come in and then a lockdep assert causes an annotation warning:
>> >>
>> >> [  106.551354] IRQs not enabled as expected
>> >> [  106.551785] WARNING: CPU: 1 PID: 0 at kernel/time/tick-sched.c:982
>> >>                                          tick_nohz_idle_enter+0x99/0xb0
>> >> [  106.552964] Modules linked in:
>> >> [  106.553299] CPU: 1 PID: 0 Comm: swapper/1 Tainted: G        W
>> >>
>> >> We need tracepoints to run as late as possible. This commit fixes the
>> >> issue by removing the cpu_online check in tracepoint code that was
>> >> introduced in the mentioned commit, however we now switch to using SRCU
>> >> for all tracepoints and special handle calling tracepoints from NMI so
>> >> that we don't run into issues that result from using sched-RCU when the
>> >> CPUs are marked to be offline.
>> >>
>> >> Fixes: c3bc8fd637a9 ("tracing: Centralize preemptirq tracepoints and
>> >>                       unify their usage")
>> >> Reported-by: Masami Hiramatsu <mhiramat@...nel.org>
>> >> Signed-off-by: Joel Fernandes (Google) <joel@...lfernandes.org>
>> >
>> >
>> > The above change log doesn't look like it matches the NMI patch.
>> >
>> > Can you resend with just the NMI changes? I already handled the cpu
>> > offline ones.
>>
>> Ok, sent with "cpu offline" changes dropped, here it is:
>> https://lore.kernel.org/patchwork/patch/972657/
>>
>> If you could add your Reported-by to it, that would be great as well.
>>
>> >
>> > But I may still have concerns with this patch.
>>
>> Ok let me know what you think.
>>
>
> Not sure you saw this part of my reply:

I missed the part on the srcu handles, sorry.

>
>> @@ -171,8 +174,7 @@ extern void syscall_unregfunc(void);
>>                       } while ((++it_func_ptr)->func);                \
>>               }                                                       \
>>                                                                       \
>> -             if (rcuidle)                                            \
>> -                     srcu_read_unlock_notrace(&tracepoint_srcu, idx);\
>> +             srcu_read_unlock_notrace(ss, idx);                      \
>
> Hmm, why do we have the two different srcu handles?

Because if the memory operations happening on the normal SRCU handle
(during srcu_read_lock) is interrupted by NMI, then the other handle
(devoted to NMI) could be used instead and not bother the interrupted
handle. Does that makes sense?

When I talked to Paul few months ago about SRCU from NMI context, he
mentioned the per-cpu memory operations during srcu_read_lock can be
NMI interrupted, that's why we added that warning.

> Thinking about this, if this can be called by the "thunk" code, then
> there might be an issue. I think the "thunk" code can be called before
> in_nmi() is set, so we don't know if we are in an NMI or not. I need to
> look at that code to make sure. If in_nmi() still works, then we should
> use the _nmi srcu handle (if that's required).
>
> But I'm not sure using SRCU for all tracepoints is needed at this
> moment. I'll have to look deeper into this tomorrow. But it's getting
> close to the merge window, and this needs to be settled quick. Another
> "partial revert" may be needed until this gets settled.

I did read this part, yes I'm not sure either. You mentioned you would
confirm that in the morning, I look forward to it. I hope the in_nmi()
function is reliable to use from here.