linux-kernel - Re: [PATCH 4/7] powerpc/ftrace: Additionally nop out the preceding mflr with -mprofile-kernel

Message-Id: <1560939496.ovo51ph4i4.astroid@bobo.none>
Date:   Wed, 19 Jun 2019 20:41:41 +1000
From:   Nicholas Piggin <npiggin@...il.com>
To:     Masami Hiramatsu <mhiramat@...nel.org>,
        Ingo Molnar <mingo@...nel.org>,
        Michael Ellerman <mpe@...erman.id.au>,
        "Naveen N. Rao" <naveen.n.rao@...ux.vnet.ibm.com>,
        Steven Rostedt <rostedt@...dmis.org>
Cc:     linux-kernel@...r.kernel.org, linuxppc-dev@...ts.ozlabs.org
Subject: Re: [PATCH 4/7] powerpc/ftrace: Additionally nop out the preceding
 mflr with -mprofile-kernel

Naveen N. Rao's on June 19, 2019 7:53 pm:
> Nicholas Piggin wrote:
>> Michael Ellerman's on June 19, 2019 3:14 pm:
>>> Hi Naveen,
>>> 
>>> Sorry I meant to reply to this earlier .. :/
> 
> No problem. Thanks for the questions.
> 
>>> 
>>> "Naveen N. Rao" <naveen.n.rao@...ux.vnet.ibm.com> writes:
>>>> With -mprofile-kernel, gcc emits 'mflr r0', followed by 'bl _mcount' to
>>>> enable function tracing and profiling. So far, with dynamic ftrace, we
>>>> used to only patch out the branch to _mcount(). However, mflr is
>>>> executed by the branch unit, which can execute only one instruction per
>>>> cycle on POWER9 and is shared with branches, so it would be nice to
>>>> avoid it where possible.
>>>>
>>>> We cannot simply nop out the mflr either. When enabling function
>>>> tracing, there can be a race if tracing is enabled when some thread was
>>>> interrupted after executing a nop'ed out mflr. In this case, the thread
>>>> would execute the now-patched-in branch to _mcount() without having
>>>> executed the preceding mflr.
>>>>
>>>> To solve this, we now enable function tracing in 2 steps: patch in the
>>>> mflr instruction, use synchronize_rcu_tasks() to ensure all existing
>>>> threads make progress, and then patch in the branch to _mcount(). We
>>>> override ftrace_replace_code() with a powerpc64 variant for this
>>>> purpose.
>>> 
>>> According to the ISA we're not allowed to patch mflr at runtime. See the
>>> section on "CMODX".
>> 
>> According to "quasi patch class" engineering note, we can patch
>> anything with a preferred nop. But that's written as an optional
>> facility, which we don't have a feature to test for.
>> 
> 
> Hmm... I wonder what the implications are. We've been patching in a 
> 'trap' for kprobes for a long time now, along with having to patch back 
> the original instruction (which can be anything), when the probe is 
> removed.

Will have to check which implementations support "quasi patch class"
instructions. IIRC recent POWER processors are okay. May have to add
a feature test though.
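
To make the feature-test idea concrete, here is a minimal sketch in plain
C. Everything in it is hypothetical: HYP_FTR_QUASI_PATCH is not an existing
CPU feature bit, and choose_patch_plan() is not a kernel function. It only
shows the shape of the decision: patch the mflr as well only when the
implementation is known to support it, otherwise fall back to patching the
branch alone.

```c
#include <stdint.h>

/* Hypothetical feature bit: no such flag exists in the kernel today. */
#define HYP_FTR_QUASI_PATCH (1u << 0)

enum patch_plan {
	PATCH_BRANCH_ONLY,      /* existing behaviour: only nop out 'bl _mcount' */
	PATCH_MFLR_AND_BRANCH   /* also nop out the preceding 'mflr r0' */
};

/* Pick the patching strategy from an (assumed) feature mask. */
enum patch_plan choose_patch_plan(uint32_t cpu_features)
{
	if (cpu_features & HYP_FTR_QUASI_PATCH)
		return PATCH_MFLR_AND_BRANCH;
	return PATCH_BRANCH_ONLY;
}
```

Falling back to the branch-only scheme keeps the current behaviour on any
implementation that does not advertise the facility.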

>>> 
>>> I'm also not convinced the ordering between the two patches is
>>> guaranteed by the ISA, given that there's possibly no isync on the other
>>> CPU.
>> 
>> Will they go through a context synchronizing event?
>> 
>> synchronize_rcu_tasks() should ensure a thread is scheduled away, but
>> I'm not actually sure it guarantees a context synchronizing instruction
>> (CSI) if it's kernel->kernel. Could do a smp_call_function to do the
>> isync on each CPU to be sure.
> 
> Good point. Per 
> Documentation/RCU/Design/Requirements/Requirements.html#Tasks RCU:
> "The solution, in the form of Tasks RCU, is to have implicit read-side 
> critical sections that are delimited by voluntary context switches, that 
> is, calls to schedule(), cond_resched(), and synchronize_rcu_tasks(). In 
> addition, transitions to and from userspace execution also delimit 
> tasks-RCU read-side critical sections."
> 
> I suppose transitions to/from userspace, as well as calls to schedule(),
> result in a context synchronizing instruction being executed. But if some
> tasks call cond_resched() and synchronize_rcu_tasks(), we probably won't
> have a CSI executed.
> 
> Also:
> "In CONFIG_PREEMPT=n kernels, trampolines cannot be preempted, so these 
> APIs map to call_rcu(), synchronize_rcu(), and rcu_barrier(), 
> respectively."
> 
> In this scenario as well, I think we won't have a CSI executed in case 
> of cond_resched().
> 
> Should we enhance patch_instruction() to handle that?

Well, not sure. Do we have many post-boot callers of it? Should
they take care of their own synchronization requirements?
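
For reference, the race and the two-step ordering discussed above can be
modelled in userspace C. This is only a toy model under stated assumptions:
struct site, struct thread and run_to_end() are made-up names, and
run_to_end() merely stands in for "synchronize_rcu_tasks() plus an isync on
each CPU". It does not model instruction fetch or cache coherence at all,
so it says nothing about whether the ISA actually guarantees the ordering.

```c
#include <stdbool.h>
#include <stdint.h>

/* ppc64 encodings, used here only as tags; nothing executes them. */
#define INSN_NOP  0x60000000u	/* ori 0,0,0 */
#define INSN_MFLR 0x7c0802a6u	/* mflr r0 */
#define INSN_BL   0x48000001u	/* bl with a zero offset (placeholder) */

/* A -mprofile-kernel call site: slot 0 is mflr/nop, slot 1 is bl/nop. */
struct site { uint32_t insn[2]; };

/* Toy thread state: next slot to execute and whether r0 holds LR. */
struct thread { int pc; bool r0_has_lr; bool bad_call; };

/* Run the thread through the rest of the call site. */
static void run_to_end(struct thread *t, const struct site *s)
{
	while (t->pc < 2) {
		uint32_t insn = s->insn[t->pc++];

		if (insn == INSN_MFLR)
			t->r0_has_lr = true;
		else if (insn == INSN_BL && !t->r0_has_lr)
			t->bad_call = true;	/* entered _mcount without LR in r0 */
	}
}

/* A thread interrupted right after executing the nop'ed-out mflr. */
static struct thread interrupted_after_nop(void)
{
	struct thread t = { .pc = 1, .r0_has_lr = false, .bad_call = false };
	return t;
}

/* Patch both instructions at once, then let the thread resume. */
bool naive_enable_is_safe(void)
{
	struct site s = { { INSN_NOP, INSN_NOP } };
	struct thread t = interrupted_after_nop();

	s.insn[0] = INSN_MFLR;
	s.insn[1] = INSN_BL;
	run_to_end(&t, &s);
	return !t.bad_call;
}

/* Patch mflr, wait for in-flight threads to leave, then patch bl. */
bool two_step_enable_is_safe(void)
{
	struct site s = { { INSN_NOP, INSN_NOP } };
	struct thread t = interrupted_after_nop();
	struct thread fresh = { .pc = 0, .r0_has_lr = false, .bad_call = false };

	s.insn[0] = INSN_MFLR;	/* step 1 */
	run_to_end(&t, &s);	/* stand-in for the synchronization step */
	s.insn[1] = INSN_BL;	/* step 2 */
	run_to_end(&fresh, &s);	/* a new entry sees mflr, then bl */
	return !t.bad_call && !fresh.bad_call;
}
```

In this model naive_enable_is_safe() returns false, because the
interrupted thread reaches the bl without having run mflr, while
two_step_enable_is_safe() returns true. The open question in the thread is
whether the real synchronization step is as strong as the stand-in assumes.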

Thanks,
Nick