[<prev] [next>] [<thread-prev] [day] [month] [year] [list]
Message-ID: <22fc7555-19a3-497f-914f-1ec102450fd2@ghiti.fr>
Date: Mon, 24 Jun 2024 10:08:30 +0200
From: Alexandre Ghiti <alex@...ti.fr>
To: Andy Chiu <andy.chiu@...ive.com>, Alexandre Ghiti <alexghiti@...osinc.com>
Cc: Conor Dooley <conor.dooley@...rochip.com>,
Paul Walmsley <paul.walmsley@...ive.com>, Palmer Dabbelt
<palmer@...belt.com>, Albert Ou <aou@...s.berkeley.edu>,
Steven Rostedt <rostedt@...dmis.org>, Masami Hiramatsu
<mhiramat@...nel.org>, Mark Rutland <mark.rutland@....com>,
Andrea Parri <parri.andrea@...il.com>, Björn Töpel
<bjorn@...osinc.com>, linux-riscv@...ts.infradead.org,
linux-kernel@...r.kernel.org, linux-trace-kernel@...r.kernel.org,
puranjay Mohan <puranjay12@...il.com>
Subject: Re: [PATCH] riscv: Fix early ftrace nop patching
On 20/06/2024 19:03, Alexandre Ghiti wrote:
>
> On 19/06/2024 05:40, Andy Chiu wrote:
>> On Tue, Jun 18, 2024 at 9:40 PM Alexandre Ghiti
>> <alexghiti@...osinc.com> wrote:
>>> Hi Andy,
>>>
>>> On Tue, Jun 18, 2024 at 2:48 PM Andy Chiu <andy.chiu@...ive.com> wrote:
>>>> On Tue, Jun 18, 2024 at 8:02 PM Alexandre Ghiti <alex@...ti.fr> wrote:
>>>>> Hi Conor,
>>>>>
>>>>> On 17/06/2024 15:23, Alexandre Ghiti wrote:
>>>>>> Hi Conor,
>>>>>>
>>>>>> Sorry for the delay here.
>>>>>>
>>>>>> On 13/06/2024 09:48, Conor Dooley wrote:
>>>>>>> On Thu, May 23, 2024 at 01:51:34PM +0200, Alexandre Ghiti wrote:
>>>>>>>> Commit c97bf629963e ("riscv: Fix text patching when IPI are used")
>>>>>>>> converted ftrace_make_nop() to use patch_insn_write() which
>>>>>>>> does not
>>>>>>>> emit any icache flush relying entirely on
>>>>>>>> __ftrace_modify_code() to do
>>>>>>>> that.
>>>>>>>>
>>>>>>>> But we missed that ftrace_make_nop() was called very early
>>>>>>>> directly
>>>>>>>> when
>>>>>>>> converting mcount calls into nops (actually on riscv it
>>>>>>>> converts 2B
>>>>>>>> nops
>>>>>>>> emitted by the compiler into 4B nops).
>>>>>>>>
>>>>>>>> This caused crashes on multiple HW as reported by Conor and
>>>>>>>> Björn since
>>>>>>>> the booting core could have half-patched instructions in its
>>>>>>>> icache
>>>>>>>> which would trigger an illegal instruction trap: fix this by
>>>>>>>> emitting a
>>>>>>>> local flush icache when early patching nops.
>>>>>>>>
>>>>>>>> Fixes: c97bf629963e ("riscv: Fix text patching when IPI are used")
>>>>>>>> Signed-off-by: Alexandre Ghiti <alexghiti@...osinc.com>
>>>>>>>> ---
>>>>>>>> arch/riscv/include/asm/cacheflush.h | 6 ++++++
>>>>>>>> arch/riscv/kernel/ftrace.c | 3 +++
>>>>>>>> 2 files changed, 9 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/arch/riscv/include/asm/cacheflush.h
>>>>>>>> b/arch/riscv/include/asm/cacheflush.h
>>>>>>>> index dd8d07146116..ce79c558a4c8 100644
>>>>>>>> --- a/arch/riscv/include/asm/cacheflush.h
>>>>>>>> +++ b/arch/riscv/include/asm/cacheflush.h
>>>>>>>> @@ -13,6 +13,12 @@ static inline void local_flush_icache_all(void)
>>>>>>>> asm volatile ("fence.i" ::: "memory");
>>>>>>>> }
>>>>>>>> +static inline void local_flush_icache_range(unsigned long
>>>>>>>> start,
>>>>>>>> + unsigned long end)
>>>>>>>> +{
>>>>>>>> + local_flush_icache_all();
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> #define PG_dcache_clean PG_arch_1
>>>>>>>> static inline void flush_dcache_folio(struct folio *folio)
>>>>>>>> diff --git a/arch/riscv/kernel/ftrace.c
>>>>>>>> b/arch/riscv/kernel/ftrace.c
>>>>>>>> index 4f4987a6d83d..32e7c401dfb4 100644
>>>>>>>> --- a/arch/riscv/kernel/ftrace.c
>>>>>>>> +++ b/arch/riscv/kernel/ftrace.c
>>>>>>>> @@ -120,6 +120,9 @@ int ftrace_init_nop(struct module *mod, struct
>>>>>>>> dyn_ftrace *rec)
>>>>>>>> out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
>>>>>>>> mutex_unlock(&text_mutex);
>>>>>>> So, turns out that this patch is not sufficient. I've seen some
>>>>>>> more
>>>>>>> crashes, seemingly due to initialising nops on this mutex_unlock().
>>>>>>> Palmer suggested moving the if (!mod) ... into the lock, which
>>>>>>> fixed
>>>>>>> the problem for me.
>>>>>>
>>>>>> Ok, it makes sense, I completely missed that. I'll send a fix for
>>>>>> that
>>>>>> shortly so that it can be merged in rc5.
>>>>>>
>>>>>> Thanks,
>>>>>>
>>>>>> Alex
>>>>>
>>>>> So I digged a bit more and I'm afraid that the only way to make sure
>>>>> this issue does not happen elsewhere is to flush the icache right
>>>>> after
>>>>> the patching. We actually can't wait to batch the icache flush since
>>>>> along the way, we may call a function that has just been patched (the
>>>>> issue that you encountered here).
>>>> Trying to provide my thoughts, please let me know if I missed
>>>> anything. I think what Conor suggested is safe for init_nop, as it
>>>> would be called only when there is only one core (booting) and at the
>>>> loading stage of kernel modules. In the first case we just have to
>>>> make sure there is no patchable entry before the core executes
>>>> fence.i. The second case is unconditionally safe because there is no
>>>> read-side of the race.
>>>>
>>>> It is a bit tricky for the generic (runtime) case of ftrace code
>>>> patching, but that is not because of the batch fence.i maintenance. As
>>>> long as there exists a patchable entry for the stopping thread, it is
>>>> possible for them to execute on a partially patched instruction. A
>>>> solution for this is again to prevent any patchable entry in the
>>>> stop_machine's stopping thread. Another solution is to apply the
>>>> atomic ftrace patching series which aims to get rid of the race.
>>> Yeah but my worry is that we can't make sure that functions called
>>> between the patching and the fence.i are not the ones that just get
>>> patched. At least today, patch_insn_write() etc should be marked as
>> IIUC, the compiler does not generate a patchable entry for
>> patch_insn_write, and so do all functions in patch.c. The entire file
>> is not compiled with patchable entry because the flag is removed in
>> riscv's Makefile. Please let me know if I misunderstand it.
>>
>>> noinstr. But even then, we call core functions that could be patchable
>>> (I went through all and it *seems* we are safe *for now*, but this is
>>> very fragile).
>> Yes, functions in the "else" clause, atomic_read() etc, are safe for
>> now. However, it is impossible to fix as long as there exists a
>> patchable entry along the way. This is the problem that we cannot
>> patch 2 instructions with a concurrent read-side. On the other hand,
>> the path of ftrace_modify_all_code may experience the batch fence.i
>> maintenance issue, and can be fixed via a local fence.i. This is
>> because the path is executed by only one core. There does not exist a
>> concurrent write-side. As a result, we just need to make sure it
>> leaves each patch-site with a local fence.i to make sure code is
>> up-to-date.
>>
>>> Then what do you think we should do to fix Conor's issue right now:
>>> should we mark the riscv specific functions as noinstr, cross fingers
>>> that the core functions are not (and don't become) patchable and wait
>>> for your atomic patching? Or should we flush as soon as possible as I
>>> proposed above (which to me fixes the issue but hurts performance)?
>> Either way works for me. IMHO, the fix should include:
>>
>> 1. move fence.i before mutex_unlock in init_nop
>> 2. do local fence.i in __ftrace_modify_call
>> 3. make sure the else clause does not happen to have a patchable entry
>
>
> OK I have just checked again and I agree with you on the whole fix :)
> I may add some comments for 3 so that it is very clear.
And after another check, a few other callsites need to emit a sfence.i
like ftrace_make_call() and ftrace_make_nop(), which in the end, amounts
to adding a sfence.i right after the patching is done. So after a
private discussion with Andy, we both agreed that the initial solution I
posted was the safest. Admittedly, the impact on performance is unknown,
so any input on this side would be appreciated.
@Conor: I'm sending this as a proper patch if you can give it a try so
that we can merge that in rc6.
Thanks,
Alex
>
> Thanks Andy!
>
> Alex
>
>
>>
>> Thanks,
>> Andy
>>
>>> Thanks,
>>>
>>> Alex
>>>
>>>>> I don't know how much it will impact the performance but I guess
>>>>> it will.
>>>>>
>>>>> Unless someone has a better idea (I added Andy and Puranjay in
>>>>> cc), here
>>>>> is the patch that implements this, can you give it a try? Thanks
>>>>>
>>>>> diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
>>>>> index 87cbd86576b2..4b95c574fd04 100644
>>>>> --- a/arch/riscv/kernel/ftrace.c
>>>>> +++ b/arch/riscv/kernel/ftrace.c
>>>>> @@ -120,9 +120,6 @@ int ftrace_init_nop(struct module *mod, struct
>>>>> dyn_ftrace *rec)
>>>>> out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
>>>>> mutex_unlock(&text_mutex);
>>>>>
>>>>> - if (!mod)
>>>>> - local_flush_icache_range(rec->ip, rec->ip +
>>>>> MCOUNT_INSN_SIZE);
>>>>> -
>>>>> return out;
>>>>> }
>>>>>
>>>>> @@ -156,9 +153,9 @@ static int __ftrace_modify_code(void *data)
>>>>> } else {
>>>>> while (atomic_read(¶m->cpu_count) <=
>>>>> num_online_cpus())
>>>>> cpu_relax();
>>>>> - }
>>>>>
>>>>> - local_flush_icache_all();
>>>>> + local_flush_icache_all();
>>>>> + }
>>>>>
>>>>> return 0;
>>>>> }
>>>>> diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c
>>>>> index 4007563fb607..ab03732d06c4 100644
>>>>> --- a/arch/riscv/kernel/patch.c
>>>>> +++ b/arch/riscv/kernel/patch.c
>>>>> @@ -89,6 +89,14 @@ static int __patch_insn_set(void *addr, u8 c,
>>>>> size_t len)
>>>>>
>>>>> memset(waddr, c, len);
>>>>>
>>>>> + /*
>>>>> + * We could have just patched a function that is about to be
>>>>> + * called so make sure we don't execute partially patched
>>>>> + * instructions by flushing the icache as soon as possible.
>>>>> + */
>>>>> + local_flush_icache_range((unsigned long)waddr,
>>>>> + (unsigned long)waddr + len);
>>>>> +
>>>>> patch_unmap(FIX_TEXT_POKE0);
>>>>>
>>>>> if (across_pages)
>>>>> @@ -135,6 +143,14 @@ static int __patch_insn_write(void *addr, const
>>>>> void *insn, size_t len)
>>>>>
>>>>> ret = copy_to_kernel_nofault(waddr, insn, len);
>>>>>
>>>>> + /*
>>>>> + * We could have just patched a function that is about to be
>>>>> + * called so make sure we don't execute partially patched
>>>>> + * instructions by flushing the icache as soon as possible.
>>>>> + */
>>>>> + local_flush_icache_range((unsigned long)waddr,
>>>>> + (unsigned long)waddr + len);
>>>>> +
>>>>> patch_unmap(FIX_TEXT_POKE0);
>>>>>
>>>>> if (across_pages)
>>>>> @@ -189,9 +205,6 @@ int patch_text_set_nosync(void *addr, u8 c,
>>>>> size_t len)
>>>>>
>>>>> ret = patch_insn_set(tp, c, len);
>>>>>
>>>>> - if (!ret)
>>>>> - flush_icache_range((uintptr_t)tp, (uintptr_t)tp +
>>>>> len);
>>>>> -
>>>>> return ret;
>>>>> }
>>>>> NOKPROBE_SYMBOL(patch_text_set_nosync);
>>>>> @@ -224,9 +237,6 @@ int patch_text_nosync(void *addr, const void
>>>>> *insns,
>>>>> size_t len)
>>>>>
>>>>> ret = patch_insn_write(tp, insns, len);
>>>>>
>>>>> - if (!ret)
>>>>> - flush_icache_range((uintptr_t) tp, (uintptr_t) tp
>>>>> + len);
>>>>> -
>>>>> return ret;
>>>>> }
>>>>> NOKPROBE_SYMBOL(patch_text_nosync);
>>>>> @@ -253,9 +263,9 @@ static int patch_text_cb(void *data)
>>>>> } else {
>>>>> while (atomic_read(&patch->cpu_count) <=
>>>>> num_online_cpus())
>>>>> cpu_relax();
>>>>> - }
>>>>>
>>>>> - local_flush_icache_all();
>>>>> + local_flush_icache_all();
>>>>> + }
>>>>>
>>>>> return ret;
>>>>> }
>>>>>
>>>>>
>>>>>>
>>>>>>>> + if (!mod)
>>>>>>>> + local_flush_icache_range(rec->ip, rec->ip +
>>>>>>>> MCOUNT_INSN_SIZE);
>>>>>>>> +
>>>>>>>> return out;
>>>>>>>> }
>>>>>>>> --
>>>>>>>> 2.39.2
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> linux-riscv mailing list
>>>>>>>> linux-riscv@...ts.infradead.org
>>>>>>>> http://lists.infradead.org/mailman/listinfo/linux-riscv
>>>>>> _______________________________________________
>>>>>> linux-riscv mailing list
>>>>>> linux-riscv@...ts.infradead.org
>>>>>> http://lists.infradead.org/mailman/listinfo/linux-riscv
>>>> Cheers,
>>>> Andy
>> _______________________________________________
>> linux-riscv mailing list
>> linux-riscv@...ts.infradead.org
>> http://lists.infradead.org/mailman/listinfo/linux-riscv
Powered by blists - more mailing lists