[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CABgGipWu2sL0EiKGUgksCZta5Ru3N5fqiNZVQ-5mrDt00puyyQ@mail.gmail.com>
Date: Wed, 19 Jun 2024 11:40:10 +0800
From: Andy Chiu <andy.chiu@...ive.com>
To: Alexandre Ghiti <alexghiti@...osinc.com>
Cc: Alexandre Ghiti <alex@...ti.fr>, Conor Dooley <conor.dooley@...rochip.com>,
Paul Walmsley <paul.walmsley@...ive.com>, Palmer Dabbelt <palmer@...belt.com>,
Albert Ou <aou@...s.berkeley.edu>, Steven Rostedt <rostedt@...dmis.org>,
Masami Hiramatsu <mhiramat@...nel.org>, Mark Rutland <mark.rutland@....com>,
Andrea Parri <parri.andrea@...il.com>, Björn Töpel <bjorn@...osinc.com>,
linux-riscv@...ts.infradead.org, linux-kernel@...r.kernel.org,
linux-trace-kernel@...r.kernel.org, puranjay Mohan <puranjay12@...il.com>
Subject: Re: [PATCH] riscv: Fix early ftrace nop patching
On Tue, Jun 18, 2024 at 9:40 PM Alexandre Ghiti <alexghiti@...osinc.com> wrote:
>
> Hi Andy,
>
> On Tue, Jun 18, 2024 at 2:48 PM Andy Chiu <andy.chiu@...ive.com> wrote:
> >
> > On Tue, Jun 18, 2024 at 8:02 PM Alexandre Ghiti <alex@...ti.fr> wrote:
> > >
> > > Hi Conor,
> > >
> > > On 17/06/2024 15:23, Alexandre Ghiti wrote:
> > > > Hi Conor,
> > > >
> > > > Sorry for the delay here.
> > > >
> > > > On 13/06/2024 09:48, Conor Dooley wrote:
> > > >> On Thu, May 23, 2024 at 01:51:34PM +0200, Alexandre Ghiti wrote:
> > > >>> Commit c97bf629963e ("riscv: Fix text patching when IPI are used")
> > > >>> converted ftrace_make_nop() to use patch_insn_write() which does not
> > > >>> emit any icache flush relying entirely on __ftrace_modify_code() to do
> > > >>> that.
> > > >>>
> > > >>> But we missed that ftrace_make_nop() was called very early directly
> > > >>> when
> > > >>> converting mcount calls into nops (actually on riscv it converts 2B
> > > >>> nops
> > > >>> emitted by the compiler into 4B nops).
> > > >>>
> > > >>> This caused crashes on multiple HW as reported by Conor and Björn since
> > > >>> the booting core could have half-patched instructions in its icache
> > > >>> which would trigger an illegal instruction trap: fix this by emitting a
> > > >>> local flush icache when early patching nops.
> > > >>>
> > > >>> Fixes: c97bf629963e ("riscv: Fix text patching when IPI are used")
> > > >>> Signed-off-by: Alexandre Ghiti <alexghiti@...osinc.com>
> > > >>> ---
> > > >>> arch/riscv/include/asm/cacheflush.h | 6 ++++++
> > > >>> arch/riscv/kernel/ftrace.c | 3 +++
> > > >>> 2 files changed, 9 insertions(+)
> > > >>>
> > > >>> diff --git a/arch/riscv/include/asm/cacheflush.h
> > > >>> b/arch/riscv/include/asm/cacheflush.h
> > > >>> index dd8d07146116..ce79c558a4c8 100644
> > > >>> --- a/arch/riscv/include/asm/cacheflush.h
> > > >>> +++ b/arch/riscv/include/asm/cacheflush.h
> > > >>> @@ -13,6 +13,12 @@ static inline void local_flush_icache_all(void)
> > > >>> asm volatile ("fence.i" ::: "memory");
> > > >>> }
> > > >>> +static inline void local_flush_icache_range(unsigned long start,
> > > >>> + unsigned long end)
> > > >>> +{
> > > >>> + local_flush_icache_all();
> > > >>> +}
> > > >>> +
> > > >>> #define PG_dcache_clean PG_arch_1
> > > >>> static inline void flush_dcache_folio(struct folio *folio)
> > > >>> diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
> > > >>> index 4f4987a6d83d..32e7c401dfb4 100644
> > > >>> --- a/arch/riscv/kernel/ftrace.c
> > > >>> +++ b/arch/riscv/kernel/ftrace.c
> > > >>> @@ -120,6 +120,9 @@ int ftrace_init_nop(struct module *mod, struct
> > > >>> dyn_ftrace *rec)
> > > >>> out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
> > > >>> mutex_unlock(&text_mutex);
> > > >> So, turns out that this patch is not sufficient. I've seen some more
> > > >> crashes, seemingly due to initialising nops on this mutex_unlock().
> > > >> Palmer suggested moving the if (!mod) ... into the lock, which fixed
> > > >> the problem for me.
> > > >
> > > >
> > > > Ok, it makes sense, I completely missed that. I'll send a fix for that
> > > > shortly so that it can be merged in rc5.
> > > >
> > > > Thanks,
> > > >
> > > > Alex
> > >
> > >
> > > So I digged a bit more and I'm afraid that the only way to make sure
> > > this issue does not happen elsewhere is to flush the icache right after
> > > the patching. We actually can't wait to batch the icache flush since
> > > along the way, we may call a function that has just been patched (the
> > > issue that you encountered here).
> >
> > Trying to provide my thoughts, please let me know if I missed
> > anything. I think what Conor suggested is safe for init_nop, as it
> > would be called only when there is only one core (booting) and at the
> > loading stage of kernel modules. In the first case we just have to
> > make sure there is no patchable entry before the core executes
> > fence.i. The second case is unconditionally safe because there is no
> > read-side of the race.
> >
> > It is a bit tricky for the generic (runtime) case of ftrace code
> > patching, but that is not because of the batch fence.i maintenance. As
> > long as there exists a patchable entry for the stopping thread, it is
> > possible for them to execute on a partially patched instruction. A
> > solution for this is again to prevent any patchable entry in the
> > stop_machine's stopping thread. Another solution is to apply the
> > atomic ftrace patching series which aims to get rid of the race.
>
> Yeah but my worry is that we can't make sure that functions called
> between the patching and the fence.i are not the ones that just get
> patched. At least today, patch_insn_write() etc should be marked as
IIUC, the compiler does not generate a patchable entry for
patch_insn_write, and so do all functions in patch.c. The entire file
is not compiled with patchable entry because the flag is removed in
riscv's Makefile. Please let me know if I misunderstand it.
> noinstr. But even then, we call core functions that could be patchable
> (I went through all and it *seems* we are safe *for now*, but this is
> very fragile).
Yes, functions in the "else" clause, atomic_read() etc, are safe for
now. However, it is impossible to fix as long as there exists a
patchable entry along the way. This is the problem that we cannot
patch 2 instructions with a concurrent read-side. On the other hand,
the path of ftrace_modify_all_code may experience the batch fence.i
maintenance issue, and can be fixed via a local fence.i. This is
because the path is executed by only one core. There does not exist a
concurrent write-side. As a result, we just need to make sure it
leaves each patch-site with a local fence.i to make sure code is
up-to-date.
>
> Then what do you think we should do to fix Conor's issue right now:
> should we mark the riscv specific functions as noinstr, cross fingers
> that the core functions are not (and don't become) patchable and wait
> for your atomic patching? Or should we flush as soon as possible as I
> proposed above (which to me fixes the issue but hurts performance)?
Either way works for me. IMHO, the fix should include:
1. move fence.i before mutex_unlock in init_nop
2. do local fence.i in __ftrace_modify_call
3. make sure the else clause does not happen to have a patchable entry
Thanks,
Andy
>
> Thanks,
>
> Alex
>
> >
> > >
> > > I don't know how much it will impact the performance but I guess it will.
> > >
> > > Unless someone has a better idea (I added Andy and Puranjay in cc), here
> > > is the patch that implements this, can you give it a try? Thanks
> > >
> > > diff --git a/arch/riscv/kernel/ftrace.c b/arch/riscv/kernel/ftrace.c
> > > index 87cbd86576b2..4b95c574fd04 100644
> > > --- a/arch/riscv/kernel/ftrace.c
> > > +++ b/arch/riscv/kernel/ftrace.c
> > > @@ -120,9 +120,6 @@ int ftrace_init_nop(struct module *mod, struct
> > > dyn_ftrace *rec)
> > > out = ftrace_make_nop(mod, rec, MCOUNT_ADDR);
> > > mutex_unlock(&text_mutex);
> > >
> > > - if (!mod)
> > > - local_flush_icache_range(rec->ip, rec->ip +
> > > MCOUNT_INSN_SIZE);
> > > -
> > > return out;
> > > }
> > >
> > > @@ -156,9 +153,9 @@ static int __ftrace_modify_code(void *data)
> > > } else {
> > > while (atomic_read(¶m->cpu_count) <= num_online_cpus())
> > > cpu_relax();
> > > - }
> > >
> > > - local_flush_icache_all();
> > > + local_flush_icache_all();
> > > + }
> > >
> > > return 0;
> > > }
> > > diff --git a/arch/riscv/kernel/patch.c b/arch/riscv/kernel/patch.c
> > > index 4007563fb607..ab03732d06c4 100644
> > > --- a/arch/riscv/kernel/patch.c
> > > +++ b/arch/riscv/kernel/patch.c
> > > @@ -89,6 +89,14 @@ static int __patch_insn_set(void *addr, u8 c, size_t len)
> > >
> > > memset(waddr, c, len);
> > >
> > > + /*
> > > + * We could have just patched a function that is about to be
> > > + * called so make sure we don't execute partially patched
> > > + * instructions by flushing the icache as soon as possible.
> > > + */
> > > + local_flush_icache_range((unsigned long)waddr,
> > > + (unsigned long)waddr + len);
> > > +
> > > patch_unmap(FIX_TEXT_POKE0);
> > >
> > > if (across_pages)
> > > @@ -135,6 +143,14 @@ static int __patch_insn_write(void *addr, const
> > > void *insn, size_t len)
> > >
> > > ret = copy_to_kernel_nofault(waddr, insn, len);
> > >
> > > + /*
> > > + * We could have just patched a function that is about to be
> > > + * called so make sure we don't execute partially patched
> > > + * instructions by flushing the icache as soon as possible.
> > > + */
> > > + local_flush_icache_range((unsigned long)waddr,
> > > + (unsigned long)waddr + len);
> > > +
> > > patch_unmap(FIX_TEXT_POKE0);
> > >
> > > if (across_pages)
> > > @@ -189,9 +205,6 @@ int patch_text_set_nosync(void *addr, u8 c, size_t len)
> > >
> > > ret = patch_insn_set(tp, c, len);
> > >
> > > - if (!ret)
> > > - flush_icache_range((uintptr_t)tp, (uintptr_t)tp + len);
> > > -
> > > return ret;
> > > }
> > > NOKPROBE_SYMBOL(patch_text_set_nosync);
> > > @@ -224,9 +237,6 @@ int patch_text_nosync(void *addr, const void *insns,
> > > size_t len)
> > >
> > > ret = patch_insn_write(tp, insns, len);
> > >
> > > - if (!ret)
> > > - flush_icache_range((uintptr_t) tp, (uintptr_t) tp + len);
> > > -
> > > return ret;
> > > }
> > > NOKPROBE_SYMBOL(patch_text_nosync);
> > > @@ -253,9 +263,9 @@ static int patch_text_cb(void *data)
> > > } else {
> > > while (atomic_read(&patch->cpu_count) <= num_online_cpus())
> > > cpu_relax();
> > > - }
> > >
> > > - local_flush_icache_all();
> > > + local_flush_icache_all();
> > > + }
> > >
> > > return ret;
> > > }
> > >
> > >
> > > >
> > > >
> > > >>
> > > >>> + if (!mod)
> > > >>> + local_flush_icache_range(rec->ip, rec->ip + MCOUNT_INSN_SIZE);
> > > >>> +
> > > >>> return out;
> > > >>> }
> > > >>> --
> > > >>> 2.39.2
> > > >>>
> > > >>>
> > > >>> _______________________________________________
> > > >>> linux-riscv mailing list
> > > >>> linux-riscv@...ts.infradead.org
> > > >>> http://lists.infradead.org/mailman/listinfo/linux-riscv
> > > >
> > > > _______________________________________________
> > > > linux-riscv mailing list
> > > > linux-riscv@...ts.infradead.org
> > > > http://lists.infradead.org/mailman/listinfo/linux-riscv
> >
> > Cheers,
> > Andy
Powered by blists - more mailing lists