Message-ID: <Y5h+PX6a1a9yjQPp@alley>
Date: Tue, 13 Dec 2022 14:29:33 +0100
From: Petr Mladek <pmladek@...e.com>
To: Song Liu <song@...nel.org>
Cc: live-patching@...r.kernel.org, linux-kernel@...r.kernel.org,
jpoimboe@...nel.org, jikos@...nel.org, mbenes@...e.cz,
x86@...nel.org, joe.lawrence@...hat.com,
linuxppc-dev@...ts.ozlabs.org,
Josh Poimboeuf <jpoimboe@...hat.com>,
Kamalesh Babulal <kamalesh@...ux.vnet.ibm.com>,
Michael Ellerman <mpe@...erman.id.au>
Subject: Re: powerpc-part: was: Re: [PATCH v6] livepatch: Clear relocation
targets on a module removal

On Tue 2022-12-13 00:13:46, Song Liu wrote:
> On Mon, Dec 12, 2022 at 9:12 AM Petr Mladek <pmladek@...e.com> wrote:
> >
> > On Fri 2022-12-09 11:59:35, Song Liu wrote:
> > > On Fri, Dec 9, 2022 at 3:41 AM Petr Mladek <pmladek@...e.com> wrote:
> > > > On Mon 2022-11-28 17:57:06, Song Liu wrote:
> > > > > On Fri, Nov 18, 2022 at 8:24 AM Petr Mladek <pmladek@...e.com> wrote:
> > > > > >
> > > > > > > --- a/arch/powerpc/kernel/module_64.c
> > > > > > > +++ b/arch/powerpc/kernel/module_64.c
> > > > > > > +#ifdef CONFIG_LIVEPATCH
> > > > > > > +void clear_relocate_add(Elf64_Shdr *sechdrs,
> > > > > > > + const char *strtab,
> > > > > > > + unsigned int symindex,
> > > > > > > + unsigned int relsec,
> > > > > > > + struct module *me)
> > > > > > > +{
> >
> > [...]
> >
> > > > > > > +
> > > > > > > + instruction = (u32 *)location;
> > > > > > > + if (is_mprofile_ftrace_call(symname))
> > > > > > > + continue;
> > > >
> > > > Why do we ignore these symbols?
> > > >
> > > > I can't find any counter-part in apply_relocate_add(). It looks super
> > > > tricky. It would deserve a comment.
> > > >
> > > > And I have no idea how we could maintain these exceptions.
> > > >
> > > > > > > + if (!instr_is_relative_link_branch(ppc_inst(*instruction)))
> > > > > > > + continue;
> > > >
> > > > Same here. It looks super tricky and there is no explanation.
> > >
> > > The two checks are from restore_r2(). But I cannot really remember
> > > why we needed them. It is probably an updated version of an earlier
> > > version (3 years earlier...).
> >
> > This is a good sign that it has to be explained in a comment.
> > Or even better, it should not be copy-pasted.
> >
> > > > > > > + instruction += 1;
> > > > > > > + patch_instruction(instruction, ppc_inst(PPC_RAW_NOP()));
> >
> > I believe that this is not enough. apply_relocate_add() does this:
> >
> > int apply_relocate_add(Elf64_Shdr *sechdrs,
> >                        [...]
> >                        struct module *me)
> > {
> > [...]
> >         case R_PPC_REL24:
> >                 /* FIXME: Handle weak symbols here --RR */
> >                 if (sym->st_shndx == SHN_UNDEF ||
> >                     sym->st_shndx == SHN_LIVEPATCH) {
> > [...]
> >                         if (!restore_r2(strtab + sym->st_name,
> >                                         (u32 *)location + 1, me))
> > [...]                           return -ENOEXEC;
> >
> > --->            if (patch_instruction((u32 *)location, ppc_inst(value)))
> >                         return -EFAULT;
> >
> > , where restore_r2() does:
> >
> > static int restore_r2(const char *name, u32 *instruction, struct module *me)
> > {
> > [...]
> >         /* ld r2,R2_STACK_OFFSET(r1) */
> > --->    if (patch_instruction(instruction, ppc_inst(PPC_INST_LD_TOC)))
> >                 return 0;
> > [...]
> > }
> >
> > In other words, apply_relocate_add() modifies two instructions:
> >
> > + patch_instruction() called in restore_r2() writes into "location + 1"
> > + patch_instruction() called in apply_relocate_add() writes into "location"
> >
> > IMHO, we have to clear both.
> >
> > IMHO, we need to implement a function that reverts the changes done
> > in restore_r2(). Also we need to revert the changes done in
> > apply_relocate_add().
>
> I finally got time to read all the details again and recalled what
> happened with the code.
>
> The failure happens when we
> 1) call apply_relocate_add() on klp load (or module first load,
> if klp was loaded first);
> 2) do nothing when the module is unloaded;
> 3) call apply_relocate_add() on module reload, which failed.
>
> The failure happens at this check in restore_r2():
>
>         if (*instruction != PPC_RAW_NOP()) {
>                 pr_err("%s: Expected nop after call, got %08x at %pS\n",
>                        me->name, *instruction, instruction);
>                 return 0;
>         }
>
> Therefore, apply_relocate_add only fails when "location + 1"
> is not NOP. And to make it not fail, we only need to write NOP to
> "location + 1" in clear_relocate_add().

Yes, this should be enough to pass the existing check.

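To illustrate the load/unload/reload sequence discussed above, here is a
small userspace model of it. It is only a sketch: the model_* names are
made up for this example, the instruction encodings are illustrative
stand-ins for PPC_RAW_NOP() and PPC_INST_LD_TOC (the ld offset assumes
the ELFv2 TOC save slot), and plain stores stand in for
patch_instruction().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative encodings; the kernel uses PPC_RAW_NOP() and PPC_INST_LD_TOC. */
#define MODEL_NOP    0x60000000u /* ori r0,r0,0 */
#define MODEL_LD_TOC 0xe8410018u /* ld r2,24(r1), assuming the ELFv2 TOC slot */

/* Models the check in restore_r2(): the slot after the call must be a nop. */
static bool model_restore_r2(uint32_t *instruction)
{
	assert(instruction != NULL);
	if (*instruction != MODEL_NOP)
		return false;
	*instruction = MODEL_LD_TOC;
	return true;
}

/* Models the R_PPC_REL24 case: writes "location + 1", then "location". */
static int model_apply_rel24(uint32_t *location, uint32_t branch)
{
	if (!model_restore_r2(location + 1))
		return -1; /* the kernel returns -ENOEXEC here */
	*location = branch;
	return 0;
}

/* Minimal clear: only put the nop back so that the next apply passes. */
static void model_clear_rel24(uint32_t *location)
{
	assert(location != NULL);
	location[1] = MODEL_NOP;
}
```

In this model, a second model_apply_rel24() on the same location fails
until model_clear_rel24() has been called. Note that "location" itself
still holds the stale branch after the clear; a full undo would have to
handle that word as well.
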
> IIUC, you want clear_relocate_add() to undo everything we did
> in apply_relocate_add(); while I was writing clear_relocate_add()
> to make the next apply_relocate_add() not fail.
>
> I agree that, based on the name, clear_relocate_add() should
> undo everything by apply_relocate_add(). But I am not sure how
> to handle some cases. For example, how do we undo
>
>         case R_PPC64_ADDR32:
>                 /* Simply set it */
>                 *(u32 *)location = value;
>                 break;
>
> Shall we just write zeros? I don't think this matters.

I guess that it would be zeros, as we do on x86_64.

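As a rough sketch of what "write zeros" could look like for a simple
"set it" relocation such as R_PPC64_ADDR32: the helper below is
hypothetical (its name and the "apply" flag are assumptions made for
this example, not taken from the patch), and it is written as plain
userspace C.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: stores the relocated value when applying and
 * zeros the same 32-bit slot when clearing.
 */
static void model_write_addr32(void *location, uint32_t value, bool apply)
{
	assert(location != NULL);
	if (apply)
		memcpy(location, &value, sizeof(value));
	else
		memset(location, 0, sizeof(value));
}
```

The value argument is ignored on the clear path, so one relocation walk
could serve both directions.
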
> I think this is the question we should answer first:
> What shall clear_relocate_add() do?
> 1) undo everything by apply_relocate_add();
> 2) only do things needed to make the next
> apply_relocate_add succeed;
> 3) something between 1) and 2).

Good question.

Hmm, commit a443bf6e8a7674b86221f49 ("powerpc/modules: Add REL24
relocation support of livepatch symbols") suggests that all symbols
in the section SHN_LIVEPATCH have the type R_PPC_REL24. AFAIK, kernel
livepatches are the only user of the clear_relocate_add() feature.

If the above is correct, then it might be enough to clear only the
R_PPC_REL24 type. It might also be enough to warn when
clear_relocate_add() is called for another type, so that we know when
relocations were not cleared properly.

Anyway, we might need some input from people familiar with the
architecture and with creating livepatches.

Best Regards,
Petr