[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAPhsuW7y1GzT8+quk4vJEqM6SagqDqc=HXA3jtdmfTfC=Gsv-Q@mail.gmail.com>
Date: Tue, 13 Dec 2022 00:13:46 -0800
From: Song Liu <song@...nel.org>
To: Petr Mladek <pmladek@...e.com>
Cc: live-patching@...r.kernel.org, linux-kernel@...r.kernel.org,
jpoimboe@...nel.org, jikos@...nel.org, mbenes@...e.cz,
x86@...nel.org, joe.lawrence@...hat.com,
linuxppc-dev@...ts.ozlabs.org,
Josh Poimboeuf <jpoimboe@...hat.com>,
Kamalesh Babulal <kamalesh@...ux.vnet.ibm.com>,
Michael Ellerman <mpe@...erman.id.au>
Subject: Re: powerpc-part: was: Re: [PATCH v6] livepatch: Clear relocation
targets on a module removal
)() ()On Mon, Dec 12, 2022 at 9:12 AM Petr Mladek <pmladek@...e.com> wrote:
>
> On Fri 2022-12-09 11:59:35, Song Liu wrote:
> > On Fri, Dec 9, 2022 at 3:41 AM Petr Mladek <pmladek@...e.com> wrote:
> > > On Mon 2022-11-28 17:57:06, Song Liu wrote:
> > > > On Fri, Nov 18, 2022 at 8:24 AM Petr Mladek <pmladek@...e.com> wrote:
> > > > >
> > > > > > --- a/arch/powerpc/kernel/module_64.c
> > > > > > +++ b/arch/powerpc/kernel/module_64.c
> > > > > > +#ifdef CONFIG_LIVEPATCH
> > > > > > +void clear_relocate_add(Elf64_Shdr *sechdrs,
> > > > > > + const char *strtab,
> > > > > > + unsigned int symindex,
> > > > > > + unsigned int relsec,
> > > > > > + struct module *me)
> > > > > > +{
>
> [...]
>
> > > > > > +
> > > > > > + instruction = (u32 *)location;
> > > > > > + if (is_mprofile_ftrace_call(symname))
> > > > > > + continue;
> > >
> > > Why do we ignore these symbols?
> > >
> > > I can't find any counter-part in apply_relocate_add(). It looks super
> > > tricky. It would deserve a comment.
> > >
> > > And I have no idea how we could maintain these exceptions.
> > >
> > > > > > + if (!instr_is_relative_link_branch(ppc_inst(*instruction)))
> > > > > > + continue;
> > >
> > > Same here. It looks super tricky and there is no explanation.
> >
> > The two checks are from restore_r2(). But I cannot really remember
> > why we needed them. It is probably an updated version from an earlier
> > version (3 year earlier..).
>
> This is a good sign that it has to be explained in a comment.
> Or even better, it should not by copy pasted.
>
> > > > > > + instruction += 1;
> > > > > > + patch_instruction(instruction, ppc_inst(PPC_RAW_NOP()));
>
> I believe that this is not enough. apply_relocate_add() does this:
>
> int apply_relocate_add(Elf64_Shdr *sechdrs,
> [...]
> struct module *me)
> {
> [...]
> case R_PPC_REL24:
> /* FIXME: Handle weak symbols here --RR */
> if (sym->st_shndx == SHN_UNDEF ||
> sym->st_shndx == SHN_LIVEPATCH) {
> [...]
> if (!restore_r2(strtab + sym->st_name,
> (u32 *)location + 1, me))
> [...] return -ENOEXEC;
>
> ---> if (patch_instruction((u32 *)location, ppc_inst(value)))
> return -EFAULT;
>
> , where restore_r2() does:
>
> static int restore_r2(const char *name, u32 *instruction, struct module *me)
> {
> [...]
> /* ld r2,R2_STACK_OFFSET(r1) */
> ---> if (patch_instruction(instruction, ppc_inst(PPC_INST_LD_TOC)))
> return 0;
> [...]
> }
>
> By other words, apply_relocate_add() modifies two instructions:
>
> + patch_instruction() called in restore_r2() writes into "location + 1"
> + patch_instruction() called in apply_relocate_add() writes into "location"
>
> IMHO, we have to clear both.
>
> IMHO, we need to implement a function that reverts the changes done
> in restore_r2(). Also we need to revert the changes done in
> apply_relocate_add().
I finally got time to read all the details again and recalled what
happened with the code.
The failure happens when we
1) call apply_relocate_add() on klp load (or module first load,
if klp was loaded first);
2) do nothing when the module is unloaded;
3) call apply_relocate_add() on module reload, which failed.
The failure happens at this check in restore_r2():
if (*instruction != PPC_RAW_NOP()) {
pr_err("%s: Expected nop after call, got %08x at %pS\n",
me->name, *instruction, instruction);
return 0;
}
Therefore, apply_relocate_add only fails when "location + 1"
is not NOP. And to make it not fail, we only need to write NOP to
"location + 1" in clear_relocate_add().
IIUC, you want clear_relocate_add() to undo everything we did
in apply_relocate_add(); while I was writing clear_relocate_add()
to make the next apply_relocate_add() not fail.
I agree that, based on the name, clear_relocate_add() should
undo everything by apply_relocate_add(). But I am not sure how
to handle some cases. For example, how do we undo
case R_PPC64_ADDR32:
/* Simply set it */
*(u32 *)location = value;
break;
Shall we just write zeros? I don't think this matters.
I think this is the question we should answer first:
What shall clear_relocate_add() do?
1) undo everything by apply_relocate_add();
2) only do things needed to make the next
apply_relocate_add succeed;
3) something between 1) and 2).
WDYT?
Thanks,
Song
>
> Please, avoid code duplication as much as possible. Especially,
> the two checks is_mprofile_ftrace_call() and
> instr_is_relative_link_branch() must be shared. IMHO, it is
> the only way to keep the code maintainable. We must make sure that
> we will clear the instructions only when they were patched. And
> copy pasting various tricky exceptions is a way to hell.
>
>
> > > int update_relocate_add(Elf64_Shdr *sechdrs,
> > > const char *strtab,
> > > unsigned int symindex,
> > > unsigned int relsec,
> > > struct module *me,
> > > bool apply)
> > > {
> > > unsigned int i;
> > > Elf64_Rela *rela = (void *)sechdrs[relsec].sh_addr;
> > > Elf64_Sym *sym;
> > > Elf64_Xword r_type;
> > > unsigned long *location;
> > >
> > > if (apply) {
> > > pr_debug("Applying ADD relocate section %u to %u\n", relsec,
> > > sechdrs[relsec].sh_info);
> > > } else {
> > > pr_debug("Clearing ADD relocate section %u\n", relsec");
> > > }
> > >
> > > for (i = 0; i < sechdrs[relsec].sh_size / sizeof(*rela); i++) {
> > > /* This is where to make the change */
> > > location = (void *)sechdrs[sechdrs[relsec].sh_info].sh_addr
> > > + rela[i].r_offset;
> > > /* This is the symbol it is referring to */
> > > sym = (Elf64_Sym *)sechdrs[symindex].sh_addr
> > > + ELF64_R_SYM(rela[i].r_info);
> > >
> > > r_type = ELF64_R_TYPE(rela[i].r_info);
> > >
> > > if (apply) {
> > > apply_relocate_location(sym, location, r_type, rela[i].r_addend);
> > > } else {
> > > clear_relocate_location(sym, location, r_type);
> > > }
> >
> > I personally don't like too many "if (apply) {...} else {...}" patterns in
> > a function. And these new functions confuse me sometimes:
> >
> > update_relocate_add(..., apply);
> > apply_relocate_location();
> > clear_relocate_location();
>
> Feel free to come up with another way how to avoid code duplication.
>
> > And I did think there wasn't too much duplicated code.
>
> I think that it looks very different when you are writing or reading
> or mantainting the code. It might be easier to write code and modify
> it. It is more complicated to find the differences later. Also it is
> more complicated to do the same changes many times when the common
> code is updated later.
>
> Best Regards,
> Petr
Powered by blists - more mailing lists