[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAPhsuW436=wRKLixWNtE9Rx=6A0gKrOCR8EUOdwTrPw5W6gddg@mail.gmail.com>
Date: Fri, 20 Jan 2023 11:41:02 -0800
From: Song Liu <song@...nel.org>
To: Josh Poimboeuf <jpoimboe@...nel.org>
Cc: linux-kernel@...r.kernel.org, linux-modules@...r.kernel.org,
live-patching@...r.kernel.org, x86@...nel.org, jikos@...nel.org,
pmladek@...e.com, joe.lawrence@...hat.com,
Miroslav Benes <mbenes@...e.cz>,
Josh Poimboeuf <jpoimboe@...hat.com>
Subject: Re: [PATCH v9] livepatch: Clear relocation targets on a module removal
On Fri, Jan 20, 2023 at 11:16 AM Josh Poimboeuf <jpoimboe@...nel.org> wrote:
>
> On Wed, Jan 18, 2023 at 12:47:28PM -0800, Song Liu wrote:
> > From: Miroslav Benes <mbenes@...e.cz>
> >
> > Josh reported a bug:
> >
> > When the object to be patched is a module, and that module is
> > rmmod'ed and reloaded, it fails to load with:
> >
> > module: x86/modules: Skipping invalid relocation target, existing value is nonzero for type 2, loc 00000000ba0302e9, val ffffffffa03e293c
> > livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
> > livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'
> >
> > The livepatch module has a relocation which references a symbol
> > in the _previous_ loading of nfsd. When apply_relocate_add()
> > tries to replace the old relocation with a new one, it sees that
> > the previous one is nonzero and it errors out.
>
> Should we add a selftest to make sure this problem doesn't come back?
IIRC, a selftest for this issue is not easy without Joe's klp-convert work.
At the moment I use kpatch-build for testing.
>
> > On ppc64le, we have a similar issue:
> >
> > module_64: livepatch_nfsd: Expected nop after call, got e8410018 at e_show+0x60/0x548 [livepatch_nfsd]
> > livepatch: failed to initialize patch 'livepatch_nfsd' for module 'nfsd' (-8)
> > livepatch: patch 'livepatch_nfsd' failed for module 'nfsd', refusing to load module 'nfsd'
>
> Just to clarify my previous comment, the reference to the powerpc issue
> should be removed because it'll be fixed in a separate patch.
Will remove.
>
> > He also proposed three different solutions. We could remove the error
> > check in apply_relocate_add() introduced by commit eda9cec4c9a1
> > ("x86/module: Detect and skip invalid relocations"). However the check
> > is useful for detecting corrupted modules.
> >
> > We could also deny the patched modules to be removed. If it proved to be
> > a major drawback for users, we could still implement a different
> > approach. The solution would also complicate the existing code a lot.
> >
> > We thus decided to reverse the relocation patching (clear all relocation
> > targets on x86_64). The solution is not
> > universal and is too much arch-specific, but it may prove to be simpler
> > in the end.
> >
> > Signed-off-by: Miroslav Benes <mbenes@...e.cz>
> > Signed-off-by: Song Liu <song@...nel.org>
> > Acked-by: Miroslav Benes <mbenes@...e.cz>
> > Co-developed-by: Song Liu <song@...nel.org>
> > Reported-by: Josh Poimboeuf <jpoimboe@...hat.com>
>
> According to submitting-patches.rst the Co-developed-by is supposed to
> be immediately before your Signed-off-by.
>
> Also, other than the commit log, this patch is almost completely
> unrecognizable compared to Miroslav's original patch. Does it still
> make sense for him to be listed as the author?
>
> In the -tip tree they sometimes use an Originally-by tag, which might be
> relevant here.
How about:
Signed-off-by: Song Liu <song@...nel.org>
Originally-by: Miroslav Benes <mbenes@...e.cz>
Acked-by: Miroslav Benes <mbenes@...e.cz>
Reported-by: Josh Poimboeuf <jpoimboe@...hat.com>
>
> > -static int __apply_relocate_add(Elf64_Shdr *sechdrs,
> > +static int __write_relocate_add(Elf64_Shdr *sechdrs,
> > const char *strtab,
> > unsigned int symindex,
> > unsigned int relsec,
> > struct module *me,
> > - void *(*write)(void *dest, const void *src, size_t len))
> > + void *(*write)(void *dest, const void *src, size_t len),
> > + bool apply)
> > {
> > unsigned int i;
> > Elf64_Rela *rel = (void *)sechdrs[relsec].sh_addr;
> > Elf64_Sym *sym;
> > void *loc;
> > u64 val;
> > + u64 zero = 0ULL;
> >
> > - DEBUGP("Applying relocate section %u to %u\n",
> > + DEBUGP("%s relocate section %u to %u\n",
> > + (apply) ? "Applying" : "Clearing",
>
> "(apply)" has unnecessary parentheses.
>
> > relsec, sechdrs[relsec].sh_info);
> > for (i = 0; i < sechdrs[relsec].sh_size / sizeof(*rel); i++) {
> > + int write_size = 4;
>
> We already know we're writing, I suggest just calling it 'size' to be
> more consistent with kernel style.
>
> Also, it can be an unsigned value like size_t.
>
> Also, instead of initializing it here with a potentially bogus value
> which may need to be overwritten for a 64-bit reloc, it's better to
> explicitly set the size in each individual case below. That's makes the
> logic clearer and helps prevent future bugs if new 64-bit relocation
> types are added later.
Will fix in v10.
>
> > case R_X86_64_PC32:
> > case R_X86_64_PLT32:
> > - if (*(u32 *)loc != 0)
> > - goto invalid_relocation;
> > val -= (u64)loc;
> > - write(loc, &val, 4);
> > #if 0
> > - if ((s64)val != *(s32 *)loc)
> > + if ((s64)val != *(s32 *)&val)
> > goto overflow;
> > #endif
>
> Why is the compiled-out code getting changed? Is it actually fixing a
> hypothetical bug?
I just changed them the same way as other cases (assuming we remove
the #if 0, of course).
>
> This code really needs to be removed anyway, it's been dead for at least
> 15 years.
Shall we remove it now? Within the same patch? Or with a preparation
patch?
>
> > break;
> > case R_X86_64_PC64:
> > - if (*(u64 *)loc != 0)
> > - goto invalid_relocation;
> > val -= (u64)loc;
> > - write(loc, &val, 8);
> > + write_size = 8;
> > break;
> > default:
> > pr_err("%s: Unknown rela relocation: %llu\n",
> > me->name, ELF64_R_TYPE(rel[i].r_info));
> > return -ENOEXEC;
> > }
> > +
> > + if (apply) {
> > + if (memcmp(loc, &zero, write_size)) {
> > + pr_err("x86/modules: Skipping invalid relocation target, existing value is nonzero for type %d, loc %p, val %Lx\n",
>
> It's not just "skipping", it's erroring out completely. Yes, it's is a
> pre-existing error message but we might as well improve it while
> touching it.
>
> Maybe just remove the "Skipping", i.e. change "Skipping invalid ..." to
> "Invalid ..." ?
Sounds good to me.
>
> > + (int)ELF64_R_TYPE(rel[i].r_info), loc, val);
> > + return -ENOEXEC;
> > + }
> > + write(loc, &val, write_size);
> > + } else {
> > + if (memcmp(loc, &val, write_size)) {
> > + pr_warn("x86/modules: Clearing invalid relocation target, existing value does not match expected value for type %d, loc %p, val %Lx\n",
> > + (int)ELF64_R_TYPE(rel[i].r_info), loc, val);
> > + }
> > + write(loc, &zero, write_size);
>
> If the value doesn't match then something has gone badly wrong. Why go
> ahead with the clearing in that case?
We can pr_err() then return -ENOEXEC (?). But I guess we need to
handle the error case in:
klp_cleanup_module_patches_limited()
klp_module_coming()
klp_module_going()
and all the functions that call klp_module_going().
This seems a big overkill to me...
Or do you mean we just skip the write()?
>
> > +#ifdef CONFIG_LIVEPATCH
> > +
> > +void clear_relocate_add(Elf64_Shdr *sechdrs,
> > + const char *strtab,
> > + unsigned int symindex,
> > + unsigned int relsec,
> > + struct module *me)
> > +{
> > + write_relocate_add(sechdrs, strtab, symindex, relsec, me, false);
> > +}
> > +#endif
>
> Superflous newline after the '#ifdef CONFIG_LIVEPATCH'.
>
> > +
> > #endif
> >
> > int module_finalize(const Elf_Ehdr *hdr,
> > diff --git a/include/linux/moduleloader.h b/include/linux/moduleloader.h
> > index 7b4587a19189..0b54ec9856df 100644
> > --- a/include/linux/moduleloader.h
> > +++ b/include/linux/moduleloader.h
> > @@ -75,6 +75,13 @@ int apply_relocate_add(Elf_Shdr *sechdrs,
> > unsigned int symindex,
> > unsigned int relsec,
> > struct module *mod);
> > +#ifdef CONFIG_LIVEPATCH
>
> This could use a comment explaining the purpose of this function:
>
> /*
> * Some architectures (namely x86_64 and ppc64) perform sanity checks when
> * applying relocations. If a patched module gets unloaded and then later
> * reloaded (and re-patched), klp re-applies relocations to the replacement
> * function(s). Any leftover relocations from the previous loading of the
> * patched module might trigger the sanity checks.
> *
> * To prevent that, when unloading a patched module, clear out any relocations
> * that might trigger arch-specific sanity checks on a future module reload.
> */
Sounds great. Will add this to the next version.
>
> > +void clear_relocate_add(Elf_Shdr *sechdrs,
> > + const char *strtab,
> > + unsigned int symindex,
> > + unsigned int relsec,
> > + struct module *me);
> > +#endif
>
>
> --
> Josh
Powered by blists - more mailing lists