lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAMj1kXF3N89J-Y8OptZjXPxnOLW+K5Ot+jCSBT+py90uvBxh1w@mail.gmail.com>
Date:   Wed, 10 May 2023 10:15:51 +0200
From:   Ard Biesheuvel <ardb@...nel.org>
To:     Hou Wenlong <houwenlong.hwl@...group.com>
Cc:     linux-kernel@...r.kernel.org,
        Lai Jiangshan <jiangshan.ljs@...group.com>,
        Kees Cook <keescook@...omium.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
        Dave Hansen <dave.hansen@...ux.intel.com>, x86@...nel.org,
        "H. Peter Anvin" <hpa@...or.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Petr Mladek <pmladek@...e.com>,
        Greg Kroah-Hartman <gregkh@...uxfoundation.org>,
        "Jason A. Donenfeld" <Jason@...c4.com>, Song Liu <song@...nel.org>,
        Julian Pidancet <julian.pidancet@...cle.com>
Subject: Re: [PATCH RFC 31/43] x86/modules: Adapt module loading for PIE support

On Wed, 10 May 2023 at 09:15, Hou Wenlong <houwenlong.hwl@...group.com> wrote:
>
> On Mon, May 08, 2023 at 05:16:34PM +0800, Ard Biesheuvel wrote:
> > On Mon, 8 May 2023 at 10:38, Hou Wenlong <houwenlong.hwl@...group.com> wrote:
> > >
> > > On Sat, Apr 29, 2023 at 03:29:32AM +0800, Ard Biesheuvel wrote:
> > > > On Fri, 28 Apr 2023 at 10:53, Hou Wenlong <houwenlong.hwl@...group.com> wrote:
> > > > >
> > > > > Adapt module loading to support PIE relocations. No GOT is generared for
> > > > > module, all the GOT entry of got references in module should exist in
> > > > > kernel GOT.  Currently, there is only one usable got reference for
> > > > > __fentry__().
> > > > >
> > > >
> > > > I don't think this is the right approach. We should permit GOTPCREL
> > > > relocations properly, which means making them point to a location in
> > > > memory that carries the absolute address of the symbol. There are
> > > > several ways to go about that, but perhaps the simplest way is to make
> > > > the symbol address in ksymtab a 64-bit absolute value (but retain the
> > > > PC32 references for the symbol name and the symbol namespace name).
> > > > That way, you can always resolve such GOTPCREL relocations by pointing
> > > > it to the ksymtab entry. Another option would be to take inspiration
> > > > from the PLT code we have on ARM and arm64 (and other architectures,
> > > > surely) and to count the GOT based relocations, allocate some extra
> > > > r/o module space for each, and allocate slots and populate them with
> > > > the right value as you fix up the relocations.
> > > >
> > > > Then, many such relocations can be relaxed at module load time if the
> > > > symbol is in range. IIUC, the module and kernel will still be inside
> > > > the same 2G window even after widening the KASLR range to 512G, so
> > > > most GOT loads can be converted into RIP relative LEA instructions.
> > > >
> > > > Note that this will also permit you to do things like
> > > >
> > > > #define PV_VCPU_PREEMPTED_ASM \
> > > >  "leaq __per_cpu_offset(%rip), %rax \n\t" \
> > > >  "movq (%rax,%rdi,8), %rax \n\t" \
> > > >  "addq steal_time@...PCREL(%rip), %rax \n\t" \
> > > >  "cmpb $0, " __stringify(KVM_STEAL_TIME_preempted) "(%rax) \n\t" \
> > > >  "setne %al\n\t"
> > > >
> > > > or
> > > >
> > > > +#ifdef CONFIG_X86_PIE
> > > > + " pushq arch_rethook_trampoline@...PCREL(%rip)\n"
> > > > +#else
> > > > " pushq $arch_rethook_trampoline\n"
> > > > +#endif
> > > >
> > > > instead of having these kludgy push/pop sequences to free up temp registers.
> > > >
> > > > (FYI I have looked into this PIE linking just a few weeks ago [0] so
> > > > this is all rather fresh in my memory)
> > > >
> > > >
> > > >
> > > >
> > > > [0] https://git.kernel.org/pub/scm/linux/kernel/git/ardb/linux.git/log/?h=x86-pie
> > > >
> > > >
> > > Hi Ard,
> > > Thanks for providing the link, it has been very helpful for me as I am
> > > new to the topic of compilers.
> >
> > Happy to hear that.
> >
> > > One key difference I noticed is that you
> > > linked the kernel with "-pie" instead of "--emit-reloc". I also noticed
> > > that Thomas' initial patchset[0] used "-pie", but in RFC v3 [1], it
> > > switched to "--emit-reloc" in order to reduce dynamic relocation space
> > > on mapped memory.
> > >
> >
> > The problem with --emit-relocs is that the relocations emitted into
> > the binary may get out of sync with the actual code after the linker
> > has applied relocations.
> >
> > $ cat /tmp/a.s
> > foo:movq foo@...PCREL(%rip), %rax
> >
> > $ x86_64-linux-gnu-gcc -c -o /tmp/a.o /tmp/a.s
> > ard@...bale:~/linux$ x86_64-linux-gnu-objdump -dr /tmp/a.o
> >
> > /tmp/a.o:     file format elf64-x86-64
> >
> >
> > Disassembly of section .text:
> >
> > 0000000000000000 <foo>:
> >    0: 48 8b 05 00 00 00 00 mov    0x0(%rip),%rax        # 7 <foo+0x7>
> > 3: R_X86_64_REX_GOTPCRELX foo-0x4
> >
> > $ x86_64-linux-gnu-gcc -c -o /tmp/a.o /tmp/a.s
> > $ x86_64-linux-gnu-objdump -dr /tmp/a.o
> > 0000000000000000 <foo>:
> >    0: 48 8b 05 00 00 00 00 mov    0x0(%rip),%rax        # 7 <foo+0x7>
> > 3: R_X86_64_REX_GOTPCRELX foo-0x4
> >
> > $ x86_64-linux-gnu-gcc -o /tmp/a.elf -nostartfiles
> > -Wl,-no-pie,-q,--defsym,_start=0x0 /tmp/a.s
> > $ x86_64-linux-gnu-objdump -dr /tmp/a.elf
> > 0000000000401000 <foo>:
> >   401000: 48 c7 c0 00 10 40 00 mov    $0x401000,%rax
> > 401003: R_X86_64_32S foo
> >
> > $ x86_64-linux-gnu-gcc -o /tmp/a.elf -nostartfiles
> > -Wl,-q,--defsym,_start=0x0 /tmp/a.s
> > $ x86_64-linux-gnu-objdump -dr /tmp/a.elf
> > 0000000000001000 <foo>:
> >     1000: 48 8d 05 f9 ff ff ff lea    -0x7(%rip),%rax        # 1000 <foo>
> > 1003: R_X86_64_PC32 foo-0x4
> >
> > This all looks as expected. However, when using Clang, we end up with
> >
> > $ clang -target x86_64-linux-gnu -o /tmp/a.elf -nostartfiles
> > -fuse-ld=lld -Wl,--relax,-q,--defsym,_start=0x0 /tmp/a.s
> > $ x86_64-linux-gnu-objdump -dr /tmp/a.elf
> > 00000000000012c0 <foo>:
> >     12c0: 48 8d 05 f9 ff ff ff lea    -0x7(%rip),%rax        # 12c0 <foo>
> > 12c3: R_X86_64_REX_GOTPCRELX foo-0x4
> >
> > So in this case, what --emit-relocs gives us is not what is actually
> > in the binary. We cannot just ignore these either, given that they are
> > treated differently depending on whether the symbol is a per-CPU
> > symbol or not - in the former case, we need to perform a fixup if the
> > relaxed reference is RIP relative, and in the latter case, if the
> > relaxed reference is absolute.
> >
> > On top of that, --emit-relocs does not cover the GOT, so we'd still
> > need to process that from the code explicitly.
> >
> > In general, relying on --emit-relocs is kind of dodgy, and I think
> > combining PIE linking with --emit-relocs is a bad idea.
> >
> > > The another issue is that it requires the addition of the
> > > "-mrelax-relocations=no" option to support older compilers and linkers.
> >
> > Why? The decompressor is now linked in PIE mode so we should be able
> > to drop that. Or do you need to add is somewhere else?
> >
> Hi Ard,
>
> After removing the "-mrelax-relocations=no" option, I noticed that the
> linker was relaxing GOT references as absolute references for mov
> instructions, even if the symbol was in a high address, as long as I
> kept the compile-time base address of the kernel image in the top 2G. I
> consulted the "Optimize GOTPCRELX Relocations" chapter in x86-64 psABI,
> which stated that "When position-independent code is disabled and foo is
> defined locally in the lower 32-bit address space, memory operand in mov
> can be converted into immediate operand". However, it seemed that if the
> symbol was in the higher 32-bit address space, the memory operand in mov
> would also be converted into an immediate operand. If I decreased the
> compile-time base address of the kernel image, it would be relaxed as
> lea. Therefore, I believe that using "-mrelax-relocations=no" without
> "-pie" option is necessary.

Indeed. As you noted, the linker assumes that non-PIE linked binaries
will always appear at their link time address, and relaxations will
try to take advantage of that.

Currently, we use -pie linking only for the decompressor, and we
should be able to drop -mrelax-relocations=no from its LDFLAGS. But
position dependent linking should not use relaxations at all.

> Is there a way to force the linker to relax
> it as lea without using the "-pie" option when linking?
>

Not that I am aware of.

> Since all GOT references cannot be omitted, perhaps I should try linking
> the kernel with the "-pie" option.
>

That way, we will end up with two sets of relocations, the static ones
from --emit-relocs and the dynamic ones from -pie. This should be
manageable, given that the difference between those sets should
exactly cover the GOT.

However, relying on --emit-relocs and -pie at the same time seems
clumsy to me. I'd prefer to only depend on -pie at /some/ point.

-- 
Ard.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ