Date:   Tue, 3 Jul 2018 14:00:25 -0700
From:   Andi Kleen <ak@...ux.intel.com>
To:     "Kirill A. Shutemov" <kirill@...temov.name>
Cc:     Gabriel C <nix.or.die@...il.com>,
        Benjamin Gilbert <bgilbert@...hat.com>,
        linux-x86_64@...r.kernel.org, LKML <linux-kernel@...r.kernel.org>,
        "Kirill A. Shutemov" <kirill.shutemov@...ux.intel.com>,
        Ingo Molnar <mingo@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        "H. Peter Anvin" <hpa@...or.com>, X86 ML <x86@...nel.org>,
        bero@...dev.ch
Subject: Re: 4.17.x won't boot due to "x86/boot/compressed/64: Handle 5-level
 paging boot if kernel is above 4G"

On Tue, Jul 03, 2018 at 11:26:09PM +0300, Kirill A. Shutemov wrote:
> On Tue, Jul 03, 2018 at 11:03:07AM -0700, Andi Kleen wrote:
> > On Tue, Jul 03, 2018 at 05:21:50PM +0300, Kirill A. Shutemov wrote:
> > > On Tue, Jul 03, 2018 at 03:44:03PM +0300, Kirill A. Shutemov wrote:
> > > > On Tue, Jul 03, 2018 at 01:24:49PM +0200, Gabriel C wrote:
> > > > > 2018-07-01 23:32 GMT+02:00 Benjamin Gilbert <bgilbert@...hat.com>:
> > > > > > On Sun, Jul 01, 2018 at 05:15:59PM -0400, Benjamin Gilbert wrote:
> > > > > >> 4.17 kernels built with the CoreOS Container Linux toolchain and kconfig,
> > > > > >> up to and including 4.17.3, fail to boot on AMD64 running in (at least)
> > > > > >> QEMU/KVM.  No messages are shown post-GRUB; the VM instantly reboots.
> > > > > >> Reverting commit 194a9749c73d ("x86/boot/compressed/64: Handle 5-level
> > > > > >> paging boot if kernel is above 4G") fixes it.  I've attached our kernel
> > > > > >> config for reference, and am happy to test patches, provide sample QCOW
> > > > > >> images, etc.
> > > > > >
> > > > > 
> > > > > Also see https://bugzilla.kernel.org/show_bug.cgi?id=200385 ,
> > > > > 
> > > > > 0a1756bd2897951c03c1cb671bdfd40729ac2177 is acting up
> > > > > too with the same symptoms
> > > > 
> > > > I tracked it down to -flto in LDFLAGS. I'll look more into this.
> > > 
> > > -flto in LDFLAGS screws up this part of paging_prepare():
> > 
> > Where is that coming from? The LTO patches are not upstream.
> > 
> > And I don't see any LTO usage in the main line.
> 
> Apparently some distros try to hack it around:
> 
> https://bugzilla.kernel.org/show_bug.cgi?id=200385
> 
> I'm amazed that it kinda worked for them.

I think it only works on older gccs that don't default to
slim LTO objects, but always generate a fallback non-LTO object.
The kernel directly uses ld in the link step (without my patches), so LTO
shouldn't ever be able to generate code.

The early boot code may be an exception to this, and it's likely
the only code that actually uses LTO in such a setup.

The -fPIC is actually scarier than the -flto. The generated code
must be quite a mess, and I'm not sure why you would even want that,
because the kernel can be relocatable without it.

BTW I hope to eventually resend the full LTO patches.
They seem to get more and more users recently, mainly for smaller
code size.

> 
> 
> > > 	/* Copy trampoline code in place */
> > > 	memcpy(trampoline_32bit + TRAMPOLINE_32BIT_CODE_OFFSET / sizeof(unsigned long),
> > > 			&trampoline_32bit_src, TRAMPOLINE_32BIT_CODE_SIZE);
> > 
> > 
> > > In particular, the relocation for trampoline_32bit_src is resolved in the
> > > wrong way. Without -flto, we have a RIP-relative address load:
> > > 
> > >   982d30:	48 8d 35 09 cc ff ff 	lea    -0x33f7(%rip),%rsi        # 97f940 <trampoline_32bit_src>
> > > 
> > > With -flto we have immediate load:
> > > 
> > >   982cf0:	48 c7 c6 f0 f8 97 00 	mov    $0x97f8f0,%rsi
> > 
> > Strange.
> > 
> > Can you add some RELOC_HIDE()s and see if that helps?
> 
> Nope. No difference in generated code.

OK, I will need to put together a self-contained test case for the compiler people.
I'll try to take a look.

-Andi
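
For readers following the two disassembly lines quoted above, here is a rough
sketch in C of why the difference matters. It only redoes the address
arithmetic from the listing: a RIP-relative operand is the address of the next
instruction plus a signed 32-bit displacement, so it follows the code wherever
it is loaded, while an absolute immediate stays at its link-time value. The
helper names and the load offset are made up for illustration, and treating
0x97f8f0 as the link-time address of trampoline_32bit_src in the -flto build
is an assumption, since that build's symbol table is not shown in the thread.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend a little-endian disp32 taken from the instruction bytes. */
static int64_t decode_disp32(const uint8_t b[4])
{
    uint32_t v = (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
                 ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
    return (int64_t)v - ((v & 0x80000000u) ? 0x100000000LL : 0);
}

/*
 * Address a RIP-relative operand refers to: the address of the *next*
 * instruction plus the signed displacement.
 */
static uint64_t rip_relative_target(uint64_t insn_addr, unsigned insn_len,
                                    int64_t disp)
{
    return insn_addr + insn_len + (uint64_t)disp;
}

/* An absolute immediate does not depend on where the code runs. */
static uint64_t absolute_target(uint64_t imm, uint64_t load_delta)
{
    (void)load_delta;
    return imm;
}

static void check_listing(void)
{
    /* lea -0x33f7(%rip),%rsi at 0x982d30: bytes 48 8d 35 09 cc ff ff */
    const uint8_t disp_bytes[4] = { 0x09, 0xcc, 0xff, 0xff };
    const int64_t disp = decode_disp32(disp_bytes);
    /* HYPOTHETICAL: code runs this far from its link-time address. */
    const uint64_t delta = 0x1000000;
    /* ASSUMPTION: link-time symbol address in the -flto build. */
    const uint64_t lto_sym = 0x97f8f0;

    assert(disp == -0x33f7);
    /* Matches the "# 97f940 <trampoline_32bit_src>" annotation. */
    assert(rip_relative_target(0x982d30, 7, disp) == 0x97f940);
    /* Moved code: the RIP-relative operand moves with it. */
    assert(rip_relative_target(0x982d30 + delta, 7, disp) == 0x97f940 + delta);
    /* Moved code: the immediate still names the old, link-time address. */
    assert(absolute_target(lto_sym, delta) != lto_sym + delta);
}
```

Calling check_listing() from a small main() exercises all four checks. The
last one is the case that bites here: if the decompressor is not running at
its link address, the memcpy() source would point at the wrong memory.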
