lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAFbqK8kWUdoEStTRe+a4mk9dMf4W4vk7aZhT_uTqS6jgAqYJ0A@mail.gmail.com>
Date:   Wed, 21 Aug 2019 19:32:17 +0100
From:   Neil MacLeod <neil@...cleod.com>
To:     John Hubbard <jhubbard@...dia.com>
Cc:     "H. Peter Anvin" <hpa@...or.com>, Ingo Molnar <mingo@...hat.com>,
        Borislav Petkov <bp@...en8.de>,
        "x86@...nel.org" <x86@...nel.org>,
        LKML <linux-kernel@...r.kernel.org>, gregkh@...uxfoundation.org
Subject: Re: Boot failure due to: x86/boot: Save fields explicitly, zero out
 everything else

Hi John

I can test any patches given a link to the diff, and am happy to do so.

If I've understood your suggestion correctly, I will try commenting
out all of the entries, then add them back one-by-one until I get a
non-booting situation. I'll let you know how I get on.

The OS I'm testing is LibreELEC, which is a custom x86_64 build, and
this hasn't shown any problems on this Skylake i5 NUC in all the years
I've been using it as a test system (since at least 4.15.y). So far
5.3-rc has been trouble-free until 5.3-rc5. As I mentioned in the bug,
I've been able to boot 5.3-rc5 from a USB memory stick, but can't boot
5.3-rc5 from the internal M2 drive unless I revert this commit.

Regards
Neil

On Wed, 21 Aug 2019 at 19:20, John Hubbard <jhubbard@...dia.com> wrote:
>
> On 8/21/19 10:05 AM, Neil MacLeod wrote:
> > Hi John
> >
> > The following bug might be of interest:
> >
> > https://bugzilla.kernel.org/show_bug.cgi?id=204645
> >
> > Let me know if I can be of any help.
> >
>
> Hi Neil,
>
> First of all, I'm pasting in the bug information so that it's directly available
> here:
>
> ===================================================================
> Description: Neil MacLeod 2019-08-21 16:29:19 UTC
> System: Intel i5 Skylake NUC (NUC6i5SYH)
>
> This system boots fine from internal M2 (128GB) drive with 5.3-rc4.
>
> With 5.3-rc5, it does not boot from M2 and is stuck on the Intel splash screen (no other text is displayed, no panic etc.). It will boot 5.3-rc5 from a USB flash memory stick (via the F10 boot menu), but not from the internal M2.
>
> Bisecting between 5.3-rc4 and 5.3-rc5, the bad commit is:
>
> neil@...linux:~/projects/pullrequest_repos/torvalds-linux$ git bisect bad
> a90118c445cc7f07781de26a9684d4ec58bfcfd1 is the first bad commit
> commit a90118c445cc7f07781de26a9684d4ec58bfcfd1
> Author: John Hubbard <jhubbard@...dia.com>
> Date:   Tue Jul 30 22:46:27 2019 -0700
>
>      x86/boot: Save fields explicitly, zero out everything else
>
>      Recent gcc compilers (gcc 9.1) generate warnings about an out of bounds
>      memset, if the memset goes accross several fields of a struct. This
>      generated a couple of warnings on x86_64 builds in sanitize_boot_params().
>
>      Fix this by explicitly saving the fields in struct boot_params
>      that are intended to be preserved, and zeroing all the rest.
>
>      [ tglx: Tagged for stable as it breaks the warning free build there as well ]
>
>      Suggested-by: Thomas Gleixner <tglx@...utronix.de>
>      Suggested-by: H. Peter Anvin <hpa@...or.com>
>      Signed-off-by: John Hubbard <jhubbard@...dia.com>
>      Signed-off-by: Thomas Gleixner <tglx@...utronix.de>
>      Cc: stable@...r.kernel.org
>      Link: https://lkml.kernel.org/r/20190731054627.5627-2-jhubbard@nvidia.com
>
> :040000 040000 e0963edca990540dd759765a3d765af4698df892 d07e645eb3a500c31bd65526205e286ff6941187 M      arch
>
> Comment 1 Neil MacLeod 2019-08-21 16:31:35 UTC
> The kernel is built with gcc-9.2.0.
>
> Comment 2 Neil MacLeod 2019-08-21 16:55:48 UTC
>
> 5.3-rc5 with "x86/boot: Save fields explicitly, zero out everything else" reverted will build with gcc-9.2.0, and boot from M2.
> ===================================================================
>
> I'm also CC'ing the lists, so they know that the patch has caused a regression, and
> also out of hope that they can help me choose the shortest path forward to debugging
> this. My first reaction is that the list of BOOT_PARAM_PRESERVE() fields is
> flawed--by which I include cases of  "the system improperly relied on a field that the spec said
> should be zeroed". (After all, the boot_params->sentinel is set, which already means
> the system is not really doing it right.)
>
> So I'm going to cheat and ask right now if anyone notices either ommissions
> or wrong entries here:
>
> static void sanitize_boot_params(struct boot_params *boot_params)
> {
> ...
>                 const struct boot_params_to_save to_save[] = {
>                         BOOT_PARAM_PRESERVE(screen_info),
>                         BOOT_PARAM_PRESERVE(apm_bios_info),
>                         BOOT_PARAM_PRESERVE(tboot_addr),
>                         BOOT_PARAM_PRESERVE(ist_info),
>                         BOOT_PARAM_PRESERVE(acpi_rsdp_addr),
>                         BOOT_PARAM_PRESERVE(hd0_info),
>                         BOOT_PARAM_PRESERVE(hd1_info),
>                         BOOT_PARAM_PRESERVE(sys_desc_table),
>                         BOOT_PARAM_PRESERVE(olpc_ofw_header),
>                         BOOT_PARAM_PRESERVE(efi_info),
>                         BOOT_PARAM_PRESERVE(alt_mem_k),
>                         BOOT_PARAM_PRESERVE(scratch),
>                         BOOT_PARAM_PRESERVE(e820_entries),
>                         BOOT_PARAM_PRESERVE(eddbuf_entries),
>                         BOOT_PARAM_PRESERVE(edd_mbr_sig_buf_entries),
>                         BOOT_PARAM_PRESERVE(edd_mbr_sig_buffer),
>                         BOOT_PARAM_PRESERVE(e820_table),
>                         BOOT_PARAM_PRESERVE(eddbuf),
>                 };
>
> If not, then I think we need to bisect by...well, let's start with the
> theory that we zeroed out too much, so we could start adding more fields
> to preserve.  If that doesn't find the problem, then it's more complicated,
> and might be better to go the other direction: starting without my commit,
> and zeroing out fields until we see the failure.
>
> Are you able to test patches? It would take some time, since there are quite a
> few fields. Alternately, if you provide some more system information details
> (especially if we have any other notes about other failures and passes), then
> I might be able to borrow a Skylake system and attempt a repro.
>
>
> thanks,
> --
> John Hubbard
> NVIDIA

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ