Message-ID: <CAMj1kXGjiA1HydMaY82MQsYvkchpN7v7CMOB5i3NEdqcYGn19Q@mail.gmail.com>
Date: Thu, 28 Nov 2024 09:52:33 +0100
From: Ard Biesheuvel <ardb@...nel.org>
To: Johan Hovold <johan@...nel.org>, Leif Lindholm <leif.lindholm@....qualcomm.com>
Cc: Bjorn Andersson <andersson@...nel.org>, Ricardo Salveti <ricardo@...ndries.io>, 
	Marc Zyngier <maz@...nel.org>, linux-efi@...r.kernel.org, linux-arm-msm@...r.kernel.org, 
	linux-kernel@...r.kernel.org
Subject: Re: UEFI EBS() failures on Lenovo T14s

Hi Johan,

Putting Leif on cc, although he is OoO and so it may take him a while
to respond.

On Thu, 28 Nov 2024 at 09:20, Johan Hovold <johan@...nel.org> wrote:
>
> Hi Ard,
>
> We've run into a buggy UEFI implementation on the Qualcomm Snapdragon
> based Lenovo ThinkPad T14s where ExitBootServices() often fails.
>
> One bootloader entry may fail to start almost consistently (once in a
> while it may start), while a second entry may always work even when the
> kernel, dtb and initramfs images are copies of the failing entry on the
> same ESP.
>
> This can be worked around by starting and exiting a UEFI shell from the
> bootloader or by starting the bootloader manually via the Boot Menu
> (F12) before starting the kernel.
>
> Notably starting the kernel automatically from the shell startup.nsh
> does not work, while calling the same script manually works.
>
> Based on your comments to a similar report for an older Snapdragon based
> Lenovo UEFI implementation [1], I discovered that allocating an event
> before calling ExitBootServices() can make the call succeed. There is
> often no need to actually signal the event group, but the event must
> remain allocated (i.e. CloseEvent() must not be called).
>
> (Raising TPL or disabling interrupts does not seem to help.)
>
> Also with the event signalling, ExitBootServices() sometimes fails when
> starting the kernel automatically from a shell startup.nsh, while
> systemd-boot seems to always work. This was only observed after removing
> some efi_printk() used during the experiments from the stub...
>
> Something is obviously really broken here, but do you have any
> suggestions about what could be the cause of this as further input to
> Qualcomm (and Lenovo) as they try to fix this?
>
> For completeness, the first call to ExitBootServices() often fails also
> on the x1e80100 reference design (CRD), and Qualcomm appears to have
> been the ones providing the current retry implementation:
>
>         fc07716ba803 ("efi/libstub: Introduce ExitBootServices helper")
>
> as this was needed to prevent similar boot failures with older Qualcomm
> UEFI fw.
>
> Marc is also hitting something like this on the Qualcomm X1E devkit
> (i.e. with firmware that should not have any modifications from Lenovo).
>

So the error code is EFI_INVALID_PARAMETER in all cases? In the
upstream implementation, the only thing that can make
ExitBootServices() return an error is a mismatch of the map key, and
so there is something changing the memory map.

This might be due to a handler of the
gEfiEventBeforeExitBootServicesGuid event group that fails to close
the event, and so it gets signaled every time. This is a fairly recent
addition, though, so I'm not sure it even exists in QCOM's tree.

In upstream EDK2, the map key is just a monotonic counter that gets
incremented on every memory map update, so one experiment worth
conducting is to repeat the second call to ExitBootServices() a couple
of times, increasing the map key each time. Or use GetMemoryMap() to
just grab the map key without the actual memory map, and print it
to the console (although the timer is disabled on the first call, so
anything that relies on it will be shut down at this point).
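
To make the map key experiment concrete, here is a rough sketch of the
probing logic as a plain C simulation rather than stub code. Everything
in it is hypothetical: mock_fw, mock_get_map_key(),
mock_exit_boot_services() and probe_key_offset() are stand-ins and not
UEFI or libstub interfaces. It assumes the key is a monotonic counter
(as in upstream EDK2) and that each ExitBootServices() call causes a
fixed number of memory map updates before the key is checked, which is
only a guess at what the firmware might be doing.

```c
#include <stdint.h>

/* Hypothetical stand-ins; these are NOT the real UEFI types or services. */
enum mock_status { MOCK_SUCCESS = 0, MOCK_INVALID_PARAMETER = 1 };

struct mock_fw {
    uint64_t map_key;        /* monotonic counter, bumped on every map update */
    unsigned bumps_per_call; /* assumed map updates caused by each EBS call */
};

/* Stand-in for a GetMemoryMap() call that only reports the current key. */
uint64_t mock_get_map_key(const struct mock_fw *fw)
{
    return fw->map_key;
}

/*
 * Stand-in for ExitBootServices(): the (assumed) misbehaving handlers run
 * first and change the memory map, then the caller's key is checked.
 */
enum mock_status mock_exit_boot_services(struct mock_fw *fw, uint64_t key)
{
    fw->map_key += fw->bumps_per_call;
    return key == fw->map_key ? MOCK_SUCCESS : MOCK_INVALID_PARAMETER;
}

/*
 * Re-read the key before each attempt and add a growing offset to it.
 * Returns the offset that was accepted, or -1 if none up to max_offset was.
 */
int probe_key_offset(struct mock_fw *fw, unsigned max_offset)
{
    for (unsigned off = 0; off <= max_offset; off++) {
        uint64_t key = mock_get_map_key(fw) + off;

        if (mock_exit_boot_services(fw, key) == MOCK_SUCCESS)
            return (int)off;
    }
    return -1;
}
```

In this toy model, the offset that gets accepted equals the number of
map updates each call triggers, so a non-zero result would point at
something modifying the memory map during the call itself.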
