Message-ID: <87sfvuvqhg.fsf@yujinakao.com>
Date:   Wed, 17 Nov 2021 22:52:43 +0900
From:   Yuji Nakao <contact@...inakao.com>
To:     Marc Zyngier <maz@...nel.org>,
        Krzysztof Wilczyński <kw@...ux.com>
Cc:     Damien Le Moal <damien.lemoal@...nsource.wdc.com>,
        linux-kernel@...r.kernel.org,
        "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
        ". Bjorn Helgaas" <bhelgaas@...gle.com>,
        Arnd Bergmann <arnd@...db.de>, Sasha Levin <sashal@...nel.org>
Subject: Re: Kernel 5.15 doesn't detect SATA drive on boot

Marc Zyngier <maz@...nel.org> writes:

> Hi Krzysztof, Yuji,
>
> On Tue, 16 Nov 2021 23:26:18 +0000,
> Krzysztof Wilczyński <kw@...ux.com> wrote:
>> 
>> [+CC Arnd, Bjorn, Marc and Sasha for visibility]
>> 
>> Hello Damien and Yuji,
>> 
>> [...]
>> > > I'm using Arch Linux on MacBook Air 2010. I updated `linux` package[1]
>> > > from v5.14.16 to v5.15.2 the other day, and the boot process stalled
>> > > with the following message.
>> > > 
>> > > ```shell
>> > > :: running early hook [udev]
>> > > Starting version 249.6-3-arch
>> > > :: running hook [udev]
>> > > :: Triggering uevents...
>> > > Waiting 10 seconds for device /dev/sda3 ...
>> > > ERROR: device '/dev/sda3' not found. Skipping fsck.
>> > > :: mounting '/dev/sda' on real root
>> > > mount: /new_root: no filesystem type specified.
>> > > You are now being dropped into an emergency shell.
>> > > sh: can't access tty; job control turned off
>> > > [rootfs ]#
>> > > ```
>> > > 
>> > > In the emergency shell there are no `sda` devices when I type `$ ls
>> > > /dev/`. After downgrading the kernel, the boot process works properly.
>> > > 
>> > > See also Arch Linux bug tracker[2]. There are similar reports on
>> > > Apple devices.
>> > > 
>> > > `dmesg` output in the emergency shell is attached. I guess this issue is
>> > > related to libata, so CCed to Damien Le Moal.
>> > 
>> > I think that this problem is due to recent PCI subsystem changes which broke Mac
>> > support. The problem shows up as the interrupts not being delivered, which in
>> > turn results in the kernel assuming that the drive is not working (see the
>> > timeout error messages in your dmesg output). Hence your boot drive detection
>> > fails and there is no rootfs to mount.
>> > 
>> > Adding linux-pci list.
>> > 
>> > 
>> > 
>> > > 
>> > > Regards.
>> > > 
>> > > [1] https://archlinux.org/packages/core/x86_64/linux/
>> > > [2] https://bugs.archlinux.org/task/72734
>> 
>> The error in the dmesg output (see [2] where the log file is attached)
>> looks similar to the problem reported a week or so ago, as per:
>> 
>>   https://lore.kernel.org/linux-pci/ee3884db-da17-39e3-4010-bcc8f878e2f6@xenosoft.de/
>> 
>> The problematic commits were reverted by Bjorn and the Pull Request that
>> did it was accepted, as per:
>> 
>>   https://lore.kernel.org/linux-pci/20211111195040.GA1345641@bhelgaas/
>> 
>> Thus, this should have made its way into 5.16-rc1, I suppose.  We might have to
>> back-port this to the stable and long-term kernels.
>> 
>> Yuji, could you, if you have some time to spare, try 5.16-rc1 to see if
>> this has gotten better on your system?
>
> I'm afraid you have the wrong end of the stick on this one.
>
> The issue is reported on 5.15, and the issue you are pointing at was
> introduced during the 5.16 merge window. The problematic commit wasn't
> reverted, but instead fixed in 10a20b34d735 ("of/irq: Don't ignore
> interrupt-controller when interrupt-map failed").
>
> The issue is instead very close to the one reported at [1], for which
> we have a very conservative workaround in 5.16-rc1 (commits
> 2226667a145d and f21082fb20db). Looking at the dmesg log provided by
> Yuji, you find the following nugget:
>
> [    0.378564] pci 0000:00:0a.0: [10de:0d88] type 00 class 0x010601
>
> Oh look, a NVIDIA AHCI controller, probably similar enough to the one
> discussed in the issue reported by Rui.
>
> Yuji, could you please test the patch below on top of 5.16-rc1?
>
> Thanks,
>
> 	M.
>
> [1] https://lore.kernel.org/r/CALjTZvbzYfBuLB+H=fj2J+9=DxjQ2Uqcy0if_PvmJ-nU-qEgkg@mail.gmail.com
>
>
> diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> index 003950c738d2..cd88eddf614d 100644
> --- a/drivers/pci/quirks.c
> +++ b/drivers/pci/quirks.c
> @@ -5857,3 +5857,4 @@ static void nvidia_ion_ahci_fixup(struct pci_dev *pdev)
>  	pdev->dev_flags |= PCI_DEV_FLAGS_HAS_MSI_MASKING;
>  }
>  DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_NVIDIA, 0x0ab8, nvidia_ion_ahci_fixup);
> +DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_NVIDIA, 0x0d88, nvidia_ion_ahci_fixup);
>
> -- 
> Without deviation from the norm, progress is not possible.
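
For anyone reading along, the dmesg line Marc quotes packs three things together: the PCI vendor and device IDs (10de:0d88) and a 24-bit class code (0x010601), which splits into base class 0x01 (mass storage), subclass 0x06 (SATA) and programming interface 0x01 (AHCI). Below is a minimal userspace sketch in C of that breakdown. It is not kernel code; the struct and function names are made up for illustration, and the parser assumes the exact line layout shown in the log above.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the fields of a
 * "pci <addr>: [vvvv:dddd] type tt class 0xcccccc" dmesg line. */
struct pci_id_line {
	unsigned int vendor;
	unsigned int device;
	unsigned int base_class;	/* bits 23..16 of the class code */
	unsigned int sub_class;		/* bits 15..8 */
	unsigned int prog_if;		/* bits 7..0 */
};

/* Returns 1 and fills *out on success, 0 if the line does not match. */
int parse_pci_id_line(const char *line, struct pci_id_line *out)
{
	unsigned int vendor, device, type, class_code;
	const char *p;

	assert(line != NULL && out != NULL);

	/* Skip the bracketed dmesg timestamp by starting at the "pci " prefix. */
	p = strstr(line, "pci ");
	if (p == NULL)
		return 0;

	/* The ID pair is the first bracketed field after the device address. */
	p = strchr(p, '[');
	if (p == NULL)
		return 0;

	if (sscanf(p, "[%4x:%4x] type %2x class 0x%6x",
		   &vendor, &device, &type, &class_code) != 4)
		return 0;

	out->vendor = vendor;
	out->device = device;
	out->base_class = (class_code >> 16) & 0xff;
	out->sub_class = (class_code >> 8) & 0xff;
	out->prog_if = class_code & 0xff;
	return 1;
}
```

Feeding it the quoted line should yield vendor 0x10de, device 0x0d88 and class bytes 01/06/01, which is what identifies the controller as an NVIDIA AHCI device.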

I installed plain 5.16-rc1, using the pre-built image[1] from the
maintainer of the linux-mainline AUR package[2], and also 5.16-rc1 with
the patch provided by Marc. Both kernels booted successfully. Thank you
for the quick investigation. I'll wait for the fix to be backported.

[1] https://wiki.archlinux.org/title/Unofficial_user_repositories#miffe
[2] https://aur.archlinux.org/packages/linux-mainline/
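
As a note for readers of the archive: the patch Marc proposed adds one more device ID (0x0d88) to the devices that receive the existing nvidia_ion_ahci_fixup quirk, which sets PCI_DEV_FLAGS_HAS_MSI_MASKING. The idea of matching a (vendor, device) pair against a table of fixups can be modelled in userspace C roughly as below. This is only a sketch and not how DECLARE_PCI_FIXUP_FINAL is implemented in the kernel; fake_pci_dev, fixup_entry, apply_fixups and the flag bit are invented for illustration. Only the IDs come from the thread (0x10de is the NVIDIA vendor ID seen in the dmesg line).

```c
#include <assert.h>
#include <stddef.h>

#define FAKE_VENDOR_NVIDIA		0x10de
#define FAKE_FLAG_HAS_MSI_MASKING	(1u << 0)	/* placeholder bit, not the kernel's value */

/* Hypothetical stand-in for the few struct pci_dev fields used here. */
struct fake_pci_dev {
	unsigned short vendor;
	unsigned short device;
	unsigned int dev_flags;
};

/* One table row: which device the hook applies to. */
struct fixup_entry {
	unsigned short vendor;
	unsigned short device;
	void (*hook)(struct fake_pci_dev *pdev);
};

/* Mirrors what the quirk in the patch does: set the MSI masking flag. */
static void ion_ahci_fixup(struct fake_pci_dev *pdev)
{
	pdev->dev_flags |= FAKE_FLAG_HAS_MSI_MASKING;
}

/* The second row corresponds to the line the patch adds. */
static const struct fixup_entry fixups[] = {
	{ FAKE_VENDOR_NVIDIA, 0x0ab8, ion_ahci_fixup },
	{ FAKE_VENDOR_NVIDIA, 0x0d88, ion_ahci_fixup },
};

/* Runs every matching hook and returns how many were run. */
int apply_fixups(struct fake_pci_dev *pdev)
{
	size_t i;
	int applied = 0;

	assert(pdev != NULL);

	for (i = 0; i < sizeof(fixups) / sizeof(fixups[0]); i++) {
		if (fixups[i].vendor == pdev->vendor &&
		    fixups[i].device == pdev->device) {
			fixups[i].hook(pdev);
			applied++;
		}
	}
	return applied;
}
```

In this model, a device with IDs 10de:0d88 picks up the flag only because of the second table row; without that row it would be left untouched, just like any other unlisted device.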

For the record, I attach the dmesg log from each boot.

5.16-rc1 dmesg: attachment "dmesg_v5.16-rc1.log" of type "text/plain" (71858 bytes)

5.16-rc1-patched dmesg: attachment "dmesg_v5.16-rc1-patched.log" of type "text/plain" (70401 bytes)


Regards.
Yuji Nakao

