Date:   Wed, 17 Nov 2021 09:07:53 +0000
From:   Marc Zyngier <maz@...nel.org>
To:     Krzysztof Wilczyński <kw@...ux.com>,
        Yuji Nakao <contact@...inakao.com>
Cc:     Damien Le Moal <damien.lemoal@...nsource.wdc.com>,
        linux-kernel@...r.kernel.org,
        "linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
        ". Bjorn Helgaas" <bhelgaas@...gle.com>,
        Arnd Bergmann <arnd@...db.de>, Sasha Levin <sashal@...nel.org>
Subject: Re: Kernel 5.15 doesn't detect SATA drive on boot

Hi Krzysztof, Yuji,

On Tue, 16 Nov 2021 23:26:18 +0000,
Krzysztof Wilczyński <kw@...ux.com> wrote:
> 
> [+CC Arnd, Bjorn, Marc and Sasha for visibility]
> 
> Hello Damien and Yuji,
> 
> [...]
> > > I'm using Arch Linux on MacBook Air 2010. I updated `linux` package[1]
> > > from v5.14.16 to v5.15.2 the other day, and the boot process stalled
> > > with the following message.
> > > 
> > > ```shell
> > > :: running early hook [udev]
> > > Starting version 249.6-3-arch
> > > :: running hook [udev]
> > > :: Triggering uevents...
> > > Waiting 10 seconds for device /dev/sda3 ...
> > > ERROR: device '/dev/sda3' not found. Skipping fsck.
> > > :: mounting '/dev/sda' on real root
> > > mount: /new_root: no filesystem type specified.
> > > You are now being dropped into an emergency shell.
> > > sh: can't access tty; job control turned off
> > > [rootfs ]#
> > > ```
> > > 
> > > In the emergency shell there are no `sda` devices when I type `$ ls
> > > /dev/`. After downgrading the kernel, the boot process works properly.
> > > 
> > > See also Arch Linux bug tracker[2]. There are similar reports on
> > > Apple devices.
> > > 
> > > `dmesg` output in the emergency shell is attached. I guess this issue is
> > > related to libata, so CCed to Damien Le Moal.
> > 
> > I think that this problem is due to recent PCI subsystem changes which broke Mac
> > support. The problem shows up as the interrupts not being delivered, which in
> > turn results in the kernel assuming that the drive is not working (see the
> > timeout error messages in your dmesg output). Hence your boot drive detection
> > fails and there is no rootfs to mount.
> > 
> > Adding linux-pci list.
> > 
> > 
> > 
> > > 
> > > Regards.
> > > 
> > > [1] https://archlinux.org/packages/core/x86_64/linux/
> > > [2] https://bugs.archlinux.org/task/72734
> 
> The error in the dmesg output (see [2] where the log file is attached)
> looks similar to the problem reported a week or so ago, as per:
> 
>   https://lore.kernel.org/linux-pci/ee3884db-da17-39e3-4010-bcc8f878e2f6@xenosoft.de/
> 
> The problematic commits were reverted by Bjorn and the Pull Request that
> did it was accepted, as per:
> 
>   https://lore.kernel.org/linux-pci/20211111195040.GA1345641@bhelgaas/
> 
> Thus, this would have made its way into 5.16-rc1, I suppose.  We might have to
> back-port this to the stable and long-term kernels.
> 
> Yuji, could you, if you have some time to spare, try the 5.16-rc1 to see if
> this has gotten better on your system?

I'm afraid you have the wrong end of the stick on this one.

The issue is reported on 5.15, and the issue you are pointing at was
introduced during the 5.16 merge window. The problematic commit wasn't
reverted, but instead fixed in 10a20b34d735 ("of/irq: Don't ignore
interrupt-controller when interrupt-map failed").

The issue is instead very close to the one reported at [1], for which
we have a very conservative workaround in 5.16-rc1 (commits
2226667a145d and f21082fb20db). Looking at the dmesg log provided by
Yuji, you find the following nugget:

[    0.378564] pci 0000:00:0a.0: [10de:0d88] type 00 class 0x010601

Oh look, an NVIDIA AHCI controller, probably similar enough to the one
discussed in the issue reported by Rui.
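
For anyone who wants to check their own log for the same pattern, here
is a minimal user-space sketch of how that line decodes. It assumes the
"[vendor:device] type TT class 0xCCCCCC" layout shown above; the names
struct pci_id, parse_pci_line() and is_ahci_controller() are made up for
illustration and are not kernel APIs. The 24-bit class code 0x010601
splits into base class 0x01 (mass storage), subclass 0x06 (SATA) and
programming interface 0x01 (AHCI).

```c
#include <stdio.h>
#include <string.h>

/* Fields of a "pci DDDD:BB:DD.F: [vvvv:dddd] type TT class 0xCCCCCC" line. */
struct pci_id {
	unsigned int vendor;     /* e.g. 0x10de (NVIDIA) */
	unsigned int device;     /* e.g. 0x0d88 */
	unsigned int class_code; /* 24 bits: base class, subclass, prog-if */
};

/* Hypothetical helper: returns 0 on success, -1 if the line does not match. */
int parse_pci_line(const char *line, struct pci_id *id)
{
	/* Skip the "[ timestamp]" prefix by locating the "pci " tag first. */
	const char *p = strstr(line, "pci ");

	if (!p)
		return -1;

	/* The next '[' opens the vendor:device pair. */
	p = strchr(p, '[');
	if (!p)
		return -1;

	/* %*x discards the header type; %x accepts the "0x" prefix. */
	if (sscanf(p, "[%x:%x] type %*x class %x",
		   &id->vendor, &id->device, &id->class_code) != 3)
		return -1;

	return 0;
}

/* Mass storage (0x01), SATA (0x06), AHCI programming interface (0x01). */
int is_ahci_controller(const struct pci_id *id)
{
	return id->class_code == 0x010601;
}
```

Feeding it the line quoted above should yield vendor 0x10de, device
0x0d88, and a positive AHCI match.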

Yuji, could you please test the patch below on top of 5.16-rc1?

Thanks,

	M.

[1] https://lore.kernel.org/r/CALjTZvbzYfBuLB+H=fj2J+9=DxjQ2Uqcy0if_PvmJ-nU-qEgkg@mail.gmail.com


diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 003950c738d2..cd88eddf614d 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -5857,3 +5857,4 @@ static void nvidia_ion_ahci_fixup(struct pci_dev *pdev)
 	pdev->dev_flags |= PCI_DEV_FLAGS_HAS_MSI_MASKING;
 }
 DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_NVIDIA, 0x0ab8, nvidia_ion_ahci_fixup);
+DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_NVIDIA, 0x0d88, nvidia_ion_ahci_fixup);
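
To illustrate what the extra line is meant to achieve, here is a rough
user-space model of ID-based fixup matching. This is a sketch only, not
how the kernel implements DECLARE_PCI_FIXUP_FINAL: struct fake_pci_dev,
struct fixup, fixups[], apply_fixups() and the flag's bit position are
all invented for the example. Only the vendor ID (0x10de) and the two
device IDs are taken from the patch.

```c
#include <stddef.h>

#define VENDOR_NVIDIA        0x10de     /* value of PCI_VENDOR_ID_NVIDIA */
#define FLAG_HAS_MSI_MASKING (1u << 0)  /* hypothetical bit, not the kernel's */

/* Stand-in for the few struct pci_dev fields the quirk touches. */
struct fake_pci_dev {
	unsigned short vendor;
	unsigned short device;
	unsigned int dev_flags;
};

/* One table entry: which device to match and which hook to run. */
struct fixup {
	unsigned short vendor;
	unsigned short device;
	void (*hook)(struct fake_pci_dev *pdev);
};

/* Mirrors the body of nvidia_ion_ahci_fixup(): set the masking flag. */
void ion_ahci_fixup(struct fake_pci_dev *pdev)
{
	pdev->dev_flags |= FLAG_HAS_MSI_MASKING;
}

const struct fixup fixups[] = {
	{ VENDOR_NVIDIA, 0x0ab8, ion_ahci_fixup }, /* existing entry */
	{ VENDOR_NVIDIA, 0x0d88, ion_ahci_fixup }, /* added by the patch */
};

/* Run every hook whose vendor/device pair matches the device. */
void apply_fixups(struct fake_pci_dev *pdev)
{
	size_t i;

	for (i = 0; i < sizeof(fixups) / sizeof(fixups[0]); i++) {
		if (fixups[i].vendor == pdev->vendor &&
		    fixups[i].device == pdev->device)
			fixups[i].hook(pdev);
	}
}
```

In this model a device reporting [10de:0d88] ends up with the flag set,
while any other NVIDIA device ID is left untouched.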

-- 
Without deviation from the norm, progress is not possible.
