Message-ID: <ZahaKaV1jlHQ0sUx@x1-carbon>
Date: Wed, 17 Jan 2024 23:52:25 +0100
From: Niklas Cassel <cassel@...nel.org>
To: Lennert Buytenhek <kernel@...tstofly.org>
Cc: Damien Le Moal <dlemoal@...nel.org>, linux-ide@...r.kernel.org,
	linux-kernel@...r.kernel.org
Subject: Re: ASMedia ASM1062 (AHCI) hang after "ahci 0000:28:00.0: Using
 64-bit DMA addresses"

Hello Lennert,

On Wed, Jan 17, 2024 at 11:14:30PM +0200, Lennert Buytenhek wrote:
> On Tue, Jan 16, 2024 at 03:20:23PM +0100, Niklas Cassel wrote:
> 
> > > On kernel 6.6.x, with an ASMedia ASM1062 (AHCI) controller, on an
> 
> Minor correction to this: lspci says that this is an ASM1062, but it's
> actually an ASM1061.  I think that the two parts share a PCI device ID,
> and I've submitted a PCI ID DB change here:
> 
> https://admin.pci-ids.ucw.cz/read/PC/1b21/0612

FWIW, the kernel states that 0x0612 is ASM1062, and 0x0611 is ASM1061:
https://github.com/torvalds/linux/blob/v6.7/drivers/ata/ahci.c#L603-L604

But that could of course be incorrect.


When you are dumping the LnkCap in the PCI ID DB change request,
are you dumping the LnkCap for the AHCI controller or the PCI bridge?

(Because you use # lspci -s 27:00.0 in the PCI ID DB change request,
but # lspci -s 28:00.0 further down in this email.)

(Perhaps the PCI bridge only has one PCI lane, but the AHCI controller has two?)


> > The DMA mask is set here:
> > https://github.com/torvalds/linux/blob/v6.7/drivers/ata/ahci.c#L967
> > 
> > And should be called using:
> > hpriv->cap & HOST_CAP_64
> > https://github.com/torvalds/linux/blob/v6.7/drivers/ata/ahci.c#L1929
> > 
> > Where hpriv->cap is capabilities reported by the AHCI controller itself.
> > So it definitely seems like your controller supports 64-bit addressing.
> 
> Perhaps, or maybe it's misreporting its capabilities, as it is an old
> part (from 2011 or before), and given that it doesn't seem to support
> 64-bit MSI addressing, either, which for a part with a 64-bit DMA engine
> would be an odd restriction:
> 
> # lspci -s 28:00.0 -vv | grep -A1 MSI:
>         Capabilities: [50] MSI: Enable+ Count=1/1 Maskable- 64bit-
>                 Address: fee00000  Data: 0000
> #

That just claims that MSIs have to use a 32-bit PCI address.

See e.g.:
00:02.0 VGA compatible controller: Intel Corporation Haswell-ULT Integrated Graphics Controller (rev 0b) (prog-if 00 [VGA controller])
  Subsystem: Lenovo Device 3978
  Flags: bus master, fast devsel, latency 0, IRQ 58
  Memory at b0000000 (64-bit, non-prefetchable) [size=4M]
  Memory at a0000000 (64-bit, prefetchable) [size=256M]
  I/O ports at 4000 [size=64]
  Expansion ROM at <unassigned> [disabled]
  Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit-
  Capabilities: [d0] Power Management version 2
  Capabilities: [a4] PCI Advanced Features
  Kernel driver in use: i915

It has 64-bit BARs, but does not support 64-bit MSIs.
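
To make the distinction concrete, here is a minimal sketch of how the
"64bit-" flag that lspci prints comes out of the MSI Message Control
register. The bit positions follow the PCI spec and the PCI_MSI_FLAGS_*
definitions in the kernel's pci_regs.h; the struct and function names are
made up for this illustration. The point is that bit 7 only describes the
message address used for interrupts, and says nothing about the addresses
the device can use for DMA.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit layout of the 16-bit MSI Message Control register. */
#define MSI_FLAGS_ENABLE  0x0001u /* bit 0: MSI enabled */
#define MSI_FLAGS_QMASK   0x000eu /* bits 3:1: log2(vectors supported) */
#define MSI_FLAGS_QSIZE   0x0070u /* bits 6:4: log2(vectors enabled) */
#define MSI_FLAGS_64BIT   0x0080u /* bit 7: 64-bit message address capable */
#define MSI_FLAGS_MASKBIT 0x0100u /* bit 8: per-vector masking capable */

/* Hypothetical container for the decoded fields. */
struct msi_info {
	bool enabled;
	unsigned vectors_enabled;
	unsigned vectors_supported;
	bool maskable;
	bool addr64; /* applies to the MSI address only, not to DMA */
};

/* Decode a raw Message Control value into its individual fields. */
struct msi_info msi_decode(uint16_t flags)
{
	struct msi_info info;

	info.enabled = (flags & MSI_FLAGS_ENABLE) != 0;
	info.vectors_supported = 1u << ((flags & MSI_FLAGS_QMASK) >> 1);
	info.vectors_enabled = 1u << ((flags & MSI_FLAGS_QSIZE) >> 4);
	info.maskable = (flags & MSI_FLAGS_MASKBIT) != 0;
	info.addr64 = (flags & MSI_FLAGS_64BIT) != 0;
	return info;
}
```

A Message Control value of 0x0001 decodes to enabled, one vector, not
maskable, and no 64-bit message address, which matches the
"Enable+ Count=1/1 Maskable- 64bit-" lines above.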


> 
> (I checked the available datasheets, but there is no mention of whether
> or not the part supports 64-bit DMA.)

If you are curious, hpriv->cap is the HBA capabilities reported by the
device, see:
https://www.intel.com/content/dam/www/public/us/en/documents/technical-specifications/serial-ata-ahci-spec-rev1-3-1.pdf

3.1.1 Offset 00h: CAP – HBA Capabilities

Bit 31 - Supports 64-bit Addressing (S64A).

It would seem a bit silly if the AHCI controller vendor had accidentally
set this bit to 1.
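
As a rough userspace sketch (not the driver code itself), this is how the
S64A bit translates into the width of the DMA mask. HOST_CAP_64 and
DMA_BIT_MASK() are redefined here to mirror the kernel macros of the same
name; the two helper functions are hypothetical names used only for this
illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* CAP bit 31: Supports 64-bit Addressing (S64A). */
#define HOST_CAP_64 (1u << 31)

/* Mask with the low n bits set, mirroring the kernel's DMA_BIT_MASK(). */
#define DMA_BIT_MASK(n) (((n) == 64) ? ~0ULL : ((1ULL << (n)) - 1))

/* Hypothetical helper: does the controller claim 64-bit addressing? */
bool ahci_cap_has_s64a(uint32_t cap)
{
	return (cap & HOST_CAP_64) != 0;
}

/* Hypothetical helper: DMA mask implied by the reported CAP value. */
uint64_t ahci_dma_mask_for_cap(uint32_t cap)
{
	int dma_bits = ahci_cap_has_s64a(cap) ? 64 : 32;

	return DMA_BIT_MASK(dma_bits);
}
```

With S64A set, the mask covers the full 64-bit address space, so the DMA
layer is free to hand the controller addresses above 4 GiB.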


> Per:
> 
> 	https://www.asus.com/motherboards-components/motherboards/workstation/pro-ws-wrx80e-sage-se-wifi/helpdesk_bios?model2Name=Pro-WS-WRX80E-SAGE-SE-WIFI
> 
> However, some Googling suggests that the ASM106x loads its own firmware
> from a directly attached SPI flash chip, and there are several versions
> of this firmware available in the wild, with different versions of the
> firmware apparently available for legacy IDE mode and for AHCI mode.  If
> (some of) the AHCI logic is indeed contained inside the firmware, I
> could see a firmware bug leading to the controller incorrectly presenting
> itself as being 64-bit DMA capable.
> 
> Some poking around in the BIOS image suggests that there is no copy of
> the ASM106x firmware inside the BIOS image.  In other words, it could be
> that, even though the machine is running the latest available BIOS, the
> ASM1061 might be running an older firmware version.
> 
> The ASM1061 firmware does not seem to be readable from software via a
> ROM BAR, and it doesn't seem to be readable from software in general (the
> vendor-supplied DOS .exe updater tool only allows you to erase or
> update the SPI flash), so I can't check which firmware version it is
> currently using.
> 
> 
> > If that does not work, perhaps you could try this (completely untested) patch:
> > (You might need to modify the strings to match the exact strings reported by
> > your BIOS.)
> 
> Thanks for the patch!

Assuming that you just have ASM106x controllers in your system,
you could also replace it with a simple:

@@ -1799,6 +1823,10 @@ static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id *ent)
        if (ahci_broken_devslp(pdev))
                hpriv->flags |= AHCI_HFLAG_NO_DEVSLP;
 
+       /* must set flag prior to save config in order to take effect */
+       hpriv->flags |= AHCI_HFLAG_32BIT_ONLY;
+
 #ifdef CONFIG_ARM64
        if (pdev->vendor == PCI_VENDOR_ID_HUAWEI &&
            pdev->device == 0xa235 &&


Just for testing.
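
For reference, a simplified model of why the flag has to be set before the
config is saved: as far as I understand the driver, the 32-bit-only flag is
applied by clearing S64A from the saved copy of CAP, and everything later
(including the DMA mask setup) only looks at that saved copy. The sketch
below is a userspace illustration under that assumption. The function name
and the flag's numeric value are placeholders, not the kernel's.

```c
#include <stdint.h>

/* CAP bit 31: Supports 64-bit Addressing (S64A). */
#define HOST_CAP_64 (1u << 31)

/* Placeholder value for this sketch; not the kernel's actual flag value. */
#define HFLAG_32BIT_ONLY (1u << 0)

/*
 * Hypothetical model of the capability fix-up: if the quirk flag is set,
 * hide the 64-bit addressing capability from the rest of the driver.
 */
uint32_t ahci_effective_cap(uint32_t raw_cap, uint32_t hflags)
{
	if ((raw_cap & HOST_CAP_64) && (hflags & HFLAG_32BIT_ONLY))
		raw_cap &= ~HOST_CAP_64;

	return raw_cap;
}
```

If the flag were set only after the saved copy had been made, the S64A bit
would already be recorded and the flag would have no effect.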


> 
> I will do some tests with PCI passthrough to a VM, to see whether, and if
> it does, exactly how the controller mangles DMA addresses.

Were you running in a VM when testing this?
(Usually you need to pass through all PCI devices in the same iommu group.)

The errors from your previous email:
[IO_PAGE_FAULT domain=0x0035 address=0x7fffff00000 flags=0x0000]
[Thu Jan  4 23:12:54 2024] ahci 0000:28:00.0: AMD-Vi: Event logged

could also suggest an iommu issue. Have you tried booting with iommu=off
and/or amd_iommu=off on the kernel command line?
(Or temporarily disable the iommu in BIOS.)
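
Either way, the faulting address in that log is well above 4 GiB. A small
sketch to check how many address bits a given DMA address needs (both
function names are made up for this example):

```c
#include <stdbool.h>
#include <stdint.h>

/* Number of bits needed to represent addr (0 for addr == 0). */
unsigned addr_bits_needed(uint64_t addr)
{
	unsigned bits = 0;

	while (addr) {
		bits++;
		addr >>= 1;
	}
	return bits;
}

/* True if addr can be expressed within a DMA mask of mask_bits bits. */
bool addr_fits_dma_mask(uint64_t addr, unsigned mask_bits)
{
	return addr_bits_needed(addr) <= mask_bits;
}
```

For 0x7fffff00000 this gives 43 bits, so the address would not be handed
out under a 32-bit DMA mask. That by itself does not tell us whether the
controller or the iommu setup is at fault.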


> 
> I've also ordered a discrete PCIe card with an ASM1061 chip on it, and I
> will perform similar tests with that card, to see exactly where the issue
> is, i.e. whether it is specific to this mainboard or not.
> 
> I will follow up once I have more information.

Looking forward to hearing your findings :)


Kind regards,
Niklas
