Message-ID: <Z8rCF39n5GjTwfjP@ryzen>
Date: Fri, 7 Mar 2025 10:53:27 +0100
From: Niklas Cassel <cassel@...nel.org>
To: Eric <eric.4.debian@...batoulnz.fr>
Cc: Salvatore Bonaccorso <carnil@...ian.org>,
	Mario Limonciello <mario.limonciello@....com>,
	Christoph Hellwig <hch@...radead.org>,
	Mika Westerberg <mika.westerberg@...ux.intel.com>,
	Damien Le Moal <dlemoal@...nel.org>,
	Jian-Hong Pan <jhp@...lessos.org>, regressions@...ts.linux.dev,
	linux-kernel@...r.kernel.org, stable@...r.kernel.org,
	linux-ide@...r.kernel.org,
	Dieter Mummenschanz <dmummenschanz@....de>
Subject: Re: Regression from 7627a0edef54 ("ata: ahci: Drop low power policy
 board type") on reboot (but not cold boot)

Hello Eric,

On Thu, Mar 06, 2025 at 01:27:17PM +0100, Eric wrote:
> 
> I installed the same system on a USB stick, on which I also installed grub,
> so that the reboot is made independent of whether the UEFI sees the SSD disk
> or not. I'll attach dmesg extracts (grep on ata or ahci) to this mail.

Excellent idea!


> 
> One is the dmesg after coldbooting from the USB stick, the other is
> rebooting on the USB stick. First of all, the visible result : the SSD is
> not detected by linux at reboot (but is when coldbooting).
> 
> Here is what changes :
> 
> eric@...ihir:~$ diff /media/eric/trixieUSB/home/eric/dmesg-ahci-ata-coldboot.untimed.txt /media/eric/trixieUSB/home/eric/dmesg-ahci-ata-reboot.untimed.txt
> 
> 4c4
> <  ahci 0000:00:11.0: 4/4 ports implemented (port mask 0x3c)
> ---
> >  ahci 0000:00:11.0: 3/3 ports implemented (port mask 0x38)
> 14c14
> <  ata3: SATA max UDMA/133 abar m1024@...eb0b000 port 0xfeb0b200 irq 19 lpm-pol 3
> ---
> >  ata3: DUMMY
> 27,28d26
> <  ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> <  ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> 29a28
> >  ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
> 31,34d29
> <  ata3.00: Model 'Samsung SSD 870 QVO 2TB', rev 'SVQ02B6Q', applying quirks: noncqtrim zeroaftertrim noncqonati
> <  ata3.00: supports DRM functions and may not be fully accessible
> <  ata3.00: ATA-11: Samsung SSD 870 QVO 2TB, SVQ02B6Q, max UDMA/133
> <  ata3.00: 3907029168 sectors, multi 1: LBA48 NCQ (not used)
> 37a33
> >  ata5.00: configured for UDMA/100
> 40d35
> <  ata5.00: configured for UDMA/100
> 43,46d37
> <  ata3.00: Features: Trust Dev-Sleep
> <  ata3.00: supports DRM functions and may not be fully accessible
> <  ata3.00: configured for UDMA/133
> <  scsi 2:0:0:0: Direct-Access     ATA      Samsung SSD 870 2B6Q PQ: 0 ANSI: 5
> 50,51d40
> <  ata3.00: Enabling discard_zeroes_data
> <  ata3.00: Enabling discard_zeroes_data
> 
> I hope this is useful for diagnosing the problem.

It is indeed!

Wow.

The problem does not appear to be with the SSD firmware.

The problem appears to be that your AHCI controller reports different
values in the PI (Ports Implemented) register.

This is supposed to be a read-only register :)

At cold boot the print is:
4/4 ports implemented (port mask 0x3c)
meaning bits 0 and 1 are clear, so ports 1,2 are not implemented (DUMMY ports).

At reboot the print is:
3/3 ports implemented (port mask 0x38)
meaning bit 2 is clear as well, so ports 1,2,3 are not implemented (DUMMY ports).
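
To make the mask arithmetic above easy to check, here is a minimal userspace
sketch (not kernel code) that decodes a PI-style bitmask. All function names
are made up for illustration. It assumes that port index i shows up as
ata(i+1), which matches your log but is not true in general.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define PI_MAX_PORTS 32	/* the PI register is 32 bits wide */

/* Number of ports the mask reports as implemented (count of set bits). */
unsigned int pi_count_ports(uint32_t port_map)
{
	unsigned int n = 0;

	for (; port_map; port_map &= port_map - 1)	/* clear lowest set bit */
		n++;
	return n;
}

/* Nonzero if zero-based port index idx is NOT implemented (a DUMMY port). */
int pi_port_is_dummy(uint32_t port_map, unsigned int idx)
{
	assert(idx < PI_MAX_PORTS);
	return !(port_map & (UINT32_C(1) << idx));
}

/* Bits set in the cold-boot mask that are missing from the reboot mask. */
uint32_t pi_lost_ports(uint32_t cold, uint32_t reboot)
{
	return cold & ~reboot;
}

/*
 * Print one line per port slot. Assumes port index i is printed as ata(i+1),
 * which matches the quoted log but is an assumption, not a general rule.
 */
void pi_print(const char *label, uint32_t port_map, unsigned int nslots)
{
	unsigned int i;

	printf("%s: mask %#x, %u ports implemented\n",
	       label, (unsigned int)port_map, pi_count_ports(port_map));
	for (i = 0; i < nslots && i < PI_MAX_PORTS; i++)
		printf("  ata%u: %s\n", i + 1,
		       pi_port_is_dummy(port_map, i) ? "DUMMY" : "implemented");
}
```

With the two values from your log, pi_lost_ports(0x3c, 0x38) gives 0x4, i.e.
only bit 2 differs, and that is the bit for ata3.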

So on reboot the controller no longer reports port 3, the port your SSD is
attached to, as implemented.

Most likely, if the AHCI controller reported the same register values on the
second boot, libata would be able to scan and detect the drive correctly.

What AHCI controller is this?

$ sudo lspci -nns 0000:00:11.0


Which kernel version are you using?

Please test with v6.14-rc5, as there was a bug in v6.14-rc4 where
mask_port_map would get set incorrectly. (That bug should only affect
device tree based platforms, though, and systems booting with UEFI usually
do not use device tree.)


I do see that your AHCI controller is < AHCI 1.3, so we do take this path:
https://github.com/torvalds/linux/blob/v6.14-rc5/drivers/ata/libahci.c#L571-L578

Could you please provide a full dmesg?


Also, it would be helpful if you could print every time we read/write the
PI register. (Don't ask me why libata writes a read-only register...
we were not always the maintainers for this driver...)


diff --git a/drivers/ata/libahci.c b/drivers/ata/libahci.c
index e7ace4b10f15..dd837834245b 100644
--- a/drivers/ata/libahci.c
+++ b/drivers/ata/libahci.c
@@ -533,6 +533,7 @@ void ahci_save_initial_config(struct device *dev, struct ahci_host_priv *hpriv)
 
 	/* Override the HBA ports mapping if the platform needs it */
 	port_map = readl(mmio + HOST_PORTS_IMPL);
+	dev_err(dev, "%s:%d PI: read: %#lx\n", __func__, __LINE__, port_map);
 	if (hpriv->saved_port_map && port_map != hpriv->saved_port_map) {
 		dev_info(dev, "forcing port_map 0x%lx -> 0x%x\n",
 			port_map, hpriv->saved_port_map);
@@ -629,6 +630,7 @@ static void ahci_restore_initial_config(struct ata_host *host)
 	if (hpriv->saved_cap2)
 		writel(hpriv->saved_cap2, mmio + HOST_CAP2);
 	writel(hpriv->saved_port_map, mmio + HOST_PORTS_IMPL);
+	dev_err(host->dev, "%s:%d PI: wrote: %#x\n", __func__, __LINE__, hpriv->saved_port_map);
 	(void) readl(mmio + HOST_PORTS_IMPL);	/* flush */
 
 	for_each_set_bit(i, &port_map, AHCI_MAX_PORTS) {




Kind regards,
Niklas
