Message-ID: <9c4a635a-ce9f-4ed9-9605-002947490c61@redhat.com>
Date: Mon, 10 Mar 2025 10:34:13 +0100
From: Hans de Goede <hdegoede@...hat.com>
To: Niklas Cassel <cassel@...nel.org>, Eric <eric.4.debian@...batoulnz.fr>
Cc: Salvatore Bonaccorso <carnil@...ian.org>,
 Mario Limonciello <mario.limonciello@....com>,
 Christoph Hellwig <hch@...radead.org>,
 Mika Westerberg <mika.westerberg@...ux.intel.com>,
 Damien Le Moal <dlemoal@...nel.org>, Jian-Hong Pan <jhp@...lessos.org>,
 regressions@...ts.linux.dev, linux-kernel@...r.kernel.org,
 stable@...r.kernel.org, linux-ide@...r.kernel.org,
 Dieter Mummenschanz <dmummenschanz@....de>
Subject: Re: Regression from 7627a0edef54 ("ata: ahci: Drop low power policy
 board type") on reboot (but not cold boot)

Hi,

On 7-Mar-25 10:53, Niklas Cassel wrote:
> Hello Eric,
> 
> On Thu, Mar 06, 2025 at 01:27:17PM +0100, Eric wrote:
>>
>> I installed the same system on a USB stick, on which I also installed grub,
>> so that the reboot is made independent of whether the UEFI sees the SSD disk
>> or not. I'll attach dmesg extracts (grep on ata or ahci) to this mail.
> 
> Excellent idea!
> 
> 
>>
>> One is the dmesg after coldbooting from the USB stick, the other is
>> rebooting on the USB stick. First of all, the visible result: the SSD is
>> not detected by linux at reboot (but is when coldbooting).
>>
>> Here is what changes :
>>
>> eric@...ihir:~$ diff /media/eric/trixieUSB/home/eric/dmesg-ahci-ata-coldboot.untimed.txt /media/eric/trixieUSB/home/eric/dmesg-ahci-ata-reboot.untimed.txt
>>
>> 4c4
>> <  ahci 0000:00:11.0: 4/4 ports implemented (port mask 0x3c)
>> ---
>>>   ahci 0000:00:11.0: 3/3 ports implemented (port mask 0x38)
>> 14c14
>> <  ata3: SATA max UDMA/133 abar m1024@...eb0b000 port 0xfeb0b200 irq 19 lpm-pol 3
>> ---
>>>   ata3: DUMMY
>> 27,28d26
>> <  ata3: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
>> <  ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> 29a28
>>>   ata6: SATA link up 3.0 Gbps (SStatus 123 SControl 300)
>> 31,34d29
>> <  ata3.00: Model 'Samsung SSD 870 QVO 2TB', rev 'SVQ02B6Q', applying quirks: noncqtrim zeroaftertrim noncqonati
>> <  ata3.00: supports DRM functions and may not be fully accessible
>> <  ata3.00: ATA-11: Samsung SSD 870 QVO 2TB, SVQ02B6Q, max UDMA/133
>> <  ata3.00: 3907029168 sectors, multi 1: LBA48 NCQ (not used)
>> 37a33
>>>   ata5.00: configured for UDMA/100
>> 40d35
>> <  ata5.00: configured for UDMA/100
>> 43,46d37
>> <  ata3.00: Features: Trust Dev-Sleep
>> <  ata3.00: supports DRM functions and may not be fully accessible
>> <  ata3.00: configured for UDMA/133
>> <  scsi 2:0:0:0: Direct-Access     ATA      Samsung SSD 870 2B6Q PQ: 0 ANSI: 5
>> 50,51d40
>> <  ata3.00: Enabling discard_zeroes_data
>> <  ata3.00: Enabling discard_zeroes_data
>>
>> I hope this is useful for diagnosing the problem.
> 
> It is indeed!
> 
> Wow.
> 
> The problem does not appear to be with the SSD firmware.
> 
> The problem appears to be that your AHCI controller reports different
> values in the PI (Ports Implemented) register.
> 
> This is supposed to be a read-only register :)
> 
> At cold boot the print is:
> 4/4 ports implemented (port mask 0x3c)
> meaning ports 1,2 are not implemented (DUMMY ports).
> 
> At reboot the print is:
> 3/3 ports implemented (port mask 0x38)
> meaning ports 1,2,3 are not implemented (DUMMY ports).
> 
> So, the problem is that your AHCI controller appears to report different
> values in the PI register.
> 
> Most likely, if the AHCI controller reported the same register values the
> second boot, libata would be able to scan and detect the drive correctly.

I think that the port-mask register is only read-only from an OS point of
view; the BIOS/UEFI/firmware can likely set it, e.g. to exclude ports which
are not enabled on the motherboard (e.g. an M.2 slot which can do both PCIe
and SATA and is used in PCIe mode, so the SATA port on that slot should be
ignored).
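
To make the two port masks quoted above easier to compare, here is a small
sketch that decodes a PI value into port numbers. The function name
implemented_ports is made up for illustration, and the mapping "bit N is
ataN+1" is an assumption that only holds when this controller's ports are the
first ones libata numbers, which appears to be the case in Eric's log.

```python
def implemented_ports(port_mask: int, max_ports: int = 32) -> list[int]:
    """Return 1-based port numbers whose bit is set in the PI register value.

    Assumption: bit 0 corresponds to ata1, bit 1 to ata2, and so on, which
    matches the log in this thread but not necessarily other systems.
    """
    return [bit + 1 for bit in range(max_ports) if port_mask & (1 << bit)]


cold = implemented_ports(0x3C)  # mask printed at cold boot
warm = implemented_ports(0x38)  # mask printed at reboot

print("cold boot:", cold)
print("reboot:   ", warm)
print("lost on reboot:", sorted(set(cold) - set(warm)))
```

The only difference between the two masks is bit 2, i.e. ata3, which is the
port the Samsung SSD is attached to.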

What we seem to be hitting here is a bug where the UEFI cannot detect
the SATA SSD after reboot if ALPM was used by the OS before reboot, and
the UEFI's SATA driver responds to the failed detection by clearing the
port's bit in the port-mask register.

The UEFI not detecting the disk after reboot when ALPM was in use also
matches with not being able to boot from the disk after reboot.

I think what would be worth a try would be to disable ALPM on reboot
from a driver shutdown hook. IIRC the ALPM level can be changed at runtime
from a sysfs file, so we should be able to do the same at shutdown?
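
Until someone writes such a hook, the hypothesis could perhaps be tested from
userspace. The sketch below is not the proposed driver change: it assumes the
runtime knob is the link_power_management_policy attribute under
/sys/class/scsi_host/hostN/, that "max_performance" is the value which turns
ALPM off, and that writing it shortly before reboot has the same effect a
shutdown hook would. The function name and the sysfs_root parameter are
made up for illustration.

```python
from pathlib import Path


def set_link_power_policy(policy: str = "max_performance",
                          sysfs_root: str = "/sys/class/scsi_host") -> list[str]:
    """Write `policy` to every host's link_power_management_policy file.

    Returns the names of the hosts (e.g. "host2") that accepted the write.
    Hosts without the attribute, or where the write fails, are skipped.
    """
    changed = []
    pattern = "host*/link_power_management_policy"
    for attr in sorted(Path(sysfs_root).glob(pattern)):
        try:
            attr.write_text(policy + "\n")
        except OSError:
            # Not writable (not root) or value rejected by the driver.
            continue
        changed.append(attr.parent.name)
    return changed
```

Calling set_link_power_policy() as root right before "reboot" and then
checking whether the UEFI still sees the SSD would tell us whether a shutdown
hook is worth pursuing.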

It's been a while since I last touched the AHCI code, so I hope someone else
can write a proof of concept patch with the shutdown handler disabling ALPM
on reboot?

Regards,

Hans
