Message-ID: <9ac6e1ab-f2af-4bff-9d50-24df68ca1bb9@redhat.com>
Date: Thu, 13 Mar 2025 11:04:13 +0100
From: Hans de Goede <hdegoede@...hat.com>
To: Niklas Cassel <cassel@...nel.org>
Cc: Eric <eric.4.debian@...batoulnz.fr>,
 Salvatore Bonaccorso <carnil@...ian.org>,
 Mario Limonciello <mario.limonciello@....com>,
 Christoph Hellwig <hch@...radead.org>,
 Mika Westerberg <mika.westerberg@...ux.intel.com>,
 Damien Le Moal <dlemoal@...nel.org>, Jian-Hong Pan <jhp@...lessos.org>,
 regressions@...ts.linux.dev, linux-kernel@...r.kernel.org,
 stable@...r.kernel.org, linux-ide@...r.kernel.org,
 Dieter Mummenschanz <dmummenschanz@....de>
Subject: Re: Regression from 7627a0edef54 ("ata: ahci: Drop low power policy
 board type") on reboot (but not cold boot)

Hi Niklas, Eric,

On 11-Mar-25 3:14 PM, Niklas Cassel wrote:
> Hello Hans, Eric,
> 
> On Mon, Mar 10, 2025 at 09:12:13PM +0100, Hans de Goede wrote:
>>
>> I agree with you that this is a BIOS bug of the motherboard in question
>> and/or a bad interaction between the ATI SATA controller and Samsung SSD
>> 870* models. Note that given the age of the motherboard there are likely
>> not going to be any BIOS updates fixing this though.
> 
> Looking at the number of quirks for some of the ATI SB7x0/SB8x0/SB9x0 SATA
> controllers, they really look like something special (not in a good way):
> https://github.com/torvalds/linux/blob/v6.14-rc6/drivers/ata/ahci.c#L236-L244
> 
> -Ignore SError internal
> -No MSI
> -Max 255 sectors
> -Broken 64-bit DMA
> -Retry SRST (software reset)
> 
> And that is even without the weird "disable NCQ but only for Samsung SSD
> 8xx drives" quirk when using these ATI controllers.
> 
> 
> What does bother me is that we don't know if it is this specific mobo/BIOS:
>      Manufacturer: ASUSTeK COMPUTER INC.
>      Product Name: M5A99X EVO R2.0
>      Version: Rev 1.xx
> 
>      M5A99X EVO R2.0 BIOS 2501
>      Version 2501
>      3.06 MB
>      2014/05/14
> 
> 
> that should have a NOLPM quirk, like we do for specific BIOSes:
> https://github.com/torvalds/linux/blob/v6.14-rc6/drivers/ata/ahci.c#L1402-L1439

That seems to be a Lenovo only thing though and with Intel chipsets.

> Or if it is this ATI SATA controller that is always broken when it comes
> to LPM, regardless of the drive, or if it is only Samsung drives.

I'm pretty sure we can assume this will happen on all ATI SATA
controllers. The new LPM default is pretty recent and these boards are
getting old, so they likely do not have many users running distros which
ship cutting-edge kernels.

I do agree with you that it is an open question whether this is another
bad interaction with Samsung SATA SSDs, or a general ATI SATA controller
problem, but see below.

> Considering the dmesg comparing cold boot, the Maxtor drive and the
> ASUS ATAPI device seem to be recognized correctly.
> 
> Eric, could you please run:
> $ sudo hdparm -I /dev/sdX | grep "interface power management"
> 
> on both your Samsung and Maxtor drive?
> (A star to the left of feature means that the feature is enabled)
> 
> 
> 
> One guess... perhaps it could be Device Initiated PM that is broken with
> these controllers? (Even though the controller does claim to support it.)
> 
> Eric, could you please try this patch:
> 
> diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
> index f813dbdc2346..ca690fde8842 100644
> --- a/drivers/ata/ahci.c
> +++ b/drivers/ata/ahci.c
> @@ -244,7 +244,7 @@ static const struct ata_port_info ahci_port_info[] = {
>  	},
>  	[board_ahci_sb700] = {	/* for SB700 and SB800 */
>  		AHCI_HFLAGS	(AHCI_HFLAG_IGN_SERR_INTERNAL),
> -		.flags		= AHCI_FLAG_COMMON,
> +		.flags		= AHCI_FLAG_COMMON | ATA_FLAG_NO_DIPM,
>  		.pio_mask	= ATA_PIO4,
>  		.udma_mask	= ATA_UDMA6,
>  		.port_ops	= &ahci_pmp_retry_srst_ops,
> 
> 
> 
> Normally, I do think that we need more reports, to see if it is just
> this specific BIOS, or all the ATI SB7x0/SB8x0/SB9x0 SATA controllers
> that are broken...
> 
> ...but, considering how many quirks these ATI controllers have already...

Right, in the meantime Eric has reported back that the above patch fixes
this. Thank you for testing this, Eric.

One reason why ATA_QUIRK_NO_NCQ_ON_ATI was introduced is that disabling
NCQ has a severe performance impact on SSDs, so we did not want to do
this for all ATI controllers, or for all Samsung drives. Given that we
did not use DIPM on ATI chipsets until the recent LPM default change,
the above patch is IMHO a good fix, which even keeps the rest of the LPM
power savings.
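
To illustrate why a host-level "no DIPM" flag keeps the rest of the LPM
savings, here is a minimal, self-contained model. It is only a sketch and
not the libata implementation: every SKETCH_* name, the bit values and the
policy thresholds are assumptions made up for this example. The real flags
(ATA_FLAG_NO_LPM, ATA_FLAG_NO_DIPM) live in include/linux/libata.h.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit values, for illustration only. */
#define SKETCH_FLAG_NO_LPM   (1u << 0)  /* host cannot do any LPM       */
#define SKETCH_FLAG_NO_DIPM  (1u << 1)  /* host cannot do device-init PM */

/* Simplified policy ladder, ordered from no savings to most savings. */
enum sketch_lpm_policy {
	SKETCH_LPM_MAX_POWER,
	SKETCH_LPM_MED_POWER,
	SKETCH_LPM_MED_POWER_WITH_DIPM,
	SKETCH_LPM_MIN_POWER,
};

struct sketch_lpm_result {
	bool hipm;	/* host-initiated power management enabled   */
	bool dipm;	/* device-initiated power management enabled */
};

/*
 * Decide which power management features end up enabled for a port,
 * given the port flags, the requested policy and whether the device
 * itself advertises DIPM support.
 */
static struct sketch_lpm_result
sketch_apply_lpm(unsigned int port_flags, enum sketch_lpm_policy policy,
		 bool dev_supports_dipm)
{
	struct sketch_lpm_result r = { false, false };

	assert(policy >= SKETCH_LPM_MAX_POWER &&
	       policy <= SKETCH_LPM_MIN_POWER);

	/* A "no LPM" host gets nothing at all. */
	if (port_flags & SKETCH_FLAG_NO_LPM)
		return r;

	/* Host-initiated savings do not depend on the DIPM flag. */
	if (policy >= SKETCH_LPM_MED_POWER)
		r.hipm = true;

	/* DIPM needs policy, device support and a host that allows it. */
	if (policy >= SKETCH_LPM_MED_POWER_WITH_DIPM && dev_supports_dipm &&
	    !(port_flags & SKETCH_FLAG_NO_DIPM))
		r.dipm = true;

	return r;
}
```

In this model, calling sketch_apply_lpm() with SKETCH_FLAG_NO_DIPM and
SKETCH_LPM_MED_POWER_WITH_DIPM leaves hipm set and only clears dipm,
whereas SKETCH_FLAG_NO_LPM clears both.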

> ...and the fact that the one (Dieter) who reported that his Samsung SSD 870
> QVO could enter deeper sleep states just fine was running an Intel AHCI
> controller (with the same FW version as Eric), I would be open to a patch
> that sets ATA_FLAG_NO_LPM for all these ATI controllers.

Right, I think it is safe to assume that this is not a Samsung drive
problem; it is an ATI controller problem. The only question is whether
this only impacts ATI <-> Samsung SSD combinations or whether it is a
general issue with ATI controllers. But given that DIPM was not enabled
on these controllers by default anyway, combined with the age of these
motherboards (*), I believe that the above patch is a good compromise to
fix the regression without needing to wait for more data.

Regards,

Hans

*) Which means there are fewer users, making it hard to get more data.
It also means that not having DIPM will impact only the relatively few
remaining users.



