lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAPpJ_efbRDqA9ybmdJ9iNunVmoHukGekuFtEE3X-nwOxfYivcg@mail.gmail.com>
Date: Wed, 7 Feb 2024 14:31:29 +0800
From: Jian-Hong Pan <jhp@...lessos.org>
To: Niklas Cassel <cassel@...nel.org>
Cc: Dan Williams <dan.j.williams@...el.com>, Daniel Drake <drake@...lessos.org>, 
	Vitalii Solomonov <solomonov.v@...il.com>, Mika Westerberg <mika.westerberg@...ux.intel.com>, 
	David Box <david.e.box@...ux.intel.com>, Damien Le Moal <dlemoal@...nel.org>, 
	Nirmal Patel <nirmal.patel@...ux.intel.com>, 
	Jonathan Derrick <jonathan.derrick@...ux.dev>, linux-ide@...r.kernel.org, 
	linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org, linux@...lessos.org
Subject: Re: [PATCH 1/2] ata: ahci: Add force LPM policy quirk for ASUS B1400CEAE

Niklas Cassel <cassel@...nel.org> 於 2024年2月6日 週二 下午9:12寫道:
>
> On Tue, Feb 06, 2024 at 04:39:02PM +0800, Jian-Hong Pan wrote:
> > Niklas Cassel <cassel@...nel.org> 於 2024年2月5日 週一 下午7:33寫道:
> >
> > Have the comparison:
> >
> > * Bind LPM policy with the patch "ata: ahci: Add force LPM policy
> > quirk for ASUS B1400CEAE" based on kernel v6.8-rc2:
> >
> > $ dmesg | grep -E "(SATA|ata1|ahci)"
> > [    0.791497] ahci 10000:e0:17.0: version 3.0
> > [    0.791499] ahci 10000:e0:17.0: force controller follow LPM policy
> > [    0.791517] ahci 10000:e0:17.0: can't derive routing for PCI INT A
> > [    0.791518] ahci 10000:e0:17.0: PCI INT A: no GSI
> > [    0.791637] ahci 10000:e0:17.0: ahci_update_initial_lpm_policy: policy 3
> > [    0.791652] ahci 10000:e0:17.0: ahci_intel_pcs_quirk: not Intel,
> > the vendor is 0xffffffff
> > [    0.791662] ahci 10000:e0:17.0: AHCI 0001.0301 32 slots 1 ports 6
> > Gbps 0x1 impl SATA mode
> > [    0.791663] ahci 10000:e0:17.0: flags: 64bit ncq sntf pm clo only
> > pio slum part deso sadm sds
> > [    0.791771] scsi host0: ahci
> > [    0.791806] ata1: SATA max UDMA/133 abar m2048@...6102000 port
> > 0x76102100 irq 145 lpm-pol 3
> > [    0.791808] ahci 10000:e0:17.0: ahci_init_one: probed
> > [    1.109393] ata1: sata_link_resume: rc=0
> > [    1.109415] ata1: BUSY ? 0 (status: 0x50) SStatus.DET: 0x3
> > [    1.109418] ata1: sata_link_hardreset: is 0
> > [    1.109420] ata1: sata_link_hardreset: is on line, returns 0
> > [    1.109444] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
> > [    1.110161] ata1.00: ATA-10: WDC WD10SPZX-80Z10T2, 04.01A04, max UDMA/133
> > [    1.112047] ata1.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 32), AA
> > [    1.112054] ata1.00: Features: NCQ-prio
> > [    1.114814] ata1.00: configured for UDMA/133
> > [    1.114821] ata1: ahci_set_lpm: policy=3
> > [    1.114837] ata1: sata_link_scr_lpm: policy is 3 and original
> > scontrol 0x00000300
> > [    1.114840] ata1: sata_link_scr_lpm: write scontrol 0x00000000
> >
> > The SATA link is up and SATA storage shows up.
> > Full dmesg as the attachment of
> > https://bugzilla.kernel.org/show_bug.cgi?id=217114#c28
> >
> > * Bind LPM policy with PCI IDs like commit 104ff59af73a ("ata: ahci:
> > Add Tiger Lake UP{3,4} AHCI controller"):
> >
> > $ dmesg | grep -E "(SATA|ata1|ahci)"
> > [    0.783125] ahci 10000:e0:17.0: version 3.0
> > [    0.783143] ahci 10000:e0:17.0: can't derive routing for PCI INT A
> > [    0.783145] ahci 10000:e0:17.0: PCI INT A: no GSI
> > [    0.783257] ahci 10000:e0:17.0: ahci_update_initial_lpm_policy: policy 3
> > [    0.783280] ahci 10000:e0:17.0: ahci_intel_pcs_quirk: PCS_6 is 0x0000
> > [    0.783281] ahci 10000:e0:17.0: ahci_intel_pcs_quirk: write PCS_6 with 0x0001
> > [    0.783296] ahci 10000:e0:17.0: AHCI 0001.0301 32 slots 1 ports 6
> > Gbps 0x1 impl SATA mode
> > [    0.783298] ahci 10000:e0:17.0: flags: 64bit ncq sntf pm clo only
> > pio slum part deso sadm sds
> > [    0.783402] scsi host0: ahci
> > [    0.783440] ata1: SATA max UDMA/133 abar m2048@...6102000 port
> > 0x76102100 irq 144 lpm-pol 3
> > [    0.783442] ahci 10000:e0:17.0: ahci_init_one: probed
> > [    1.096930] ata1: sata_link_resume: rc=0
> > [    1.096960] ata1: sata_link_hardreset: ata_phys_link_offline is True
> > [    1.096962] ata1: sata_link_hardreset: is off line, returns 0
> > [    1.097000] ata1: SATA link down (SStatus 4 SControl 300)
> > [    1.097025] ata1: ahci_set_lpm: policy=3
> > [    1.097051] ata1: sata_link_scr_lpm: policy is 3 and original
> > scontrol 0x00000300
> > [    1.097054] ata1: sata_link_scr_lpm: write scontrol 0x00000304
> >
> > The SATA link is down and SATA storage disappears.
> > Full dmesg as the attachment of
> > https://bugzilla.kernel.org/show_bug.cgi?id=217114#c29
> >
>
> So in summary:
> When Intel VMD is on, and the ahci_intel_pcs_quirk is applied => NOT OK
> When Intel VMD is on, and the ahci_intel_pcs_quirk is not applied => OK
>
> When Intel VMD is off, and the ahci_intel_pcs_quirk is applied => OK
> When Intel VMD is off, and the ahci_intel_pcs_quirk is not applied => ?

When Intel VMD is off, and the ahci_intel_pcs_quirk is not applied => OK

> Excellent find!
>
>
>
> In the bad case:
>
> sata_link_hardreset() sets SControl.DET to 1, to establish the interface
> communication. Then sleeps for 1 ms.
>
> Then it calls sata_link_resume(), which clears SControl.DET to 0.
> (This matches the AHCI spec which says that SControl.DET should be set
> to 1 for at least 1 ms.)
>
> sata_link_hardreset() then calls ata_phys_link_offline(),
> which is essentially defined as: return !(SStatus.DET == 0x3)
> ata_phys_link_offline() returns true, since SStatus.DET == 0x4.
>
> SStatus.DET == 0x4 means: Phy in offline mode as a result of the
> interface being disabled or running in a BIST loopback mode.
>
> If the physical link is not established, there is no point to call
> ata_wait_ready() (which waits for the device to become ready on the
> protocol level), as the physical link could not even be established.
>
> After that, we write SControl.DET to set bit 4 to disable the port,
> in order to save power. This is only done because sata_link_hardreset()
> failed to establish a link after toggling SControl.DET == 1.
>
> So the problem is that SStatus.DET never changed to 0x3 after toggling
> SControl.DET == 1.
>
>
> >
> > However, I notice more interesting thing:
> > "drivers/ata/ahci.c:ahci_intel_pcs_quirk()"!
> > If bind LPM policy with PCI IDs matching, then it does the PCS quirk.
> > But, binding with the patch "ata: ahci: Add force LPM policy quirk for
> > ASUS B1400CEAE" does not, because the vendor is ANY vendor, not Intel.
> >
> > So, I did following test:
> >
> > If I modify the PCI vendor check condition with the pdev, not the PCI
> > ID's vendor:
> >
> > diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
> > index 7ecd56c8262a..ece709ac20d6 100644
> > --- a/drivers/ata/ahci.c
> > +++ b/drivers/ata/ahci.c
> > @@ -1706,12 +1709,16 @@ static void ahci_intel_pcs_quirk(struct
> > pci_dev *pdev, struct ahci_host_priv *hp
> >         /*
> >          * Only apply the 6-port PCS quirk for known legacy platforms.
> >          */
> > -       if (!id || id->vendor != PCI_VENDOR_ID_INTEL)
> > +       if (!id || pdev->vendor != PCI_VENDOR_ID_INTEL) {
> > +               dev_info(&pdev->dev, "%s: not Intel, the vendor is
> > 0x%08x\n", __func__, id->vendor);
> >                 return;
> > +       }
>
> The reason why you are seeing this is because Tiger Lake does not have
> an entry in the ahci_pci_tbl in mainline, so it uses the generic entry
> which matches on the AHCI class code:
> https://github.com/torvalds/linux/blob/v6.8-rc3/drivers/ata/ahci.c#L636
>
> If you revert 6210038aeaf4 ("ata: ahci: Revert "ata: ahci: Add Tiger Lake
> UP{3,4} AHCI controller""), you will get an explicit entry in the
> ahci_pci_tbl.
>
> But to clarify, I think that it would make sense to add:
>
> diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
> index d2460fa985b7..e462509a45e8 100644
> --- a/drivers/ata/ahci.c
> +++ b/drivers/ata/ahci.c
> @@ -1672,12 +1672,18 @@ static void ahci_update_initial_lpm_policy(struct ata_port *ap,
>
>  static void ahci_intel_pcs_quirk(struct pci_dev *pdev, struct ahci_host_priv *hpriv)
>  {
> -       const struct pci_device_id *id = pci_match_id(ahci_pci_tbl, pdev);
> +       const struct pci_device_id *id;
>         u16 tmp16;
>
> +       /* If the detected PCI device is not an Intel device, skip. */
> +       if (pdev->vendor != PCI_VENDOR_ID_INTEL)
> +               return;
> +
>         /*
> -        * Only apply the 6-port PCS quirk for known legacy platforms.
> +        * See if there is an explicit entry for this PCI device in
> +        * ahci_pci_tbl, if there is not, do not apply the quirk.
>          */
> +       id = pci_match_id(ahci_pci_tbl, pdev);
>         if (!id || id->vendor != PCI_VENDOR_ID_INTEL)
>                 return;
>
>
>
> >
> > Then, the SATA HDD always disappears like binding the LPM policy with
> > PCI IDs matching, even with the patch "ata: ahci: Add force LPM policy
> > quirk for ASUS B1400CEAE".
> > So, I think ahci_intel_pcs_quirk() is the key point.
>
> I agree.
>
>
> Can you verify that things work as expected when doing a:
> $ git revert 6210038aeaf49c395c2da57572246d93ec67f6d4
> to re-add the explicit entry, if you also do a:
>
> --- a/drivers/ata/ahci.c
> +++ b/drivers/ata/ahci.c
> @@ -1672,6 +1672,7 @@ static void ahci_update_initial_lpm_policy(struct ata_port *ap,
>
>  static void ahci_intel_pcs_quirk(struct pci_dev *pdev, struct ahci_host_priv *hpriv)
>  {
> +#if 0
>         const struct pci_device_id *id = pci_match_id(ahci_pci_tbl, pdev);
>         u16 tmp16;
>
> @@ -1698,6 +1699,7 @@ static void ahci_intel_pcs_quirk(struct pci_dev *pdev, struct ahci_host_priv *hp
>                 tmp16 |= hpriv->port_map;
>                 pci_write_config_word(pdev, PCS_6, tmp16);
>         }
> +#endif
>  }
>
> To make the quirk a no-op?

Here is the test result:

Both enabled & disabled VMD with no-op ahci_intel_pcs_quirk() shows
the SATA storage on ASUS B1400CEAE. => OK

$ cat /tmp/dmesg.log
[    0.799439] ahci 10000:e0:17.0: version 3.0
[    0.799459] ahci 10000:e0:17.0: can't derive routing for PCI INT A
[    0.799460] ahci 10000:e0:17.0: PCI INT A: no GSI
[    0.799582] ahci 10000:e0:17.0: ahci_update_initial_lpm_policy: policy 3
[    0.799615] ahci 10000:e0:17.0: AHCI 0001.0301 32 slots 1 ports 6
Gbps 0x1 impl SATA mode
[    0.799617] ahci 10000:e0:17.0: flags: 64bit ncq sntf pm clo only
pio slum part deso sadm sds
[    0.799722] scsi host0: ahci
[    0.799760] ata1: SATA max UDMA/133 abar m2048@...6102000 port
0x76102100 irq 144 lpm-pol 3
[    0.799761] ahci 10000:e0:17.0: ahci_init_one: probed
[    1.112519] ata1: sata_link_resume: rc=0
[    1.112541] ata1: BUSY ? 0 (status: 0x50) SStatus.DET: 0x3
[    1.112545] ata1: sata_link_hardreset: is 0
[    1.112547] ata1: sata_link_hardreset: is on line, returns 0
[    1.112571] ata1: SATA link up 6.0 Gbps (SStatus 133 SControl 300)
[    1.113280] ata1.00: ATA-10: WDC WD10SPZX-80Z10T2, 04.01A04, max UDMA/133
[    1.114880] ata1.00: 1953525168 sectors, multi 16: LBA48 NCQ (depth 32), AA
[    1.114887] ata1.00: Features: NCQ-prio
[    1.117148] ata1.00: configured for UDMA/133
[    1.117154] ata1: ahci_set_lpm: policy=3
[    1.117169] ata1: sata_link_scr_lpm: policy is 3 and original
scontrol 0x00000300
[    1.117172] ata1: sata_link_scr_lpm: write scontrol 0x00000000

Jian-Hong Pan

> To be honest, this quirk looks horrible.
>
> Looking at the original commit:
> c312ef176399 ("libata/ahci: Drop PCS quirk for Denverton and beyond")
>
> It claims that:
>
> Rather than try to fix the PCS quirk to consider the DNV register layout
> instead require explicit opt-in. The assumption is that the OS driver
> need not touch this register, and platforms can be added with a new
> boad_ahci_pcs7 board-id when / if problematic platforms are found in the
> future.
>
> However, it does NOT require an explicit opt-in!
>
> If we were to add an entry with board type "board_ahci" or
> "board_ahci_low_power" for Tiger Lake, the quirk gets applied...
>
> See also:
> 09d6ac8dc51a ("libata/ahci: Fix PCS quirk application")
>
> So basically, what ahci_intel_pcs_quirk() does is that it checks
> if there is an explicit entry in ahci_pci_tbl.
> If there is not, the quirk is not applied.
>
> If there is an entry, and the enum for that board has a value that
> is less than board_ahci_pcs7, the quirk is applied...
>
> But that will be *ALL* other board types since board_ahci_pcs7 is
> defined last in the enum:
> https://github.com/torvalds/linux/blob/v6.8-rc3/drivers/ata/ahci.c#L75
>
> Not only that but the comment for that enum is wrong:
> https://github.com/torvalds/linux/blob/v6.8-rc3/drivers/ata/ahci.c#L71-L74
>
>         /*
>          * board IDs for Intel chipsets that support more than 6 ports
>          * *and* end up needing the PCS quirk.
>          */
>
> Is is the opposite... board IDs that do NOT need the PCS quirk...
>
> But this is not the way we add quirks.
> We add a flag and a new board_id and mark the PCI device and vendor ids
> that are affected to use that board, see e.g.
> 20730e9b2778 ("ahci: add 43-bit DMA address quirk for ASMedia ASM1061 controllers")
>
> We don't add a quirk and apply it for everything (board_ahci,
> board_ahci_low_power) except for a specific entry (board_ahci_pcs7).
>
> It seems that at least Intel AHCI controllers that also have Intel VMD
> enabled break when this quirk is applied.
>
> I guess one way would be to do a:
> git show c312ef176399:drivers/ata/ahci.c | grep "PCI_VDEVICE(INTEL"
> and replace everything that is not: board_ahci_pcs7
> with a board_ahci_pcs_quirk, board_ahci_low_power_pcs_quirk, and
> board_ahci_avn_pcs_quirk, and after that change all board_ahci_pcs7
> entries to board_ahci, and assume that entries added since c312ef176399
> do not need the quirk.
>
> But it would be nice if someone from Intel could clean this up.
>
>
> Kind regards,
> Niklas

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ