Message-ID: <1b4b2c6c-8119-95fd-8958-dbbecc66510c@amd.com>
Date:   Tue, 20 Jun 2023 13:36:59 -0500
From:   "Limonciello, Mario" <mario.limonciello@....com>
To:     Bjorn Helgaas <helgaas@...nel.org>
Cc:     Kai-Heng Feng <kai.heng.feng@...onical.com>, bhelgaas@...gle.com,
        Mika Westerberg <mika.westerberg@...ux.intel.com>,
        Kuppuswamy Sathyanarayanan 
        <sathyanarayanan.kuppuswamy@...ux.intel.com>,
        Vidya Sagar <vidyas@...dia.com>,
        Michael Bottini <michael.a.bottini@...ux.intel.com>,
        "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
        linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] PCI/ASPM: Enable ASPM on external PCIe devices

<snip>
>> A variety of Intel chipsets don't support lane width switching
>> or speed switching.  When ASPM has been enabled on a dGPU,
>> these features are utilized and breakage ensues.
> Maybe this helps explain all the completely unmaintainable ASPM
> garbage in amdgpu and radeon.
>
> If these devices are broken, we need quirks for them.

The problem is: which device do you consider "broken"?
The dGPU that uses these features when the platform advertises ASPM
or the chipset which doesn't support the features that the device
uses when ASPM is active?

In the case I'm talking about, the dGPU works fine on hosts
that support these features.

KH has a lot more experience with ASPM issues and hopefully has some
other examples to share.

> We can't avoid
> ASPM in general just because random devices break.

I'm not advocating avoiding it in general; I'm saying we shouldn't
be turning it on across the board for all devices if the platform had
it off initially via a kernel command line option or Kconfig.

>> There are various methods to try to mitigate the impact both in
>> firmware and driver code.
>>
>>> This feels like a real problem to me.  There are existing mechanisms
>>> (ACPI_FADT_NO_ASPM and _OSC PCIe cap ownership) the platform can use
>>> to prevent the OS from using ASPM.
>>>
>>> If vendors assume that *in addition*, the OS will pay attention to
>>> whatever ASPM configuration BIOS did, that's a major disconnect.  We
>>> don't do anything like that for other PCI features, and I'm not aware
>>> of any restriction like that being documented.
>> With both of those policies in place, how did we get into
>> the situation of having configuration options and knobs?
> The kernel parameters and config options have been there pretty much from
> the beginning.  We didn't have the per-device sysfs knobs until very
> recently.
Ah, I see.
>
>>>> I think the pragmatic way to approach it is to (essentially) apply
>>>> the policy as BIOS defaults and allow overrides from that.
>>> Do you mean that when enumerating a device (at boot-time or hot-add
>>> time), we would read the current ASPM config but not change it?  And
>>> users could use the sysfs knobs to enable/disable ASPM as desired?
>> Yes.
> Hot-added devices power up with ASPM disabled.  This policy would mean
> the user has to explicitly enable it, which doesn't seem practical to
> me.
Could we maybe have hot-added devices follow the policy of
the bridge they're connected to by default?
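
A minimal sketch of that default, assuming a simplified model where the
ASPM policy is just the set of enabled link states (L1 substates and
per-direction L0s are ignored). `LinkState` and `hotplug_default()` are
hypothetical names, not kernel API. The hot-added device starts from
whatever its parent bridge was left with, an explicit per-device
override still wins, and the result is limited to states that both ends
of the link support.

```python
from enum import Flag
from typing import Optional


class LinkState(Flag):
    """ASPM link states, simplified: L1 substates and L0s direction are ignored."""
    NONE = 0
    L0S = 1
    L1 = 2


def hotplug_default(parent_policy: LinkState,
                    parent_supported: LinkState,
                    device_supported: LinkState,
                    override: Optional[LinkState] = None) -> LinkState:
    """Hypothetical default ASPM state for a newly hot-added device."""
    # An explicit per-device override (e.g. set through sysfs) wins,
    # even if it is NONE; otherwise inherit the parent bridge's policy.
    desired = override if override is not None else parent_policy
    # A link state is only usable if both ends of the link support it.
    return desired & parent_supported & device_supported


both = LinkState.L0S | LinkState.L1
# Parent bridge left with only L1 enabled; the new device supports L0s and L1.
print(hotplug_default(LinkState.L1, both, both) == LinkState.L1)  # True
```

Under this model, if firmware enabled L1 on the bridge, a device
hot-added behind it gets L1 as well without a device-specific quirk; if
the bridge was left with ASPM off, the device stays off unless someone
overrides it.
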
>
>>> That wouldn't solve the problem Kai-Heng is trying to solve.
>> Alone it wouldn't; but if you treated the i225 PCIe device
>> connected to the system as a "quirk" to apply ASPM policy
>> from the parent device to this child device it could.
> I want quirks for BROKEN devices.  Quirks for working hardware is a
> maintenance nightmare.
>
> Bjorn
If we adopted my idea that hot-added devices follow the parent's
policy, would that work for the i225 PCIe device case?
