lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CAAd53p4kH7E92++jabBhvsM_+M7Dpyk2JP+aoVdb_sxZn47eyQ@mail.gmail.com>
Date:   Tue, 27 Jun 2023 16:35:25 +0800
From:   Kai-Heng Feng <kai.heng.feng@...onical.com>
To:     Bjorn Helgaas <helgaas@...nel.org>
Cc:     "Limonciello, Mario" <mario.limonciello@....com>,
        bhelgaas@...gle.com,
        Mika Westerberg <mika.westerberg@...ux.intel.com>,
        Kuppuswamy Sathyanarayanan 
        <sathyanarayanan.kuppuswamy@...ux.intel.com>,
        Vidya Sagar <vidyas@...dia.com>,
        Michael Bottini <michael.a.bottini@...ux.intel.com>,
        "Rafael J. Wysocki" <rafael.j.wysocki@...el.com>,
        linux-pci@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH] PCI/ASPM: Enable ASPM on external PCIe devices

On Fri, Jun 23, 2023 at 7:06 AM Bjorn Helgaas <helgaas@...nel.org> wrote:
>
> On Tue, Jun 20, 2023 at 01:36:59PM -0500, Limonciello, Mario wrote:
> > <snip>
> > > > A variety of Intel chipsets don't support lane width switching
> > > > or speed switching.  When ASPM has been enabled on a dGPU,
> > > > these features are utilized and breakage ensues.
> > > Maybe this helps explain all the completely unmaintainable ASPM
> > > garbage in amdgpu and radeon.
> > >
> > > If these devices are broken, we need quirks for them.
> >
> > The problem is which device do you consider "broken"?
> > The dGPU that uses these features when the platform advertises ASPM
> > or the chipset which doesn't support the features that the device
> > uses when ASPM is active?
> >
> > With this problem I'm talking about the dGPU works fine on hosts
> > that support these features.
>
> Without more details about what's broken and when, I can't say.  What
> I *think* is that a device that doesn't work per spec needs a quirk.
> Typically it's a device that advertises a capability that doesn't work
> correctly.

Many silicon vendors use the same "PCI IP" and integrate it into their
own chip. That's one of the reason why the capability doesn't really
reflect what actually being support.
The vendors then send their private datasheet to system integrator so
BIOS can enable what's really supported.

So the logic is to ignore the capability and trust the default set by BIOS.

>
> > > > > > I think the pragmatic way to approach it is to (essentially)
> > > > > > apply the policy as BIOS defaults and allow overrides from
> > > > > > that.
> > > > >
> > > > > Do you mean that when enumerating a device (at boot-time or
> > > > > hot-add time), we would read the current ASPM config but not
> > > > > change it?  And users could use the sysfs knobs to
> > > > > enable/disable ASPM as desired?
> > > >
> > > > Yes.
> > > >
> > > Hot-added devices power up with ASPM disabled.  This policy would
> > > mean the user has to explicitly enable it, which doesn't seem
> > > practical to me.
> >
> > Could we maybe have the hot added devices follow the policy of
> > the bridge they're connected to by default?
> >
> > > > > That wouldn't solve the problem Kai-Heng is trying to solve.
> > > >
> > > > Alone it wouldn't; but if you treated the i225 PCIe device
> > > > connected to the system as a "quirk" to apply ASPM policy
> > > > from the parent device to this child device it could.
> > >
> > > I want quirks for BROKEN devices.  Quirks for working hardware is a
> > > maintenance nightmare.
> >
> > If you follow my idea of hot added devices the policy follows
> > the parent would it work for the i225 PCIe device case?
>
> That doesn't *sound* really robust to me because even if the default
> config after hot-add works, the user can change things via sysfs, and
> any configuration we set it to should work as well.  If there are
> land-mines there, we need a quirk that prevents sysfs from running
> into it.

For this case it means driver needs to provide a ASPM callback to flip
PTM based on ASPM sysfs.

Kai-Heng

> Bjorn

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ