Message-ID: <967fb44c-b1cd-875c-2354-b6ad0b8ae6d7@gmail.com>
Date:   Wed, 15 Jan 2020 20:44:21 -0600
From:   Alex G <mr.nuke.me@...il.com>
To:     Bjorn Helgaas <helgaas@...nel.org>,
        Alexandru Gagniuc <alex_gagniuc@...lteam.com>,
        Keith Busch <keith.busch@...el.com>
Cc:     Jan Vesely <jano.vesely@...il.com>, Lukas Wunner <lukas@...ner.de>,
        Alex Williamson <alex.williamson@...hat.com>,
        Austin Bolen <austin_bolen@...l.com>,
        Shyam Iyer <Shyam_Iyer@...l.com>,
        Sinan Kaya <okaya@...nel.org>, linux-pci@...r.kernel.org,
        linux-kernel@...r.kernel.org
Subject: Re: Issues with "PCI/LINK: Report degraded links via link bandwidth
 notification"

Hi Bjorn,

I'm no longer working on this, so my memory may be a bit rusty. If
the endpoint is causing the bandwidth change, then we should get an
_autonomous_ link management interrupt instead. I don't think we report
those, and that shouldn't spam the logs.

If instead it's a non-autonomous link management interrupt, then
something is causing the downstream port to do funny things. I don't
think ASPM is supposed to be causing this.

Do we know what's causing these swings?

For now, I suggest a boot-time parameter to disable link speed reporting 
instead of a compile time option.
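
Something along these lines is what I have in mind. This is only a
userspace stand-in: in the kernel the option would normally be wired up
through __setup() or module_param() rather than hand-parsed. The
parameter name "pcie_bw_notify" and the bw_notify_enabled() helper are
hypothetical; no such option exists today as far as I know.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical parameter name, chosen only for this sketch. */
#define BW_PARAM "pcie_bw_notify="

/*
 * Scan a space-separated command line for pcie_bw_notify=on|off.
 * Returns 'def' if the option is absent or has an unrecognized value.
 * If the option appears more than once, the last valid one wins.
 */
static bool bw_notify_enabled(const char *cmdline, bool def)
{
    assert(cmdline != NULL);

    bool enabled = def;
    const size_t plen = strlen(BW_PARAM);
    const char *p = cmdline;

    while (*p) {
        p += strspn(p, " ");              /* skip separators */
        size_t len = strcspn(p, " ");     /* length of this token */

        if (len > plen && strncmp(p, BW_PARAM, plen) == 0) {
            const char *val = p + plen;
            size_t vlen = len - plen;

            if (vlen == 3 && strncmp(val, "off", 3) == 0)
                enabled = false;
            else if (vlen == 2 && strncmp(val, "on", 2) == 0)
                enabled = true;
        }
        p += len;
    }
    return enabled;
}
```

A distro could then ship with the notification code built in and leave
it to the user to pass pcie_bw_notify=off (or on, depending on the
default) on machines where the messages turn out to be noise.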

Alex

On 1/15/20 4:10 PM, Bjorn Helgaas wrote:
> I think we have a problem with link bandwidth change notifications
> (see https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/pcie/bw_notification.c).
> 
> Here's a recent bug report where Jan reported "_tons_" of these
> notifications on an nvme device:
> https://bugzilla.kernel.org/show_bug.cgi?id=206197
> 
> There was similar discussion involving GPU drivers at
> https://lore.kernel.org/r/20190429185611.121751-2-helgaas@kernel.org
> 
> The current solution is the CONFIG_PCIE_BW config option, which
> disables the messages completely.  That option defaults to "off" (no
> messages), but even so, I think it's a little problematic.
> 
> Users are not really in a position to figure out whether it's safe to
> enable.  All they can do is experiment and see whether it works with
> their current mix of devices and drivers.
> 
> I don't think it's currently useful for distros because it's a
> compile-time switch, and distros cannot predict what system configs
> will be used, so I don't think they can enable it.
> 
> Does anybody have proposals for making it smarter about distinguishing
> real problems from intentional power management, or maybe interfaces
> drivers could use to tell us when we should ignore bandwidth changes?
> 
> Bjorn
> 
