Message-ID: <20200115221008.GA191037@google.com>
Date: Wed, 15 Jan 2020 16:10:08 -0600
From: Bjorn Helgaas <helgaas@...nel.org>
To: Alexandru Gagniuc <mr.nuke.me@...il.com>,
Alexandru Gagniuc <alex_gagniuc@...lteam.com>,
Keith Busch <keith.busch@...el.com>
Cc: Jan Vesely <jano.vesely@...il.com>, Lukas Wunner <lukas@...ner.de>,
Alex Williamson <alex.williamson@...hat.com>,
Austin Bolen <austin_bolen@...l.com>,
Shyam Iyer <Shyam_Iyer@...l.com>,
Sinan Kaya <okaya@...nel.org>, linux-pci@...r.kernel.org,
linux-kernel@...r.kernel.org
Subject: Issues with "PCI/LINK: Report degraded links via link bandwidth notification"
I think we have a problem with link bandwidth change notifications
(see https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/pcie/bw_notification.c).
Here's a recent bug report where Jan reported "_tons_" of these
notifications on an nvme device:
https://bugzilla.kernel.org/show_bug.cgi?id=206197
There was similar discussion involving GPU drivers at
https://lore.kernel.org/r/20190429185611.121751-2-helgaas@kernel.org
The current solution is the CONFIG_PCIE_BW config option, which, when
left off, disables the messages completely. That option defaults to
"off" (no messages), but even so, I think it's a little problematic.
Users are not really in a position to figure out whether it's safe to
enable. All they can do is experiment and see whether it works with
their current mix of devices and drivers.
It's also not currently useful for distros: it's a compile-time
switch, and since distros cannot predict what system configs will be
used, I don't think they can enable it.
Does anybody have proposals for making it smarter about distinguishing
real problems from intentional power management, or maybe for interfaces
that drivers could use to tell us when we should ignore bandwidth changes?
Bjorn