Open Source and information security mailing list archives
 
Date:   Tue, 23 Apr 2019 09:34:08 -0600
From:   Alex Williamson <alex.williamson@...hat.com>
To:     Alex G <mr.nuke.me@...il.com>
Cc:     bhelgaas@...gle.com, helgaas@...nel.org, linux-pci@...r.kernel.org,
        austin_bolen@...l.com, alex_gagniuc@...lteam.com,
        keith.busch@...el.com, Shyam_Iyer@...l.com, lukas@...ner.de,
        okaya@...nel.org, torvalds@...ux-foundation.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH] PCI/LINK: Account for BW notification in vector
 calculation

On Tue, 23 Apr 2019 09:33:53 -0500
Alex G <mr.nuke.me@...il.com> wrote:

> On 4/22/19 7:33 PM, Alex Williamson wrote:
> > On Mon, 22 Apr 2019 19:05:57 -0500
> > Alex G <mr.nuke.me@...il.com> wrote:  
> >> echo 0000:07:00.0:pcie010 |
> >> sudo tee /sys/bus/pci_express/drivers/pcie_bw_notification/unbind  
> > 
> > That's a bad solution for users, this is meaningless tracking of a
> > device whose driver is actively managing the link bandwidth for power
> > purposes.   
> 
> 0.5W savings on a 100+W GPU? I agree it's meaningless.

Evidence?  Regardless, I don't have control of the driver that's making
these changes, but the claim seems unfounded and irrelevant.
 
> > There is nothing wrong happening here that needs to fill
> > logs.  I thought maybe if I enabled notification of autonomous
> > bandwidth changes that it might categorize these as something we could
> > ignore, but it doesn't.
> > How can we identify only cases where this is
> > an erroneous/noteworthy situation?  Thanks,  
> 
> You don't. Ethernet doesn't. USB doesn't. This logging behavior is 
> consistent with every other subsystem that deals with multi-speed links. 
> I realize some people are very resistant to change (and use very ancient 
> kernels). I do not, however, agree that this is a sufficient argument to 
> dis-unify behavior.

Sorry, I don't see how any of this is relevant either.  Clearly I'm
using a recent kernel or I wouldn't be seeing this new bandwidth
notification driver.  I'm assigning a device to a VM whose driver is
power managing the device via link speed changes.  The result is that
we now see irrelevant spam in the host dmesg for every inconsequential
link downgrade directed by the device.  I can see why we might want to
be notified of degraded links due to signal issues, but what I'm
reporting is that there are also entirely normal and benign reasons
that a link might be reduced, we can't seem to tell the difference
between a fault and this normal dynamic scaling, and the assumption of
a fault is spamming dmesg.  So, I don't think what we have here is well
cooked.  Do drivers have a mechanism to opt-out of this error
reporting?  Can drivers register an anticipated link change to avoid
the spam?  What instructions can we *reasonably* give to users as to
when these messages mean something, when they don't, and how they can
be turned off?  Thanks,

Alex
