lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <1380323960.27811.67.camel@pasglop>
Date:	Sat, 28 Sep 2013 09:19:20 +1000
From:	Benjamin Herrenschmidt <benh@...nel.crashing.org>
To:	Yinghai Lu <yinghai@...nel.org>
Cc:	Linus Torvalds <torvalds@...ux-foundation.org>,
	Bjorn Helgaas <bhelgaas@...gle.com>,
	"linux-pci@...r.kernel.org" <linux-pci@...r.kernel.org>,
	linuxppc-dev <linuxppc-dev@...ts.ozlabs.org>,
	Linux Kernel list <linux-kernel@...r.kernel.org>
Subject: Re: Please revert 928bea964827d7824b548c1f8e06eccbbc4d0d7d

On Fri, 2013-09-27 at 15:56 -0700, Yinghai Lu wrote:

> ok, please if you are ok attached one instead. It will print some warning about
> driver skipping pci_set_master, so we can catch more problem with drivers.

Except that the message is pretty cryptic :-) Especially since the
driver causing the message to be printed is not the one that did
the mistake in the first place, it's the next one coming up that
trips the warning.

In any case, the root cause is indeed the PCIe port driver:

We don't have ACPI, so pcie_port_platform_notify() isn't implemented,
and pcie_ports_auto is true, so we end up with capabilities set to 0.

Thus the port driver bails out before calling pci_set_master(). The fix
is to call pci_set_master() unconditionally. However that lead me to
find to a few interesting oddities in that port driver code:

 - If capabilities is 0, it returns after enabling the device and
doesn't disable it. But if it fails for any other reason later on (such
as failing to enable interrupts), it *will* disable the device. This is
inconsistent. In fact, if it had disabled the device as a result of the
0 capabilities, the problem would also not have happened (the subsequent
enable_bridge done by the e1000e driver would have done the right
thing). I've tested "fixing" that instead of moving the set_master call
and it fixes my problem as well.

 - In get_port_device_capability(), all capabilities are gated by a
combination of the bit in cap_mask and a corresponding HW check of
the presence of the relevant PCIe capability or similar... except
for the VC one, which is solely read from the HW capability. That means
that the platform pcie_port_platform_notify() has no ability to prevent
the VC capability (so if I have a broken bridge that advertises it but
my platform wants to block it, it can't).

 - I am quite nervous with the PCIe port driver disabling bridges. I
understand the intent but what if that bridge has some system device
behind it ? Something you don't even know about (used by ACPI, behind an
ISA bridge for example ?).

I think disabling bridges is a VERY risky proposition at all times
(including during PM) and I don't think the port driver should do it.

Maybe a more robust approach would be to detect one way or another that
the bridge was already enabled and only disable it if it wasn't or
something along those lines (ie ,restore it in the state it was)...

This is not my problem right now of course (in my case the bridge was
disabled to begin with) but I have a long experience with system stuff
hiding behind bridges and the code as it is written makes me a bit
nervous.

Cheers,
Ben.

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ