Message-ID: <20220120190826.wkhkcx53lmafq2yp@pali>
Date:   Thu, 20 Jan 2022 20:08:26 +0100
From:   Pali Rohár <pali@...nel.org>
To:     Bjorn Helgaas <helgaas@...nel.org>
Cc:     Thomas Petazzoni <thomas.petazzoni@...tlin.com>,
        Lorenzo Pieralisi <lorenzo.pieralisi@....com>,
        Rob Herring <robh@...nel.org>,
        Krzysztof Wilczyński <kw@...ux.com>,
        Bjorn Helgaas <bhelgaas@...gle.com>,
        Marek Behún <kabel@...nel.org>,
        linux-pci@...r.kernel.org, linux-arm-kernel@...ts.infradead.org,
        linux-kernel@...r.kernel.org
Subject: Re: [PATCH 08/15] PCI: mvebu: Propagate errors when updating
 PCI_IO_BASE and PCI_MEM_BASE registers

On Thursday 20 January 2022 11:50:47 Bjorn Helgaas wrote:
> On Thu, Jan 13, 2022 at 11:35:23AM +0100, Pali Rohár wrote:
> > On Wednesday 12 January 2022 18:19:21 Bjorn Helgaas wrote:
> > > On Sat, Jan 08, 2022 at 12:46:58AM +0100, Pali Rohár wrote:
> > > > On Friday 07 January 2022 17:16:17 Bjorn Helgaas wrote:
> > > > > On Fri, Jan 07, 2022 at 11:28:26PM +0100, Pali Rohár wrote:
> > > > > > On Friday 07 January 2022 15:55:04 Bjorn Helgaas wrote:
> > > > > > > On Thu, Nov 25, 2021 at 01:45:58PM +0100, Pali Rohár wrote:
> > > > > > > > Properly propagate failures from the
> > > > > > > > mvebu_pcie_add_windows() function back to its caller
> > > > > > > > mvebu_pci_bridge_emul_base_conf_write() and correctly
> > > > > > > > update the PCI_IO_BASE, PCI_MEM_BASE and
> > > > > > > > PCI_IO_BASE_UPPER16 registers on error.  On error, set the
> > > > > > > > base value higher than the limit value, which indicates
> > > > > > > > that the address range is disabled.
> > > > > > > 
> > > > > > > Does the spec say that if software programs something
> > > > > > > invalid, hardware should proactively set the base and
> > > > > > > limit registers to disable the window?
> > > > > > 
> > > > > > No. But this patch addresses something totally different.
> > > > > > Software can perform a fully valid operation, e.g. try to set
> > > > > > the forwarding memory window as large as possible. But because
> > > > > > this driver "emulates" the PCI bridge by calling a
> > > > > > software/kernel function (mvebu_pcie_add_windows()), some
> > > > > > situations which cannot happen in real HW are possible in
> > > > > > software.
> > > > > > 
> > > > > > For example, there are limitations on the sizes of forwarding
> > > > > > memory windows, because the mapping is done by the mvebu-mbus
> > > > > > driver, which is responsible for configuring the mapping and
> > > > > > forwarding of PCIe I/O and MEM windows. And due to the Marvell
> > > > > > HW, there are restrictions which do not exist in PCIe HW.
> > > > > > 
> > > > > > Currently, if such an error happens, the kernel is obviously
> > > > > > not able to set the PCIe windows and it just prints warnings
> > > > > > to dmesg. Trying to access these windows would, in the worst
> > > > > > case, result in crashes.
> > > > > > 
> > > > > > With this change, when the mvebu_pcie_add_windows() function
> > > > > > fails, the emulated config space records that the particular
> > > > > > forwarding window is disabled. I think it is better for the
> > > > > > config space to indicate the current "reality" of the hardware
> > > > > > configuration. If the window is disabled in real HW (meaning
> > > > > > in the mvebu-mbus driver), then show that also in the emulated
> > > > > > config space of the PCI bridge.
> > > > > > 
> > > > > > Do you have a better idea of what the emulated PCI bridge
> > > > > > should do if software tries to set a fully valid configuration
> > > > > > of a forwarding window (one that even a compliant PCI bridge
> > > > > > must be able to handle), but it is not possible to achieve?
> > > > > 
> > > > > On an ACPI system, the host bridge window sizes are
> > > > > constrained by the host bridge _CRS method.  I assume there's
> > > > > a similar constraint in DT.
> > > > > 
> > > > > Is the fact that mvebu_pcie_add_windows() can fail a symptom
> > > > > of a DT that describes more available space than mvebu-bus can
> > > > > map?
> > > > 
> > > > Memory maps for mvebu are more complicated. There is no explicit
> > > > size in the DT ranges property, as it is dynamically allocated by
> > > > mvebu-mbus:
> > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/boot/dts/armada-385.dtsi?h=v5.15#n47
> > > 
> > > I wish I knew how to really interpret those "ranges" properties.
> > > (Is there a good description in Documentation/ somewhere?  All
> > > I've found so far is https://elinux.org/Device_Tree_Usage, which
> > > is good, but doesn't match this example completely.)
> > > 
> > > I see:
> > > 
> > >   pciec: pcie {
> > >     ranges = <...>;
> > >     pcie1: pcie@1,0 {
> > >       ranges = <0x82000000 0 0 0x82000000 0x1 0 1 0
> > > 	        0x81000000 0 0 0x81000000 0x1 0 1 0>;
> > >     };
> > >     pcie2: pcie@2,0 {
> > >       ranges = <0x82000000 0 0 0x82000000 0x2 0 1 0
> > > 	        0x81000000 0 0 0x81000000 0x2 0 1 0>;
> > >     };
> > >     pcie3: pcie@3,0 {
> > >       ranges = <0x82000000 0 0 0x82000000 0x3 0 1 0
> > > 	        0x81000000 0 0 0x81000000 0x3 0 1 0>;
> > >     };
> > >     pcie4: pcie@4,0 {
> > >       ranges = <0x82000000 0 0 0x82000000 0x4 0 1 0
> > > 	        0x81000000 0 0 0x81000000 0x4 0 1 0>;
> > >     };
> > >   };
> > > 
> > > What does this look like in dmesg, i.e., what CPU address ranges are
> > > mapped to what PCI bus addresses?
> > 
> > These explicit ranges in DT are probably ignored as they are invalid.
> > You can see them (0xffffffffffffffff) in dmesg. 
> 
> Are you saying that these DT ranges and the dmesg line are connected?
> 
>   ranges = <0x82000000 0 0 0x82000000 0x1 0 1 0
>             0x81000000 0 0 0x81000000 0x1 0 1 0>;
> 
>   mvebu-pcie soc:pcie: MEM 0xffffffffffffffff..0x00fffffffe -> 0x0100000000
> 
> 1) It would be nice if there were a hint somewhere in Documentation/
> that would allow mere mortals to see the connection there.

I agree that a lot of documentation related to the Marvell PCIe
controllers, both pci-mvebu.c and pci-aardvark.c, is missing. The main
problem now is that it is hard to find information which explains
everything... like the mysterious Memory Controller (which is now
explained):
https://lore.kernel.org/linux-pci/20211003120944.3lmwxylnhlp2kfj7@pali/

> 2) Why do we have these DT entries if they are invalid and useless?

Sorry, I do not know. I was not involved in introducing the DT files
for this platform. I'm just observing how it works...

> > MEM and I/O resources are parsed in the pci-mvebu.c driver in the
> > mvebu_pcie_parse_request_resources() function.
> 
> So mvebu-mbus.c fills in the static mbus_state from the DT
> "pcie-mem-aperture", which seems unconnected to the DT descriptions of
> the PCI controllers:
> 
>   static struct mvebu_mbus_state mbus_state;
> 
>   mvebu_mbus_dt_init
>     mvebu_mbus_get_pcie_resources(&mbus_state.pcie_mem_aperture)
>       of_property_read_u32_array("pcie-mem-aperture")
> 
>   mvebu_pcie_probe
>     mvebu_pcie_parse_request_resources
>       mvebu_mbus_get_pcie_mem_aperture(&pcie->mem)
> 	*res = mbus_state.pcie_mem_aperture
>       pci_add_resource(&bridge->windows, &pcie->mem)
> 
> > Here is the relevant dmesg output:
> > 
> > mvebu-pcie soc:pcie: host bridge /soc/pcie ranges:
> > mvebu-pcie soc:pcie:      MEM 0x00f1080000..0x00f1081fff -> 0x0000080000
> > mvebu-pcie soc:pcie:      MEM 0x00f1040000..0x00f1041fff -> 0x0000040000
> > mvebu-pcie soc:pcie:      MEM 0x00f1044000..0x00f1045fff -> 0x0000044000
> > mvebu-pcie soc:pcie:      MEM 0x00f1048000..0x00f1049fff -> 0x0000048000
> > mvebu-pcie soc:pcie:      MEM 0xffffffffffffffff..0x00fffffffe -> 0x0100000000
> > mvebu-pcie soc:pcie:       IO 0xffffffffffffffff..0x00fffffffe -> 0x0100000000
> > mvebu-pcie soc:pcie:      MEM 0xffffffffffffffff..0x00fffffffe -> 0x0200000000
> > mvebu-pcie soc:pcie:       IO 0xffffffffffffffff..0x00fffffffe -> 0x0200000000
> > mvebu-pcie soc:pcie:      MEM 0xffffffffffffffff..0x00fffffffe -> 0x0300000000
> > mvebu-pcie soc:pcie:       IO 0xffffffffffffffff..0x00fffffffe -> 0x0300000000
> > mvebu-pcie soc:pcie:      MEM 0xffffffffffffffff..0x00fffffffe -> 0x0400000000
> > mvebu-pcie soc:pcie:       IO 0xffffffffffffffff..0x00fffffffe -> 0x0400000000
> > mvebu-pcie soc:pcie: PCI host bridge to bus 0000:00
> > pci_bus 0000:00: root bus resource [bus 00-ff]
> > pci_bus 0000:00: root bus resource [mem 0xf1080000-0xf1081fff] (bus address [0x00080000-0x00081fff])
> > pci_bus 0000:00: root bus resource [mem 0xf1040000-0xf1041fff] (bus address [0x00040000-0x00041fff])
> > pci_bus 0000:00: root bus resource [mem 0xf1044000-0xf1045fff] (bus address [0x00044000-0x00045fff])
> > pci_bus 0000:00: root bus resource [mem 0xf1048000-0xf1049fff] (bus address [0x00048000-0x00049fff])
> > pci_bus 0000:00: root bus resource [mem 0xe0000000-0xe7ffffff]
> 
> I see 0xf1080000-0xf1081fff, 0xf1040000-0xf1041fff, etc mentioned in
> the DT info above, but I don't see where [mem 0xe0000000-0xe7ffffff]
> came from.

0xe0000000 is defined in the above-mentioned "pcie-mem-aperture" DT
property, which is in the armada-38x.dtsi DTS file included from
armada-385.dtsi:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/arm/boot/dts/armada-38x.dtsi?h=v5.15#n42
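
If I remember correctly, the relevant aperture properties in that file
are the following (the memory aperture matches the
[mem 0xe0000000-0xe7ffffff] root bus resource in the dmesg above):

  pcie-mem-aperture = <0xe0000000 0x8000000>;   /* 128 MiB memory space */
  pcie-io-aperture  = <0xe8000000 0x100000>;    /* 1 MiB I/O space */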

> Regardless, this means PCI thinks [mem 0xe0000000-0xe7ffffff] is
> available on bus 00 and can be assigned to devices on bus 00 according
> to the normal PCI rules (BARs aligned on size, PCI bridge windows
> aligned on 1MB and multiple of 1MB in size).  IIUC, mvebu imposes
> additional alignment constraints on the bridge windows.
> 
> These are the bridge window assignments from your dmesg:
> 
> > pci 0000:00:01.0: BAR 8: assigned [mem 0xe0000000-0xe00fffff]
> > pci 0000:00:02.0: BAR 8: assigned [mem 0xe0200000-0xe04fffff]
> > pci 0000:00:03.0: BAR 8: assigned [mem 0xe0100000-0xe01fffff]
> 
> > pci 0000:00:01.0: PCI bridge to [bus 01]
> > pci 0000:00:01.0:   bridge window [mem 0xe0000000-0xe00fffff]
> > pci 0000:00:02.0: PCI bridge to [bus 02]
> > pci 0000:00:02.0:   bridge window [mem 0xe0200000-0xe04fffff]
> > pci 0000:00:03.0: PCI bridge to [bus 03]
> > pci 0000:00:03.0:   bridge window [mem 0xe0100000-0xe01fffff]
> 
> The PCI core knows nothing about the mvebu constraints.  Are we just
> lucky here that when PCI assigned these bridge windows, they happen to
> be supported on mvebu?  What happens if PCI decides it needs 29MB on
> bus 01?

In this case pci-mvebu.c splits the 29 MB window into contiguous ranges
of power-of-two size (16 MB + 8 MB + 4 MB + 1 MB) and then registers
each range to an mbus slot. The code is in the function
mvebu_pcie_add_windows():
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/pci/controller/pci-mvebu.c?h=v5.15#n300

So in the end there is a contiguous 29 MB PCIe window, it just "eats"
4 mbus slots.

This function may fail (if there are not enough free mbus slots), and
this patch propagates that failure back to the caller.
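
To illustrate the idea, here is a minimal standalone sketch of that
power-of-two decomposition (not the actual driver code; the base
address and window size are just made-up example values, and
__builtin_clzl is a GCC/Clang builtin):

  #include <stdio.h>
  #include <stddef.h>

  int main(void)
  {
          size_t size = 29 << 20;             /* requested window: 29 MB */
          unsigned long base = 0xe0000000UL;  /* example CPU base address */

          /* Split the window into power-of-two chunks, largest first;
           * each chunk would occupy one mbus slot. */
          while (size) {
                  size_t sz = 1UL << (sizeof(size) * 8 - 1 -
                                      __builtin_clzl(size));

                  printf("mbus slot: base 0x%08lx, size %2zu MB\n",
                         base, sz >> 20);
                  base += sz;
                  size -= sz;
          }

          return 0;
  }

For a 29 MB request this prints 16 MB, 8 MB, 4 MB and 1 MB chunks,
i.e. the 4 mbus slots mentioned above.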

> > > Are pcie1, pcie2, etc Root Ports?  Or are they each separate host
> > > bridges (they each have "bus-range = <0x00 0xff>")?
> > 
> > From the kernel point of view they are root ports. But in reality
> > every one of these root ports is on a separate bus segment; the
> > kernel pci-mvebu.c driver merges all these segments/domains into one
> > host bridge and puts all root ports onto bus 0.
> > 
> > Here is the lspci -tvnn output with the topology:
> > 
> > $ lspci -tvnn
> > -[0000:00]-+-01.0-[01]----00.0  Device [1e0f:0001]
> >            +-02.0-[02]----00.0  Qualcomm Atheros QCA986x/988x 802.11ac Wireless Network Adapter [168c:003c]
> >            \-03.0-[03]----00.0  Qualcomm Atheros AR9287 Wireless Network Adapter (PCI-Express) [168c:002e]
> 
> > Buses 1, 2 and 3 represent mPCIe cards; all of them are in reality
> > in separate bus segments and on different HW host bridges. So they
> > do *not* share access to config space, do *not* share INTx
> > interrupts, etc...
> > 
> > > Is space from pciec dynamically assigned to pcie1, pcie2, etc?  If
> > > so, I assume there are more restrictions on the size and alignment
> > > than on PCI bridge windows, which allow size/alignment down to
> > > 1MB?
> > 
> > Yes, exactly. I do not currently know all the restrictions. At least
> > there is a fixed number of memory slots, and each has to be of size
> > 2^N. They are dynamically assigned by the kernel mbus driver at the
> > time somebody updates the BASE/LIMIT registers. And that kernel mbus
> > driver takes care of splitting a non-aligned window size into more
> > slots of size 2^N. And the resources are shared from a pool with
> > other HW parts (e.g. DMA), so other drivers loaded in the kernel can
> > "eat" the available slots before pci-mvebu, and then there may be
> > nothing left to allocate for PCI.
> 
> So IIUC,
> 
>   pcie1 == 00:01.0 Root Port
>   pcie2 == 00:02.0 Root Port
>   pcie3 == 00:03.0 Root Port
> 
> From a software point of view, they're all under a single host bridge,
> and Linux assumes everything under a host bridge plays by the PCI
> rules.

Yes.

> In this case, the root ports *don't* play by the rules since they have
> additional alignment restrictions, so I think these really should be
> described as separate host bridges in DT with the address space
> carved up statically among them.

I fully agree with you.

But the pci-mvebu.c driver and also its DT bindings are written
differently. Changing that probably would not be simple due to backward
compatibility and would take development resources...

> It's common on x86 to have multiple host bridges that all appear to
> software to be in domain 0000.  The bus number ranges under each are
> static, e.g., one bridge has [bus 00-7f] and another has [bus 80-ff].

For mvebu they are dynamic and the kernel assigns them at boot. As the
lspci topology printed above is simple, the first bridge has [bus 01-01]
assigned, the second bridge [bus 02-02] and the third bridge [bus 03-03].

> > But most Armada boards do not have all peripherals exported from the
> > SoC; the unconnected ones are disabled in DT, and therefore
> > exhaustion should not happen.
> > 
> > > I'm trying to see how this could be described in ACPI because that's a
> > > fairly general model that accommodates most machines.  Possibly
> > > describing mvebu in ACPI would involve losing some flexibility.
> > 
> > I do not understand the ACPI model very well, and I'm under the
> > impression that it is impossible to represent mvebu in ACPI.
> 
> It could be described as a separate host bridge for every root port.
> ACPI uses _CRS (current resource settings) to describe the apertures
> to PCI and any address translation.  Currently the _CRS description is
> static, but ACPI does allow those resource assignments to be modified
> via _PRS (possible resource settings) and _SRS (set resource
> settings).
> 
> Bjorn
