Message-ID: <CAA1RLiRkAqp+qXoi7-e6NbRoHoGLmV5F9R5wbR1xcsy8O0SWKQ@mail.gmail.com>
Date:   Wed, 28 Sep 2022 14:49:47 +0800
From:   Xiaochun Li <xiaocli@...hat.com>
To:     Bjorn Helgaas <helgaas@...nel.org>
Cc:     Xiaochun Lee <lixiaochun.2888@....com>,
        nirmal.patel@...ux.intel.com, jonathan.derrick@...ux.dev,
        lpieralisi@...nel.org, robh@...nel.org, kw@...ux.com,
        bhelgaas@...gle.com, linux-pci@...r.kernel.org,
        linux-kernel@...r.kernel.org, lkp@...el.com,
        Xiaochun Lee <lixc17@...ovo.com>
Subject: Re: [PATCH v2 1/1] PCI: Set no I/O resource for bridges behind
 VMD controller

On Wed, Sep 28, 2022 at 2:57 AM Bjorn Helgaas <helgaas@...nel.org> wrote:
>
> On Tue, Sep 27, 2022 at 10:16:06PM +0800, Xiaochun Lee wrote:
> > From: Xiaochun Lee <lixc17@...ovo.com>
> >
> > When VMD is enabled on Intel CPUs, VMD controllers (8086:28c0) are
> > recognized by the VMD driver, and many BAR 13 failure messages are
> > logged when scanning the bridges behind it and assigning their I/O
> > resources, as listed below: the bridge wants 0x6000 of I/O
> > resource, but there is no I/O resource on the host bridge.
> >
> > VMD host bridge resources:
> > vmd 0000:e2:00.5: PCI host bridge to bus 10001:00
> > pci_bus 10001:00: root bus resource [bus 00-1f]
> > pci_bus 10001:00: root bus resource [mem 0xf4000000-0xf5ffffff]
> > pci_bus 10001:00: root bus resource [mem 0x29ffff02010-0x29fffffffff 64bit]
> > pci_bus 10001:00: scanning bus
> >
> > Read bridge IO resource:
> > pci 10001:00:02.0: PCI bridge to [bus 01]
> > pci 10001:00:02.0:   bridge window [io  0x1000-0x6fff]
> > pci 10001:00:03.0: PCI bridge to [bus 02]
> > pci 10001:00:03.0:   bridge window [io  0x1000-0x6fff]
> >
> > Failed messages of BAR#13:
> > pci 10001:00:02.0: BAR 13: no space for [io  size 0x6000]
> > pci 10001:00:02.0: BAR 13: failed to assign [io  size 0x6000]
> > pci 10001:00:03.0: BAR 13: no space for [io  size 0x6000]
> > pci 10001:00:03.0: BAR 13: failed to assign [io  size 0x6000]
> >
> > VMD-enabled root ports use the
> > Enhanced Configuration Access Mechanism (ECAM) to access
> > PCI Express configuration space, and offer VMD_CFGBAR as the
> > base of PCI Express configuration space for the bridges
> > behind it. The configuration space includes I/O resources,
>
> I don't quite understand this part.  ECAM is an MMIO method, so it
> uses memory space, not I/O port space.

VMD reads the PCI I/O base through a special callback, vmd_pci_read().
The call stack is as follows:

pci_read_bridge_windows()
  pci_read_config_word(bridge, PCI_IO_BASE, &io)        # read BAR 13
    pci_bus_read_config_word()
      bus->ops->read()
        vmd_pci_read()                                  # for a VMD bus
          vmd_cfg_addr()
            offset = PCIE_ECAM_OFFSET(busnr_ecam, devfn, reg);
            return vmd->cfgbar + offset;                # base is vmd->cfgbar
          readw(addr)                                   # read the value out
And vmd->cfgbar points to an iomap of VMD_CFGBAR, as listed below:
vmd->cfgbar = pcim_iomap(dev, VMD_CFGBAR, 0);

The Intel Sapphire Rapids Server Processor EDS, Volume Two, Revision 0.9,
Chapter 9 "Volume Management Device (VMD) Registers",
Section 9.1.9 "VMD Configuration Base Address", shows that CFGBAR is at offset 0x10.

dmesg lists the following:
[    2.091800] pci 0000:64:00.5: [8086:28c0] type 00 class 0x010400
[    2.091810] pci 0000:64:00.5: reg 0x10: [mem 0x24ffc000000-0x24ffdffffff 64bit pref]
[    2.091816] pci 0000:64:00.5: reg 0x18: [mem 0xe0000000-0xe1ffffff]
[    2.091827] pci 0000:64:00.5: reg 0x20: [mem 0x24ffff00000-0x24fffffffff 64bit]

The command "lspci -vvv -s 64:00.5" shows the following:
0000:64:00.5 RAID bus controller: Intel Corporation Volume Management Device NVMe RAID Controller (rev 04)
        DeviceName: Intel RAID Controller CPU1
        Subsystem: Lenovo Device 010f
        Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx-
        Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
        Latency: 0, Cache Line Size: 64 bytes
        NUMA node: 0
        Region 0: Memory at 24ffc000000 (64-bit, prefetchable) [size=32M]
        Region 2: Memory at e0000000 (32-bit, non-prefetchable) [size=32M]
        Region 4: Memory at 24ffff00000 (64-bit, non-prefetchable) [size=1M]

So for VMD, the bridge's configuration I/O base and I/O limit for
BAR#13 are actually read via the VMD host bridge's BAR0 (offset 0x10),
unlike normal bridges on x86, whose configuration is read via I/O ports
(0cf8-0cff : PCI conf1).

For reference, the PCI Express Base Specification, Revision 5.0 Version 1.0,
Section 7.2 "PCI Express Configuration Mechanisms",
subsection 7.2.2 "PCI Express Enhanced Configuration Access Mechanism (ECAM)",
describes it as follows:
The ECAM utilizes a flat memory-mapped address space to access device
configuration registers. In this case, the memory address determines the
configuration register accessed and the memory data updates (for a write)
or returns the contents of (for a read) the addressed register.

>
> > but these I/O resources are not actually used on x86,
> > which can result in BAR#13 I/O resource assignment failures.
> > Therefore, clear the I/O resources by setting an I/O base value
> > greater than the limit on these bridges, so that the kernel
> > parameter "pci=hpiosize=0KB" can be leveraged to
> > avoid these failure messages.
>
> Is the only purpose of this patch to avoid the "no space" and "failed
> to assign" messages?  Or is there something that actually doesn't
> work?

Yes, the only purpose is to avoid the failure messages showing up; we have
not found any functional impact so far. Our customers care a great deal
about these messages and keep complaining about them. For non-VMD hotplug
bridges, appending the kernel parameter "pci=hpiosize=0KB" makes them go
away, but for bridges behind VMD it does not work; there is no workaround so far.

>
> I don't think it's worth making a quirk just to avoid the message.
> There are several systems where I/O port space is not available, and
> it *would* be nice to have a generic approach that handles that
> better.

The bridge I/O base and limit of BAR#13 are read from the host bridge's
BAR0 (offset 0x10); they may be incorrect values set by the VROC driver in
UEFI. Hence it seems the firmware could fix this, but it is really hard to
push them to do so, so I prepared a quirk for the corresponding bridges here.
If we make a generic approach to avoid the failure messages when assigning
the I/O resource of BAR#13, where would you prefer to add it?
Do you have any recommendations? Thanks!
And would you please also share which systems have I/O port space
unavailable? I would appreciate it.

>
> > Signed-off-by: Xiaochun Lee <lixc17@...ovo.com>
> > ---
> >  drivers/pci/quirks.c | 60 ++++++++++++++++++++++++++++++++++++++++++++
> >  1 file changed, 60 insertions(+)
> >
> > diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
> > index 4944798e75b5..efecf12e8059 100644
> > --- a/drivers/pci/quirks.c
> > +++ b/drivers/pci/quirks.c
> > @@ -5956,3 +5956,63 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x56b1, aspm_l1_acceptable_latency
> >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x56c0, aspm_l1_acceptable_latency);
> >  DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x56c1, aspm_l1_acceptable_latency);
> >  #endif
> > +
> > +#if defined(CONFIG_X86_64) || defined(CONFIG_X86)
> > +
> > +#ifdef CONFIG_UML_X86
> > +#define is_vmd(bus)             false
> > +#endif /* CONFIG_UML_X86 */
> > +
> > +/*
> > + * VMD-enabled root ports use the Enhanced Configuration Access Mechanism
> > + * (ECAM) to access PCI Express configuration space, and offer VMD_CFGBAR
> > + * as the base of PCI Express configuration space for the bridges behind
> > + * them. The configuration space includes I/O resources, but these I/O
> > + * resources are not actually used on x86, which can result in BAR#13
> > + * I/O resource assignment failures. Therefore, clear the I/O resources
> > + * by setting an I/O base value greater than the limit on these bridges,
> > + * so that appending the kernel parameter "pci=hpiosize=0KB" can avoid
> > + * the BAR#13 failure messages.
> > + */
> > +static void quirk_vmd_no_iosize(struct pci_dev *bridge)
> > +{
> > +     u8 io_base_lo, io_limit_lo;
> > +     u16 io_low;
> > +     u32 io_upper16;
> > +     unsigned long io_mask,  base, limit;
> > +
> > +     io_mask = PCI_IO_RANGE_MASK;
> > +     if (bridge->io_window_1k)
> > +             io_mask = PCI_IO_1K_RANGE_MASK;
> > +
> > +     /* VMD Domain */
> > +     if (is_vmd(bridge->bus) && bridge->is_hotplug_bridge) {
> > +             pci_read_config_byte(bridge, PCI_IO_BASE, &io_base_lo);
> > +             pci_read_config_byte(bridge, PCI_IO_LIMIT, &io_limit_lo);
> > +             base = (io_base_lo & io_mask) << 8;
> > +             limit = (io_limit_lo & io_mask) << 8;
> > +             /*
> > +              * If there are I/O ports defined behind the bridge on x86,
> > +              * clear them: there is only 64KB of I/O space in total, and
> > +              * hotplug bridges with devices such as NVMe drives behind
> > +              * them do not need I/O port resources. The range is turned
> > +              * off by writing a base value greater than the limit to the
> > +              * bridge's base/limit registers.
> > +              */
> > +             if (limit >= base) {
> > +                     /* Clear upper 16 bits of I/O base/limit */
> > +                     io_upper16 = 0;
> > +                     /* set base value greater than limit */
> > +                     io_low = 0x00f0;
> > +
> > +                     /* Temporarily disable the I/O range before updating PCI_IO_BASE */
> > +                     pci_write_config_dword(bridge, PCI_IO_BASE_UPPER16, 0x0000ffff);
> > +                     /* Update lower 16 bits of I/O base/limit */
> > +                     pci_write_config_word(bridge, PCI_IO_BASE, io_low);
> > +                     /* Update upper 16 bits of I/O base/limit */
> > +                     pci_write_config_dword(bridge, PCI_IO_BASE_UPPER16, io_upper16);
> > +             }
> > +     }
> > +}
> > +DECLARE_PCI_FIXUP_CLASS_HEADER(PCI_ANY_ID, PCI_ANY_ID,
> > +             PCI_CLASS_BRIDGE_PCI, 8, quirk_vmd_no_iosize);
> > +#endif
> > --
> > 2.37.3
> >
>


-- 
Best Regards,
Xiaochun Li (Xavier) (Mr.)
RedHat Onsite engineer
Lenovo Linux Kernel Engineer
Lenovo Infrastructure Solutions Group (ISG)/China Standard Time Zone(CST)
