linux-kernel - Re: [PATCH V2 1/5] perf/x86/intel/uncore: Parse uncore discovery tables

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20220802160237.uqgez4f3ctm7rpyc@ldmartin-desk2>
Date:   Tue, 2 Aug 2022 12:02:37 -0400
From:   Lucas De Marchi <lucas.demarchi@...el.com>
To:     "Liang, Kan" <kan.liang@...ux.intel.com>
CC:     <peterz@...radead.org>, <mingo@...nel.org>, <acme@...hat.com>,
        <linux-kernel@...r.kernel.org>,
        <alexander.shishkin@...ux.intel.com>, <jolsa@...hat.com>,
        <eranian@...gle.com>, <namhyung@...nel.org>, <ak@...ux.intel.com>,
        <tilak.tangudu@...el.com>
Subject: Re: [PATCH V2 1/5] perf/x86/intel/uncore: Parse uncore discovery
 tables

On Tue, Aug 02, 2022 at 11:43:36AM -0400, Liang, Kan wrote:
>
>
>On 2022-08-02 10:22 a.m., Lucas De Marchi wrote:
>> On Mon, Jul 25, 2022 at 10:51:44AM -0400, Liang, Kan wrote:
>>>
>>>
>>> On 2022-07-23 2:56 p.m., Lucas De Marchi wrote:
>>>> On Fri, Jul 22, 2022 at 09:04:43AM -0400, Liang, Kan wrote:
>>>>>
>>>>>
>>>>> On 2022-07-22 8:55 a.m., Lucas De Marchi wrote:
>>>>>> Hi Kan,
>>>>>>
>>>>>> On Wed, Mar 17, 2021 at 10:59:33AM -0700, kan.liang@...ux.intel.com
>>>>>> wrote:
>>>>>>> From: Kan Liang <kan.liang@...ux.intel.com>
>>>>>>>
>>>>>>> A self-describing mechanism for the uncore PerfMon hardware has been
>>>>>>> introduced with the latest Intel platforms. By reading through an
>>>>>>> MMIO
>>>>>>> page worth of information, perf can 'discover' all the standard
>>>>>>> uncore
>>>>>>> PerfMon registers in a machine.
>>>>>>>
>>>>>>> The discovery mechanism relies on BIOS's support. With a proper BIOS,
>>>>>>> a PCI device with the unique capability ID 0x23 can be found on each
>>>>>>> die. Perf can retrieve the information of all available uncore
>>>>>>> PerfMons
>>>>>>> from the device via MMIO. The information is composed of one global
>>>>>>> discovery table and several unit discovery tables.
>>>>>>> - The global discovery table includes global uncore information of
>>>>>>> the
>>>>>>>  die, e.g., the address of the global control register, the offset of
>>>>>>>  the global status register, the number of uncore units, the
>>>>>>> offset of
>>>>>>>  unit discovery tables, etc.
>>>>>>> - The unit discovery table includes generic uncore unit information,
>>>>>>>  e.g., the access type, the counter width, the address of counters,
>>>>>>>  the address of the counter control, the unit ID, the unit type, etc.
>>>>>>>  The unit is also called "box" in the code.
>>>>>>> Perf can provide basic uncore support based on this information
>>>>>>> with the following patches.
>>>>>>>
>>>>>>> To locate the PCI device with the discovery tables, check the generic
>>>>>>> PCI ID first. If it doesn't match, go through the entire PCI device
>>>>>>> tree
>>>>>>> and locate the device with the unique capability ID.
>>>>>>>
>>>>>>> The uncore information is similar among dies. To save parsing time
>>>>>>> and
>>>>>>> space, only completely parse and store the discovery tables on the
>>>>>>> first
>>>>>>> die and the first box of each die. The parsed information is
>>>>>>> stored in
>>>>>>> an
>>>>>>> RB tree structure, intel_uncore_discovery_type. The size of the
>>>>>>> stored
>>>>>>> discovery tables varies among platforms. It's around 4KB for a
>>>>>>> Sapphire
>>>>>>> Rapids server.
>>>>>>>
>>>>>>> If a BIOS doesn't support the 'discovery' mechanism, the uncore
>>>>>>> driver
>>>>>>> will exit with -ENODEV. There is nothing changed.
>>>>>>>
>>>>>>> Add a module parameter to disable the discovery feature. If a BIOS
>>>>>>> gets
>>>>>>> the discovery tables wrong, users can have an option to disable the
>>>>>>> feature. For the current patchset, the uncore driver will exit with
>>>>>>> -ENODEV. In the future, it may fall back to the hardcode uncore
>>>>>>> driver
>>>>>>> on a known platform.
>>>>>>>
>>>>>>> Signed-off-by: Kan Liang <kan.liang@...ux.intel.com>
>>>>>>
>>>>>> I observed one issue when upgrading a kernel from 5.10 to 5.15 and
>>>>>> after
>>>>>> bisecting it arrived to this commit. I also verified the same issue is
>>>>>> present in 5.19-rc7 and that the issue is gone when booting with
>>>>>> intel_uncore.uncore_no_discover.
>>>>>>
>>>>>> Test system is a SPR host with a PVC gpu. Issue is that PVC is not
>>>>>> reaching pkg c6 state, even if we put it in rc6 state. It seems the
>>>>>> pcie
>>>>>> link is not idling, preventing it to go to pkg c6.
>>>>>>
>>>>>> PMON discovery in bios is set to "auto".
>>>>>>
>>>>>> We do see the following on dmesg while going through this code path:
>>>>>>
>>>>>>     intel_uncore: Invalid Global Discovery State: 0xffffffffffffffff
>>>>>> 0xffffffffffffffff 0xffffffffffffffff
>>>>>
>>>>> On SPR, the uncore driver relies on the discovery table provided by the
>>>>> BIOS/firmware. It looks like your BIOS/firmware is out of date. Could
>>>>> you please update to the latest BIOS/firmware and have a try?
>>>>
>>>> hum, the BIOS is up to date. It seems PVC itself has a 0x09a7 device
>>>> and it remains in D3, so the 0xffffffffffffffff we se below is
>>>> just the auto completion. No wonder the values don't match what we are
>>>> expecting here.
>>>>
>>>> Is it expected the device to be in D0? Or should we do anything here to
>>>> move it to D0 before doing these reads?
>>>>
>>>
>>> It's OK to have a 0x09a7 device. But the device should not claim to
>>> support the PMON Discovery if it doesn't comply the PMON discovery
>>> mechanism.
>>>
>>> See 1.10.1 Guidance on Finding PMON Discovery and Reading it in SPR
>>> uncore document. https://cdrdv2.intel.com/v1/dl/getContent/642245
>>> It demonstrates how the uncore driver find the device with the PMON
>>> discovery mechanism.
>>
>> ok, this is exactly the code in the kernel.
>>
>>>
>>> Simply speaking, the uncore driver looks for a DVSEC
>>> structure with an unique capability ID 0x23. Then it checks whether the
>>> PMON discovery entry (0x1) is supported. If both are detected, it means
>>> that the device comply the PMON discovery mechanism. The uncore driver
>>> will be enabled to parse the discovery table.
>>>
>>> AFAIK, the PVC gpu doesn't support the PMON discovery mechanism. I guess
>>> the firmwire of the PVC gpu mistakenly set the PMON discovery entry
>>> (0x1). You may want to check the extended capabilities (DVSEC) in the
>>> PCIe configuration space of the PVC gpu device.
>>
>> However here it seems we have 2 issues being mixed:
>>
>> 1) PVC with that capability when it shouldn't
>
>This is a firmware/HW issue. If PVC doesn't support the PMON discovery
>mechanism, the PVC and its attached OOBMSM device should not enumerate
>the discovery mechanism. However, the PVC enumerates the discovery
>mechanism here, which doesn't comply the spec.
>
>The uncore driver prints errors when the in-compliance is detected.
>That's expected. There is noting more SW can do here.
>
>The firmware issue must be fixed.

yes, that's what I said. It's exposing the capability when it shouldn't.
That's being worked on from the firmware side already.

>
>> 2) Trying to read the MMIOs when device is possibly in D3 state:
>
>The uncore driver skips the device which doesn't support the discovery
>mechanism.
>If 1) is fixed, the uncore driver will not touch the MMIO space of a PVC
>device. The power issue should be gone.
>
>I've already sent you a patch to ignore the PVC added OOBMSM device, you
>can double check with the patch.

(2) is a more generic issue that I'm mentioning. Forget for a moment we
are talking about PVC - that will be fixed by (1). We are trying to read
the mmio from a device that can be in D3, either because it started in
D3 or because a driver, loaded before intel_uncore, moved it to that
state. That won't work even if the device supports the discovery
mechanism.

Lucas De Marchi

>
>Thanks,
>Kan
>
>>
>>     /* Map whole discovery table */
>>     addr = pci_dword & ~(PAGE_SIZE - 1);
>>     io_addr = ioremap(addr, UNCORE_DISCOVERY_MAP_SIZE);
>>
>>     /* Read Global Discovery table */
>>     memcpy_fromio(&global, io_addr, sizeof(struct
>> uncore_global_discovery));
>>
>> Unless it's guaranteed that at this point the device must be in D0
>> state, this doesn't look right.  When we are binding a driver to a PCI
>> device, pci core will move it to D0 for us:
>>
>>     static long local_pci_probe(void *_ddi)
>>     {
>>         ...
>>         /*
>>          * Unbound PCI devices are always put in D0, regardless of
>>          * runtime PM status.  During probe, the device is set to
>>          * active and the usage count is incremented.  If the driver
>>          * supports runtime PM, it should call pm_runtime_put_noidle(),
>>          * or any other runtime PM helper function decrementing the usage
>>          * count, in its probe routine and pm_runtime_get_noresume() in
>>          * its remove routine.
>>          */
>>          pm_runtime_get_sync(dev);
>>          ...
>>
>> But here we are traversing the entire PCI device tree by ourselves.
>> Considering intel_uncore is a module that can be loaded at any time
>> (even after the driver supporting PVC, which already called
>> pm_runtime_put_noidle(), it looks like we are missing the pm integration
>> here.
>>
>> On a quick hack, just forcing the device into D0 before doing the MMIO,
>> the PM issue is gone (but we still hit the problem of PVC having the cap
>> when it shouldn't)
>>
>> thanks
>> Lucas De Marchi
>>
>>>
>>> Thanks,
>>> Kan