Message-ID: <eb289ebb-1d9f-a5e9-6c03-4ddd3d343547@oracle.com>
Date:   Tue, 20 Sep 2022 11:25:00 -0700
From:   Kin Cho <kin.cho@...cle.com>
To:     kan.liang@...ux.intel.com, peterz@...radead.org, mingo@...nel.org,
        acme@...hat.com, linux-kernel@...r.kernel.org
Cc:     alexander.shishkin@...ux.intel.com, jolsa@...hat.com,
        eranian@...gle.com, namhyung@...nel.org, ak@...ux.intel.com
Subject: Re: [PATCH V2 0/5] Uncore PMON discovery mechanism support

Hi Kan,

We're seeing the warning below from uncore_insert_box_info on SPR.
I added a debug print:

     /* Parsing Unit Discovery State */
     for (i = 0; i < global.max_units; i++) {
..
         uncore_insert_box_info(&unit, die, *parsed);
 >>        pr_info("%d 0x%llx\n", i, unit.ctl);

and here's the output:

[   17.758579] intel_uncore: 0 0x2fc0
[   17.763117] intel_uncore: 2 0x2010
..
[   17.935286] intel_uncore: 65 0x87e410a0
[   17.940308] intel_uncore: 66 0x87e21318
[   17.945331] ------------[ cut here ]------------
[   17.946305] WARNING: CPU: 65 PID: 1 at arch/x86/events/intel/uncore_discovery.c:184 intel_uncore_has_discovery_tables+0x4c0/0x65c
..
[   18.161512] intel_uncore: 67 0x87e410a0
[   18.166533] intel_uncore: 68 0x87e21318
..
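
One observation that may help narrow this down, offered as a sketch rather than a diagnosis: since the pr_info() sits after the uncore_insert_box_info() call, the warning logged between the lines for units 66 and 67 would have been raised while unit 67 was being inserted, and units 67 and 68 report the same control addresses as units 65 and 66. The small user-space C helper below flags the first repeated control address in such a list. The struct and function names are hypothetical, and only the six values visible in the excerpt above are meant to be fed to it; the elided lines are not reconstructed.

```c
#include <stddef.h>
#include <stdint.h>

/* One line of the debug output: unit index and its control address. */
struct unit_sample {
    int index;      /* loop counter i from the debug print */
    uint64_t ctl;   /* unit.ctl from the debug print */
};

/*
 * Return the position in s[] of the first entry whose ctl value already
 * appeared at an earlier position, or -1 if every ctl value is unique.
 * When a repeat is found and first is non-NULL, *first receives the
 * position of the earlier entry with the same ctl value.
 */
static int find_repeated_ctl(const struct unit_sample *s, size_t n,
                             size_t *first)
{
    for (size_t i = 1; i < n; i++) {
        for (size_t j = 0; j < i; j++) {
            if (s[i].ctl == s[j].ctl) {
                if (first)
                    *first = j;
                return (int)i;
            }
        }
    }
    return -1;
}
```

Fed the six (index, ctl) pairs shown above, in order, this returns position 4, i.e. unit 67 repeating the control address of unit 65.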

Any suggestions?

-kin

[   17.945331] ------------[ cut here ]------------
[   17.946305] WARNING: CPU: 65 PID: 1 at arch/x86/events/intel/uncore_discovery.c:184 intel_uncore_has_discovery_tables+0x4c0/0x65c
[   17.946305] Modules linked in:
[   17.946305] CPU: 65 PID: 1 Comm: swapper/0 Not tainted 5.4.17-2136.313.1-X10-2c+ #4
[   17.946305] Hardware name: Oracle Corporation sca-x102c-107-sp/PCA,MB,X10-2c, BIOS 79805101 09/13/2022
[   17.946305] RIP: 0010:intel_uncore_has_discovery_tables+0x4c0/0x65c
[   17.946305] Code: 38 48 63 f0 48 8d 3c b1 45 8b 04 b0 44 89 07 4c 8b 42 40 45 8b 04 b0 45 89 04 b1 0f b7 75 ca 3b 37 75 cf 4c 89 8d 68 ff ff ff <0f> 0b 48 89 cf e8 c6 4f 2b 00 4c 8b 8d 68 ff ff ff 4c 89 cf e8 b7
[   17.946305] RSP: 0000:ff4b04f60006bd08 EFLAGS: 00010246
[   17.946305] RAX: 0000000000000002 RBX: 0000000000000044 RCX: ff43a98a4ff1bb30
[   17.946305] RDX: ff43a98a4ff294e0 RSI: 0000000000000003 RDI: ff43a98a4ff1bb38
[   17.946305] RBP: ff4b04f60006bdb0 R08: 0000000000018000 R09: ff43a98a4ff1b310
[   17.946305] R10: 0000000000000005 R11: ff43a98c7f7fe000 R12: ff43a98777d66000
[   17.946305] R13: 0000000000015240 R14: ff4b04f61b286000 R15: 0000000000000043
[   17.946305] FS:  0000000000000000(0000) GS:ff43a90a7f840000(0000) knlGS:0000000000000000
[   17.946305] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   17.946305] CR2: 0000000000000000 CR3: 000000967e00a001 CR4: 0000000000761ee0
[   17.946305] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[   17.946305] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 0000000000000400
[   17.946305] PKRU: 55555554
[   17.946305] Call Trace:
[   17.946305]  ? uncore_types_init+0x25f/0x25f
[   17.946305]  intel_uncore_init+0x64/0x50c
[   17.946305]  ? perf_pmu_register+0x2cc/0x403
[   17.946305]  ? uncore_types_init+0x25f/0x25f
[   17.946305]  do_one_initcall+0x52/0x1e1
[   17.946305]  ? trace_event_define_fields_initcall_level+0x2a/0x36
[   17.946305]  kernel_init_freeable+0x1fc/0x2a7
[   17.946305]  ? loglevel+0x5d/0x5d
[   17.946305]  ? rest_init+0xb0/0xb0
[   17.946305]  kernel_init+0xe/0x123
[   17.946305]  ret_from_fork+0x24/0x36
[   17.946305] ---[ end trace d9131e47b8a615f4 ]---


On 3/17/21 10:59 AM, kan.liang@...ux.intel.com wrote:
> From: Kan Liang <kan.liang@...ux.intel.com>
>
> Changes since V1:
> - Use the generic rbtree functions, rb_add() and rb_find(). (Patch 1)
> - Add a module parameter, uncore_no_discover. If users don't want the
>    discovery feature, they can set uncore_no_discover=true. (Patch 1)
>
>
> A mechanism of self-describing HW for the uncore PMON has been
> introduced with the latest Intel platforms. By reading through an MMIO
> page worth of information, SW can ‘discover’ all the standard uncore
> PMON registers.
>
> With the discovery mechanism, Perf can
> - Retrieve the generic uncore unit information of all standard uncore
>    blocks, e.g., the address of counters, the address of the counter
>    control, the counter width, the access type, etc.
>    Perf can provide basic uncore support based on this information.
>    For a new platform, perf users will get basic uncore support even if
>    the platform-specific enabling code is not ready yet.
> - Retrieve accurate uncore unit information, e.g., the number of uncore
>    boxes. The number of uncore boxes may be different among machines.
>    Currently, perf hard-codes the max number of the uncore blocks. On some
>    machines, perf may create a PMU for an unavailable uncore block.
>    Although there is no harm (always return 0 for the unavailable uncore
>    block), it may confuse the users. The discovery mechanism can provide
>    the accurate number of available uncore boxes on a machine.
>
> But, the discovery mechanism has some limits,
> - Rely on BIOS's support. If a BIOS doesn't support the discovery
>    mechanism, the uncore driver will exit with -ENODEV. There is nothing
>    changed.
> - Only provide the generic uncore unit information. The information for
>    the advanced features, such as fixed counters, filters, and
>    constraints, cannot be retrieved.
> - Only support the standard PMON blocks. Non-standard PMON blocks, e.g.,
>    free-running counters, are not supported.
> - Only provide an ID for an uncore block. No meaningful name is
>    provided. The uncore_type_&typeID_&boxID will be used as the name.
> - Enabling the PCI and MMIO type of uncore blocks relies on NUMA support.
>    These uncore blocks require the mapping information from a BUS to a
>    die. The current discovery table doesn't provide the mapping
>    information. The pcibus_to_node() from NUMA is used to retrieve the
>    information. If NUMA is not supported, some uncore blocks may be
>    unavailable.
>
> To locate the MMIO page, SW has to find a PCI device with the unique
> capability ID 0x23 and retrieve its BAR address.
>
> The spec can be found in the Snow Ridge or Ice Lake server uncore document.
> https://cdrdv2.intel.com/v1/dl/getContent/611319
>
> Kan Liang (5):
>    perf/x86/intel/uncore: Parse uncore discovery tables
>    perf/x86/intel/uncore: Generic support for the MSR type of uncore
>      blocks
>    perf/x86/intel/uncore: Rename uncore_notifier to
>      uncore_pci_sub_notifier
>    perf/x86/intel/uncore: Generic support for the PCI type of uncore
>      blocks
>    perf/x86/intel/uncore: Generic support for the MMIO type of uncore
>      blocks
>
>   arch/x86/events/intel/Makefile           |   2 +-
>   arch/x86/events/intel/uncore.c           | 188 ++++++++--
>   arch/x86/events/intel/uncore.h           |  10 +-
>   arch/x86/events/intel/uncore_discovery.c | 622 +++++++++++++++++++++++++++++++
>   arch/x86/events/intel/uncore_discovery.h | 131 +++++++
>   5 files changed, 922 insertions(+), 31 deletions(-)
>   create mode 100644 arch/x86/events/intel/uncore_discovery.c
>   create mode 100644 arch/x86/events/intel/uncore_discovery.h
>
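
For readers following the "capability ID 0x23" step in the cover letter above, here is a minimal user-space sketch of how such a capability could be located. It assumes 0x23 is a PCIe extended capability ID, found by walking the extended capability list that starts at config-space offset 0x100, where each 32-bit header holds the ID in bits 15:0 and the next-capability offset in bits 31:20. The function and macro names are illustrative and not taken from the patch, and the sketch works on a raw copy of config space rather than a real device. Retrieving the BAR address afterwards is not shown.

```c
#include <stddef.h>
#include <stdint.h>

#define EXT_CAP_START     0x100   /* extended capabilities begin here */
#define CFG_SPACE_SIZE    0x1000  /* PCIe extended config space: 4 KiB */
#define DISCOVERY_CAP_ID  0x23    /* ID quoted in the cover letter */

/* Read a little-endian 32-bit value from a byte buffer. */
static uint32_t read_le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Walk the extended capability list in a raw config-space image and
 * return the offset of the first capability whose ID equals cap_id,
 * or 0 if none is found.
 */
static unsigned int find_ext_cap(const uint8_t *cfg, size_t len,
                                 uint16_t cap_id)
{
    unsigned int pos = EXT_CAP_START;
    /* Each capability needs at least 8 bytes, which bounds the walk
     * and guards against a looping list. */
    int ttl = (CFG_SPACE_SIZE - EXT_CAP_START) / 8;

    while (ttl-- > 0 && pos >= EXT_CAP_START && (size_t)pos + 4 <= len) {
        uint32_t hdr = read_le32(cfg + pos);

        if (hdr == 0 || hdr == 0xffffffffu)
            break;                      /* empty or unreadable entry */
        if ((hdr & 0xffff) == cap_id)
            return pos;
        pos = (hdr >> 20) & 0xffc;      /* next offset, dword aligned */
    }
    return 0;
}
```

Calling find_ext_cap(cfg, sizeof(cfg), DISCOVERY_CAP_ID) on a 4 KiB config-space dump yields the offset of the capability header, or 0 when the device does not advertise it.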
