Message-ID: <875xsl5pwv.ffs@tglx>
Date: Wed, 31 Jul 2024 21:20:48 +0200
From: Thomas Gleixner <tglx@...utronix.de>
To: Li Huafei <lihuafei1@...wei.com>, peterz@...radead.org, mingo@...hat.com
Cc: acme@...nel.org, namhyung@...nel.org, mark.rutland@....com,
alexander.shishkin@...ux.intel.com, jolsa@...nel.org, irogers@...gle.com,
adrian.hunter@...el.com, kan.liang@...ux.intel.com, bp@...en8.de,
dave.hansen@...ux.intel.com, x86@...nel.org, hpa@...or.com,
linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
lihuafei1@...wei.com
Subject: Re: [PATCH] perf/x86/intel: Restrict period on Haswell
On Tue, Jul 30 2024 at 06:33, Li Huafei wrote:
> On my Haswell machine, running the ltp test cve-2015-3290 concurrently
> reports the following warnings:
>
> perfevents: irq loop stuck!
> WARNING: CPU: 31 PID: 32438 at arch/x86/events/intel/core.c:3174 intel_pmu_handle_irq+0x285/0x370
> CPU: 31 UID: 0 PID: 32438 Comm: cve-2015-3290 Kdump: loaded Tainted: G S W 6.11.0-rc1+ #3
> ...
> Call Trace:
> <NMI>
> ? __warn+0xa4/0x220
> ? intel_pmu_handle_irq+0x285/0x370
> ? __report_bug+0x123/0x130
> ? intel_pmu_handle_irq+0x285/0x370
> ? __report_bug+0x123/0x130
> ? intel_pmu_handle_irq+0x285/0x370
> ? report_bug+0x3e/0xa0
> ? handle_bug+0x3c/0x70
> ? exc_invalid_op+0x18/0x50
> ? asm_exc_invalid_op+0x1a/0x20
> ? irq_work_claim+0x1e/0x40
> ? intel_pmu_handle_irq+0x285/0x370
> perf_event_nmi_handler+0x3d/0x60
> nmi_handle+0x104/0x330
> ? ___ratelimit+0xe4/0x1b0
> default_do_nmi+0x40/0x100
> exc_nmi+0x104/0x180
> end_repeat_nmi+0xf/0x53
> ...
> ? intel_pmu_lbr_enable_all+0x2a/0x90
> ? __intel_pmu_enable_all.constprop.0+0x16d/0x1b0
> ? __intel_pmu_enable_all.constprop.0+0x16d/0x1b0
> perf_ctx_enable+0x8e/0xc0
> __perf_install_in_context+0x146/0x3e0
> ? __pfx___perf_install_in_context+0x10/0x10
> remote_function+0x7c/0xa0
> ? __pfx_remote_function+0x10/0x10
> generic_exec_single+0xf8/0x150
> smp_call_function_single+0x1dc/0x230
> ? __pfx_remote_function+0x10/0x10
> ? __pfx_smp_call_function_single+0x10/0x10
> ? __pfx_remote_function+0x10/0x10
> ? lock_is_held_type+0x9e/0x120
> ? exclusive_event_installable+0x4f/0x140
> perf_install_in_context+0x197/0x330
> ? __pfx_perf_install_in_context+0x10/0x10
> ? __pfx___perf_install_in_context+0x10/0x10
> __do_sys_perf_event_open+0xb80/0x1100
> ? __pfx___do_sys_perf_event_open+0x10/0x10
> ? __pfx___lock_release+0x10/0x10
> ? lockdep_hardirqs_on_prepare+0x135/0x200
> ? ktime_get_coarse_real_ts64+0xee/0x100
> ? ktime_get_coarse_real_ts64+0x92/0x100
> do_syscall_64+0x70/0x180
> entry_SYSCALL_64_after_hwframe+0x76/0x7e
> ...
Please trim the backtrace to something useful:
https://www.kernel.org/doc/html/latest/process/submitting-patches.html#backtraces
> My machine has 32 physical cores, each with two logical cores. During
> testing, it executes the CVE-2015-3290 test case 100 times
> concurrently.
>
> This warning was already reported in [1], together with a patch that
> limits the period to 128 on Haswell, but that patch was never merged
> into mainline. In [2] the period on Nehalem was limited to 32. I
> tested limits of 16 and 32 on my machine: the problem reproduces with
> a limit of 16, but not with a limit of 32. It looks like we can limit
> the period to 32 on Haswell as well.
It looks like? Either it works or not.
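For context, a rough sketch (abbreviated, not verbatim mainline code):
the ->limit_period() callback is the last thing which adjusts the
period in x86_perf_event_set_period() in arch/x86/events/core.c. Such
a clamp is what prevents the counter from being reprogrammed with a
value so small that it overflows again before the NMI handler returns,
which is exactly the "irq loop stuck" condition above:

  /* x86_perf_event_set_period(), abbreviated sketch: */
  s64 left = local64_read(&hwc->period_left);

  /* generic bounds first ... */
  if (left < 2)
          left = 2;
  if (left > x86_pmu.max_period)
          left = x86_pmu.max_period;

  /* ... then the PMU-specific minimum, e.g. 32 on Nehalem */
  if (x86_pmu.limit_period)
          x86_pmu.limit_period(event, &left);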
>
> +static void hsw_limit_period(struct perf_event *event, s64 *left)
> +{
> + *left = max(*left, 32LL);
> +}
And why do we need a copy of nhm_limit_period()?
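A minimal sketch of the non-duplicating variant, assuming the existing
Nehalem helper in arch/x86/events/intel/core.c is visible from the
Haswell setup code (in mainline it is defined well before
intel_pmu_init()):

  /*
   * Already in arch/x86/events/intel/core.c; clamps the minimum
   * sampling period to 32 for Nehalem:
   */
  static void nhm_limit_period(struct perf_event *event, s64 *left)
  {
          *left = max(*left, 32LL);
  }

  /*
   * In the Haswell branch of intel_pmu_init(), reuse it instead of
   * adding an identical hsw_limit_period():
   */
  x86_pmu.limit_period = nhm_limit_period;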
Thanks,
tglx