linux-kernel - Re: [RFC] perf/x86: Fix a warning on x86_pmu

Message-ID: <20201123142321.GP3021@hirez.programming.kicks-ass.net>
Date:   Mon, 23 Nov 2020 15:23:21 +0100
From:   Peter Zijlstra <peterz@...radead.org>
To:     Namhyung Kim <namhyung@...nel.org>
Cc:     Ingo Molnar <mingo@...nel.org>, Borislav Petkov <bp@...en8.de>,
        Thomas Gleixner <tglx@...utronix.de>,
        "H. Peter Anvin" <hpa@...or.com>, x86@...nel.org,
        LKML <linux-kernel@...r.kernel.org>,
        Stephane Eranian <eranian@...gle.com>,
        Kan Liang <kan.liang@...ux.intel.com>,
        John Sperbeck <jsperbeck@...gle.com>,
        "Lendacky, Thomas" <Thomas.Lendacky@....com>
Subject: Re: [RFC] perf/x86: Fix a warning on x86_pmu_stop()

On Sat, Nov 21, 2020 at 11:50:11AM +0900, Namhyung Kim wrote:
> When large PEBS is enabled, the below warning is triggered:
> 
>   [6070379.453697] WARNING: CPU: 23 PID: 42379 at arch/x86/events/core.c:1466 x86_pmu_stop+0x95/0xa0
>   ...
>   [6070379.453831] Call Trace:
>   [6070379.453840]  x86_pmu_del+0x50/0x150
>   [6070379.453845]  event_sched_out.isra.0+0x95/0x200
>   [6070379.453848]  group_sched_out.part.0+0x53/0xd0
>   [6070379.453851]  __perf_event_disable+0xee/0x1e0
>   [6070379.453854]  event_function+0x89/0xd0
>   [6070379.453859]  remote_function+0x3e/0x50
>   [6070379.453866]  generic_exec_single+0x91/0xd0
>   [6070379.453870]  smp_call_function_single+0xd1/0x110
>   [6070379.453874]  event_function_call+0x11c/0x130
>   [6070379.453877]  ? task_ctx_sched_out+0x20/0x20
>   [6070379.453880]  ? perf_mux_hrtimer_handler+0x370/0x370
>   [6070379.453882]  ? event_function_call+0x130/0x130
>   [6070379.453886]  perf_event_for_each_child+0x34/0x80
>   [6070379.453889]  ? event_function_call+0x130/0x130
>   [6070379.453891]  _perf_ioctl+0x24b/0x6a0
>   [6070379.453898]  ? sched_setaffinity+0x1ad/0x2a0
>   [6070379.453904]  ? _cond_resched+0x15/0x30
>   [6070379.453906]  perf_ioctl+0x3d/0x60
>   [6070379.453912]  ksys_ioctl+0x87/0xc0
>   [6070379.453917]  __x64_sys_ioctl+0x16/0x20
>   [6070379.453923]  do_syscall_64+0x52/0x180
>   [6070379.453928]  entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> Commit 3966c3feca3f ("x86/perf/amd: Remove need to check "running"
> bit in NMI handler") introduced this.  It seems x86_pmu_stop() can be
> called recursively (for example, when it loses some samples), as below:
> 
>   x86_pmu_stop
>     intel_pmu_disable_event  (x86_pmu_disable)
>       intel_pmu_pebs_disable
>         intel_pmu_drain_pebs_buffer
>           x86_pmu_stop
> 

This shouldn't be possible; intel_pmu_drain_pebs_buffer() calls
drain_pebs(.iregs=NULL), which means that __intel_pmu_pebs_event()
should not end up calling x86_pmu_stop().
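
To illustrate the argument, here is a user-space toy model, not the
kernel source. All toy_* names are hypothetical, the overflow handler is
assumed to always request a stop, and the NULL check is assumed to behave
as described above:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for interrupt registers; not the kernel's pt_regs. */
struct toy_regs { int unused; };

static int stop_calls;   /* how many times the toy "stop" ran */
static int output_calls; /* records emitted without overflow handling */

/* Stand-in for the stop routine; it only counts invocations. */
static void toy_pmu_stop(void)
{
	stop_calls++;
}

/* Assumption: the overflow handler always asks for the event to be stopped. */
static bool toy_event_overflow(void)
{
	return true;
}

/*
 * Model of handling the last record: with iregs == NULL (drain path) the
 * record is only output, so the overflow handler and the stop routine are
 * never reached.
 */
static void toy_pebs_event(const struct toy_regs *iregs)
{
	if (!iregs) {
		output_calls++;
		return;
	}
	if (toy_event_overflow())
		toy_pmu_stop();
}

/* Drain path: always passes iregs == NULL. */
static void toy_drain_pebs_buffer(void)
{
	toy_pebs_event(NULL);
}

/* Interrupt path: passes real (here: dummy but non-NULL) registers. */
static void toy_pmi(void)
{
	struct toy_regs regs = { 0 };

	toy_pebs_event(&regs);
}
```

In this model, toy_drain_pebs_buffer() leaves stop_calls at 0, while
toy_pmi() increments it, so the stop routine can only be reached from the
interrupt path and the recursion in the quoted call chain cannot occur.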

Are you running some old kernel?