Message-ID: <CAM9d7cji+M+qVm4g48Jcgnfjm-=3HVVtv49ntDpksQx8aBdSyQ@mail.gmail.com>
Date: Tue, 24 Nov 2020 14:01:39 +0900
From: Namhyung Kim <namhyung@...nel.org>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...nel.org>, Borislav Petkov <bp@...en8.de>,
Thomas Gleixner <tglx@...utronix.de>,
"H. Peter Anvin" <hpa@...or.com>, x86@...nel.org,
LKML <linux-kernel@...r.kernel.org>,
Stephane Eranian <eranian@...gle.com>,
Kan Liang <kan.liang@...ux.intel.com>,
John Sperbeck <jsperbeck@...gle.com>,
"Lendacky, Thomas" <Thomas.Lendacky@....com>
Subject: Re: [RFC] perf/x86: Fix a warning on x86_pmu_stop()
Hi Peter,
On Mon, Nov 23, 2020 at 11:23 PM Peter Zijlstra <peterz@...radead.org> wrote:
>
> On Sat, Nov 21, 2020 at 11:50:11AM +0900, Namhyung Kim wrote:
> > When large PEBS is enabled, the below warning is triggered:
> >
> > [6070379.453697] WARNING: CPU: 23 PID: 42379 at arch/x86/events/core.c:1466 x86_pmu_stop+0x95/0xa0
> > ...
> > [6070379.453831] Call Trace:
> > [6070379.453840] x86_pmu_del+0x50/0x150
> > [6070379.453845] event_sched_out.isra.0+0x95/0x200
> > [6070379.453848] group_sched_out.part.0+0x53/0xd0
> > [6070379.453851] __perf_event_disable+0xee/0x1e0
> > [6070379.453854] event_function+0x89/0xd0
> > [6070379.453859] remote_function+0x3e/0x50
> > [6070379.453866] generic_exec_single+0x91/0xd0
> > [6070379.453870] smp_call_function_single+0xd1/0x110
> > [6070379.453874] event_function_call+0x11c/0x130
> > [6070379.453877] ? task_ctx_sched_out+0x20/0x20
> > [6070379.453880] ? perf_mux_hrtimer_handler+0x370/0x370
> > [6070379.453882] ? event_function_call+0x130/0x130
> > [6070379.453886] perf_event_for_each_child+0x34/0x80
> > [6070379.453889] ? event_function_call+0x130/0x130
> > [6070379.453891] _perf_ioctl+0x24b/0x6a0
> > [6070379.453898] ? sched_setaffinity+0x1ad/0x2a0
> > [6070379.453904] ? _cond_resched+0x15/0x30
> > [6070379.453906] perf_ioctl+0x3d/0x60
> > [6070379.453912] ksys_ioctl+0x87/0xc0
> > [6070379.453917] __x64_sys_ioctl+0x16/0x20
> > [6070379.453923] do_syscall_64+0x52/0x180
> > [6070379.453928] entry_SYSCALL_64_after_hwframe+0x44/0xa9
> >
> > The commit 3966c3feca3f ("x86/perf/amd: Remove need to check "running"
> > bit in NMI handler") introduced this. It seems x86_pmu_stop can be
> > called recursively (e.g. when it loses some samples), as below:
> >
> >   x86_pmu_stop
> >     intel_pmu_disable_event  (x86_pmu_disable)
> >       intel_pmu_pebs_disable
> >         intel_pmu_drain_pebs_buffer
> >           x86_pmu_stop
> >
>
> This shouldn't be possible; intel_pmu_drain_pebs_buffer() calls
> drain_pebs(.iregs=NULL), which means that __intel_pmu_pebs_event()
> should not end up in x86_pmu_stop().
>
> Are you running some old kernel?
Well, it's actually 5.7.17, but I think the latest version has the same problem.
Yes, it's not about __intel_pmu_pebs_event(). I'm looking at
intel_pmu_drain_pebs_nhm() specifically. There's code like this:
    /* log dropped samples number */
    if (error[bit]) {
        perf_log_lost_samples(event, error[bit]);

        if (perf_event_account_interrupt(event))
            x86_pmu_stop(event, 0);
    }

    if (counts[bit]) {
        __intel_pmu_pebs_event(event, iregs, base,
                               top, bit, counts[bit],
                               setup_pebs_fixed_sample_data);
    }
So there's a path to x86_pmu_stop() when an error bit is set, even when
the drain is called with iregs == NULL from intel_pmu_drain_pebs_buffer().
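To make the recursion concrete, below is a small user-space model. It is a sketch, not kernel code, and every name in it (model_event, model_pmu_stop, run_model, ...) is hypothetical. It assumes that before 3966c3feca3f x86_pmu_stop() cleared the active_mask bit (__test_and_clear_bit()) before calling x86_pmu.disable(), and that afterwards it only tests the bit and clears it after disable. It also assumes the warning at core.c:1466 is the WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED) check. Under those assumptions, the nested stop reached through the error-bit path still sees the bit set, stops the event itself, and the outer call then finds PERF_HES_STOPPED already set.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one event; none of these are kernel APIs. */
struct model_event {
	bool active;      /* stands in for test_bit(hwc->idx, cpuc->active_mask) */
	bool stopped;     /* stands in for hwc->state & PERF_HES_STOPPED */
	bool pebs_error;  /* stands in for error[bit] != 0 in the drain loop */
	bool throttle;    /* stands in for perf_event_account_interrupt() != 0 */
	bool clear_first; /* true: model the ordering before 3966c3feca3f */
	int warnings;     /* times the WARN_ON_ONCE condition was hit */
};

static void model_pmu_stop(struct model_event *ev);

/* Models intel_pmu_drain_pebs_nhm(): draining consumes the buffered
 * records, and a dropped-sample error may stop the event. */
static void model_drain_pebs(struct model_event *ev)
{
	bool had_error = ev->pebs_error;

	ev->pebs_error = false; /* a second drain finds nothing left */
	if (had_error && ev->throttle)
		model_pmu_stop(ev);
}

/* Models intel_pmu_disable_event() -> intel_pmu_pebs_disable()
 * -> intel_pmu_drain_pebs_buffer(). */
static void model_disable(struct model_event *ev)
{
	model_drain_pebs(ev);
}

/* Models the relevant part of x86_pmu_stop(). */
static void model_pmu_stop(struct model_event *ev)
{
	if (!ev->active)
		return;
	if (ev->clear_first)
		ev->active = false; /* old: __test_and_clear_bit() */
	model_disable(ev);
	ev->active = false;         /* new: __clear_bit() after disable */
	if (ev->stopped)
		ev->warnings++;     /* WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED) */
	ev->stopped = true;
}

/* Stops an active event that has a pending PEBS error which triggers
 * throttling; returns how often the warning condition was hit. */
int run_model(bool clear_first)
{
	struct model_event ev = {
		.active = true,
		.pebs_error = true,
		.throttle = true,
		.clear_first = clear_first,
	};

	model_pmu_stop(&ev);
	assert(!ev.active && ev.stopped); /* the event always ends up stopped */
	return ev.warnings;
}
```

In this model, run_model(false) (bit cleared after disable) hits the warning condition once, while run_model(true) (bit cleared before disable) does not, because the nested stop then returns early.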
Thanks,
Namhyung