Message-ID: <20251210101147.139674-1-sieberf@amazon.com>
Date: Wed, 10 Dec 2025 12:11:47 +0200
From: Fernand Sieber <sieberf@...zon.com>
To: <peterz@...radead.org>
CC: <abusse@...zon.de>, <bp@...en8.de>, <dave.hansen@...ux.intel.com>,
<dwmw@...zon.co.uk>, <hborghor@...zon.de>, <hpa@...or.com>,
<jschoenh@...zon.de>, <kvm@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
<mingo@...hat.com>, <nh-open-source@...zon.com>, <nsaenz@...zon.com>,
<pbonzini@...hat.com>, <seanjc@...gle.com>, <sieberf@...zon.com>,
<stable@...r.kernel.org>, <tglx@...utronix.de>, <x86@...nel.org>
Subject: Re: [PATCH] KVM: x86/pmu: Do not accidentally create BTS events
On Tue, Dec 02, 2025 at 01:44:23PM +0100, Peter Zijlstra wrote:
> On Tue, Dec 02, 2025 at 11:03:11AM +0100, Peter Zijlstra wrote:
> > On Mon, Dec 01, 2025 at 04:23:57PM +0200, Fernand Sieber wrote:
> > > arch/x86/kvm/pmu.c | 13 +++++++++++++
> > > 1 file changed, 13 insertions(+)
> > >
> > > diff --git a/arch/x86/kvm/pmu.c b/arch/x86/kvm/pmu.c
> > > index 487ad19a236e..547512028e24 100644
> > > --- a/arch/x86/kvm/pmu.c
> > > +++ b/arch/x86/kvm/pmu.c
> > > @@ -225,6 +225,19 @@ static u64 get_sample_period(struct kvm_pmc *pmc, u64 counter_value)
> > >  {
> > >  	u64 sample_period = (-counter_value) & pmc_bitmask(pmc);
> > >
> > > +	/*
> > > +	 * A sample_period of 1 might get mistaken by perf for a BTS event, see
> > > +	 * intel_pmu_has_bts_period(). This would prevent re-arming the counter
> > > +	 * via pmc_resume_counter(), followed by the accidental creation of an
> > > +	 * actual BTS event, which we do not want.
> > > +	 *
> > > +	 * Avoid this by bumping the sampling period. Note, that we do not lose
> > > +	 * any precision, because the same quirk happens later anyway (for
> > > +	 * different reasons) in x86_perf_event_set_period().
> > > +	 */
> > > +	if (sample_period == 1)
> > > +		sample_period = 2;
> > > +
> > >  	if (!sample_period)
> > >  		sample_period = pmc_bitmask(pmc) + 1;
> > >  	return sample_period;
> >
> > Oh gawd, I so hate this kvm code. It is so ludicrously bad. The way it
> > keeps recreating counters is just stupid. And then they complain it
> > sucks, it does :-(
> >
> > Anyway, yes this is terrible. Let me try and untangle all this, see if
> > there's a saner solution.
>
> Does something like so work? It is still terrible, but perhaps slightly
> less so.
>
> diff --git a/arch/x86/events/perf_event.h b/arch/x86/events/perf_event.h
> index 2b969386dcdd..493e6ba51e06 100644
> --- a/arch/x86/events/perf_event.h
> +++ b/arch/x86/events/perf_event.h
> @@ -1558,13 +1558,22 @@ static inline bool intel_pmu_has_bts_period(struct perf_event *event, u64 period
>  	struct hw_perf_event *hwc = &event->hw;
>  	unsigned int hw_event, bts_event;
>
> -	if (event->attr.freq)
> +	/*
> +	 * Only use BTS for fixed rate period==1 events.
> +	 */
> +	if (event->attr.freq || period != 1)
> +		return false;
> +
> +	/*
> +	 * BTS doesn't virtualize.
> +	 */
> +	if (event->attr.exclude_host)
>  		return false;
>
>  	hw_event = hwc->config & INTEL_ARCH_EVENT_MASK;
>  	bts_event = x86_pmu.event_map(PERF_COUNT_HW_BRANCH_INSTRUCTIONS);
>
> -	return hw_event == bts_event && period == 1;
> +	return hw_event == bts_event;
>  }
>
>  static inline bool intel_pmu_has_bts(struct perf_event *event)
Hi Peter,
I've pulled your changes and confirmed that they address the original
bug report.
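
For anyone following along, here is a rough userspace sketch of the arithmetic in get_sample_period() without my original workaround, to show how a period of 1 can come out of it. The function name model_get_sample_period() and the 48-bit counter width are assumptions for illustration only; the real code takes the mask from pmc_bitmask(pmc).

```c
#include <stdint.h>

/* Assumed counter width for this sketch; the kernel uses pmc_bitmask(pmc). */
#define MODEL_BITMASK_48 ((1ULL << 48) - 1)

/*
 * Hypothetical model of get_sample_period(): the period is the number of
 * increments left until the counter wraps. A counter sitting one step
 * before overflow therefore yields a period of exactly 1.
 */
static uint64_t model_get_sample_period(uint64_t counter_value, uint64_t bitmask)
{
	/* Unsigned negation is well defined in C (modulo 2^64). */
	uint64_t sample_period = (-counter_value) & bitmask;

	/* A counter value of 0 means a full wrap is needed. */
	if (!sample_period)
		sample_period = bitmask + 1;
	return sample_period;
}
```

With the assumed 48-bit mask, a counter value of 0xFFFFFFFFFFFF gives a period of 1, which is the value the old intel_pmu_has_bts_period() check treated as a BTS request.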
For the repro, I watch the host while a guest runs:
`perf record -e branches:u -c 2 -a &`
`perf record -e branches:u -c 2 -a &`
I then monitor BTS on the host and verify that it gets enabled without
the change and stays disabled with it.
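
As a sanity check on my reading of your diff, below is a simplified userspace model of intel_pmu_has_bts_period() before and after the change. struct model_event and both function names are hypothetical stand-ins, not kernel definitions, and the sketch assumes that the events KVM creates for a guest carry attr.exclude_host, which is what the new check relies on.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the few perf_event fields the check reads. */
struct model_event {
	bool freq;            /* models event->attr.freq */
	bool exclude_host;    /* models event->attr.exclude_host */
	bool is_branch_event; /* models hw_event == bts_event */
};

/* Model of the check before the change: guest-only events can match. */
static bool model_has_bts_period_old(const struct model_event *e, uint64_t period)
{
	if (e->freq)
		return false;
	return e->is_branch_event && period == 1;
}

/* Model of the check after the change: exclude_host events never match. */
static bool model_has_bts_period_new(const struct model_event *e, uint64_t period)
{
	/* Only use BTS for fixed rate period==1 events. */
	if (e->freq || period != 1)
		return false;
	/* BTS doesn't virtualize. */
	if (e->exclude_host)
		return false;
	return e->is_branch_event;
}
```

In this model, a guest branch event with period 1 and exclude_host set matches the old check but not the new one, while a host-side branch event with period 1 matches both.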
This looks good to me, should we go ahead with your changes then?
--Fernand
Amazon Development Centre (South Africa) (Proprietary) Limited
29 Gogosoa Street, Observatory, Cape Town, Western Cape, 7925, South Africa
Registration Number: 2004 / 034463 / 07