linux-kernel - Re: [PATCH] perf/core: fix the bug in the event multiplexing


Message-ID: <ZNNa0abhS53cMNcK@FVFF77S0Q05N>
Date:   Wed, 9 Aug 2023 10:22:25 +0100
From:   Mark Rutland <mark.rutland@....com>
To:     Oliver Upton <oliver.upton@...ux.dev>
Cc:     Huang Shijie <shijie@...amperecomputing.com>, maz@...nel.org,
        james.morse@....com, suzuki.poulose@....com, yuzenghui@...wei.com,
        catalin.marinas@....com, will@...nel.org, pbonzini@...hat.com,
        peterz@...radead.org, ingo@...hat.com, acme@...nel.org,
        alexander.shishkin@...ux.intel.com, jolsa@...nel.org,
        namhyung@...nel.org, irogers@...gle.com,
        linux-arm-kernel@...ts.infradead.org, kvmarm@...ts.linux.dev,
        linux-kernel@...r.kernel.org, kvm@...r.kernel.org,
        linux-perf-users@...r.kernel.org, patches@...erecomputing.com,
        zwang@...erecomputing.com
Subject: Re: [PATCH] perf/core: fix the bug in the event multiplexing

On Wed, Aug 09, 2023 at 08:25:07AM +0000, Oliver Upton wrote:
> Hi Huang,
> 
> On Wed, Aug 09, 2023 at 09:39:53AM +0800, Huang Shijie wrote:
> > 2.) Root cause.
> > 	There are only 7 counters on my arm64 platform:
> > 	  (one cycle counter) + (6 normal counters)
> > 
> > 	In 1.3 above, we will use 10 event counters.
> > 	Since we only have 7 counters, the perf core will trigger
> >        	event multiplexing in hrtimer:
> > 	     merge_sched_in() -->perf_mux_hrtimer_restart() -->
> > 	     perf_rotate_context().
> > 
> >        In the perf_rotate_context(), it does not restore some PMU registers
> >        as context_switch() does.  In context_switch():
> >              kvm_sched_in()  --> kvm_vcpu_pmu_restore_guest()
> >              kvm_sched_out() --> kvm_vcpu_pmu_restore_host()
> > 
> >        So we get the wrong result.
> 
> This is a rather vague description of the problem. AFAICT, the
> issue here is on VHE systems we wind up getting the EL0 count
> enable/disable bits backwards when entering the guest, which is
> corroborated by the data you have below.

Yep; IIUC the issue here is that when we take an IRQ from a guest and reprogram
the PMU in the IRQ handler, the IRQ handler will program the PMU with
appropriate host/guest/user/etc filters for a *host* context, and then we'll
return into the guest without reconfiguring the event filtering for a
*guest* context.

That can happen for perf_rotate_context(), or when we install an event into a
running context, as that'll happen via an IPI.

> > +void arch_perf_rotate_pmu_set(void)
> > +{
> > +	if (is_guest())
> > +		kvm_vcpu_pmu_restore_guest(NULL);
> > +	else
> > +		kvm_vcpu_pmu_restore_host(NULL);
> > +}
> > +
> 
> This sort of hook is rather nasty, and I'd strongly prefer a solution
> that's confined to KVM. I don't think the !is_guest() branch is
> necessary at all. Regardless of how the pmu context is changed, we need
> to go through vcpu_put() before getting back out to userspace.
> 
> We can check for a running vCPU (ick) from kvm_set_pmu_events() and either
> do the EL0 bit flip there or make a request on the vCPU to call
> kvm_vcpu_pmu_restore_guest() immediately before reentering the guest.
> I'm slightly leaning towards the latter, unless anyone has a better idea
> here.

The latter sounds reasonable to me.
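
To illustrate the shape of that approach, here is a rough userspace toy model.
It is not kernel code: every name in it (toy_vcpu, pmu_reload_pending,
toy_irq_reprogram_pmu(), and so on) is made up for illustration, and it only
models "which context are the filters programmed for" as a single enum. The
idea is that the IRQ/IPI path programs host filters and, if a vCPU is loaded,
flags a request that is handled just before re-entering the guest.

```c
#include <stdbool.h>

/* Toy model: which context the PMU event filters are programmed for. */
enum filter_ctx { FILTER_HOST, FILTER_GUEST };

/* Hypothetical stand-in for per-vCPU state; not the kernel's struct. */
struct toy_vcpu {
	enum filter_ctx pmu_filter;	/* current filter programming */
	bool loaded;			/* vCPU is loaded on this CPU */
	bool pmu_reload_pending;	/* made-up "restore guest filters" request */
};

/* Models loading the vCPU: filters are set up for the guest. */
static void toy_vcpu_load(struct toy_vcpu *v)
{
	v->loaded = true;
	v->pmu_filter = FILTER_GUEST;
}

/*
 * Models an IRQ/IPI that reprograms the PMU (e.g. event rotation or
 * installing an event). It programs *host* filters, so if a vCPU is
 * loaded we flag a request rather than fixing things up here.
 */
static void toy_irq_reprogram_pmu(struct toy_vcpu *v)
{
	v->pmu_filter = FILTER_HOST;
	if (v->loaded)
		v->pmu_reload_pending = true;
}

/* Models the guest entry path: handle the request before entering. */
static void toy_enter_guest(struct toy_vcpu *v)
{
	if (v->pmu_reload_pending) {
		v->pmu_filter = FILTER_GUEST;
		v->pmu_reload_pending = false;
	}
}

/* Models putting the vCPU: host filters restored, stale request dropped. */
static void toy_vcpu_put(struct toy_vcpu *v)
{
	v->loaded = false;
	v->pmu_reload_pending = false;
	v->pmu_filter = FILTER_HOST;
}
```

In this model, calling toy_vcpu_load(), then toy_irq_reprogram_pmu(), then
toy_enter_guest() leaves the filters in the guest configuration again, and
toy_vcpu_put() always leaves them in the host configuration regardless of
whether a request was still pending.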

I suspect we need to take special care here to make sure we leave *all* events
in a good state when re-entering the guest or if we get to kvm_sched_out()
after *removing* an event via an IPI -- it'd be easy to mess either case up and
leave some events in a bad state.

Thanks,
Mark.