lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <b2775ea5-20c9-4dff-b4b1-bbb212065a22@paulmck-laptop>
Date: Thu, 4 Jan 2024 08:06:54 -0800
From: "Paul E. McKenney" <paulmck@...nel.org>
To: Paolo Bonzini <pbonzini@...hat.com>
Cc: Sean Christopherson <seanjc@...gle.com>,
	Like Xu <like.xu@...ux.intel.com>, Andi Kleen <ak@...ux.intel.com>,
	Kan Liang <kan.liang@...ux.intel.com>,
	Luwei Kang <luwei.kang@...el.com>,
	Peter Zijlstra <peterz@...radead.org>,
	linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
	kvm@...r.kernel.org, Breno Leitao <leitao@...ian.org>,
	Arnaldo Carvalho de Melo <acme@...nel.org>,
	Ingo Molnar <mingo@...hat.com>
Subject: Re: [BUG] Guest OSes die simultaneously (bisected)

On Thu, Jan 04, 2024 at 03:59:52PM +0100, Paolo Bonzini wrote:
> On Thu, Jan 4, 2024 at 3:58 PM Paul E. McKenney <paulmck@...nel.org> wrote:
> >
> > On Wed, Jan 03, 2024 at 05:00:35PM -0800, Paul E. McKenney wrote:
> > > On Wed, Jan 03, 2024 at 04:24:06PM -0800, Sean Christopherson wrote:
> > > > On Wed, Jan 03, 2024, Paul E. McKenney wrote:
> > > > > On Wed, Jan 03, 2024 at 02:22:23PM -0800, Paul E. McKenney wrote:
> > > > > > Hello!
> > > > > >
> > > > > > Since some time between v5.19 and v6.4, long-running rcutorture tests
> > > > > > would (rarely but intolerably often) have all guests on a given host die
> > > > > > simultaneously with something like an instruction fault or a segmentation
> > > > > > violation.
> > > > > >
> > > > > > Each bisection step required 20 hosts running 10 hours each, and
> > > > > > this eventually fingered commit c59a1f106f5c ("KVM: x86/pmu: Add
> > > > > > IA32_PEBS_ENABLE MSR emulation for extended PEBS").  Although this commit
> > > > > > is certainly messing with things that could possibly cause all manner
> > > > > > of mischief, I don't immediately see a smoking gun.  Except that the
> > > > > > commit prior to this one is rock solid.
> > > > > > Just to make things a bit more exciting, bisection in mainline proved
> > > > > > to be problematic due to bugs of various kinds that hid this one.  I was
> > > > > > therefore forced to bisect among the commits backported to the internal
> > > > > > v5.19-based kernel, which fingered the backported version of the patch
> > > > > > called out above.
> > > > >
> > > > > Ah, and so why do I believe that this is a problem in mainline rather
> > > > > than just (say) a backporting mistake?
> > > > >
> > > > > Because this issue was first located in v6.4, which already has this
> > > > > commit included.
> > > > >
> > > > >                                                   Thanx, Paul
> > > > >
> > > > > > Please note that this is not (yet) an emergency.  I will just continue
> > > > > > to run rcutorture on v5.19-based hypervisors in the meantime.
> > > > > >
> > > > > > Any suggestions for debugging or fixing?
> > > >
> > > > This looks suspect:
> > > >
> > > > +       u64 pebs_mask = cpuc->pebs_enabled & x86_pmu.pebs_capable;
> > > > +       int global_ctrl, pebs_enable;
> > > >
> > > > -       arr[0].msr = MSR_CORE_PERF_GLOBAL_CTRL;
> > > > -       arr[0].host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask;
> > > > -       arr[0].guest = intel_ctrl & ~cpuc->intel_ctrl_host_mask;
> > > > -       arr[0].guest &= ~(cpuc->pebs_enabled & x86_pmu.pebs_capable);
> > > > -       *nr = 1;
> > > > +       *nr = 0;
> > > > +       global_ctrl = (*nr)++;
> > > > +       arr[global_ctrl] = (struct perf_guest_switch_msr){
> > > > +               .msr = MSR_CORE_PERF_GLOBAL_CTRL,
> > > > +               .host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask,
> > > > +               .guest = intel_ctrl & (~cpuc->intel_ctrl_host_mask | ~pebs_mask),
> > > > +       };
> > > >
> > > >
> > > > IIUC (always a big if with this code), the intent is that the guest's version of
> > > > PERF_GLOBAL_CTRL gets bits that are (a) not exclusive to the host and (b) not
> > > > being used for PEBS.  (b) is necessary because PEBS generates records in memory
> > > > using virtual addresses, i.e. the CPU will write to memory using a virtual address
> > > > that is valid for the host but not the guest.  And so PMU counters that are
> > > > configured to generate PEBS records need to be disabled while running the guest.
> > > >
> > > > Before that commit, the logic was:
> > > >
> > > >   guest[PERF_GLOBAL_CTRL] = ctrl & ~host;
> > > >   guest[PERF_GLOBAL_CTRL] &= ~pebs;
> > > >
> > > > But after, it's now:
> > > >
> > > >   guest[PERF_GLOBAL_CTRL] = ctrl & (~host | ~pebs);
> > > >
> > > > I.e. the kernel is enabled counters in the guest that are not host-only OR not
> > > > PEBS.  E.g. if only counter 0 is in use, it's using PEBS, but it's not exclusive
> > > > to the host, then the new code will yield (truncated to a single byte for sanity)
> > > >
> > > >   1 = 1 & (0xf | 0xe)
> > > >
> > > > and thus keep counter 0 enabled, whereas the old code would yield
> > > >
> > > >   1 = 1 & 0xf
> > > >   0 = 1 & 0xe
> > > >
> > > > A bit of a shot in the dark and completed untested, but I think this is the correct
> > > > fix?
> > >
> > > I am firing off some tests, and either way, thank you very much!!!
> >
> > Woo-hoo!!!  ;-)
> >
> > Tested-by: Paul E. McKenney <paulmck@...nel.org>
> >
> > Will you be sending a proper patch, or would you prefer that I do so?
> > In the latter case, I would need your Signed-off-by.
> 
> I will fast track this one to Linus.

Thank you, Paolo!

One additional request, if I may be so bold...

Although I am happy to have been able to locate the commit (and even
happier that Sean spotted the problem and that you quickly pushed the
fix to mainline!), chasing this consumed a lot of time and systems over
an embarrassingly large number of months.  As in I first spotted this
bug in late July.  Despite a number of increasingly complex attempts,
bisection became feasible only after the buggy commit was backported to
our internal v5.19 code base.  :-(

My (completely random) guess is that there is some rare combination
of events that causes this code to fail.  If so, is it feasible to
construct a test that makes this rare combination of events less rare,
so that similar future bugs are caught more quickly?

And please understand that I am not casting shade on those who wrote,
reviewed, and committed that buggy commit.  As in I freely confess that
I had to stare at Sean's fix for a few minutes before I figured out what
was going on.  Instead, the point I am trying to make is that carefully
constructed tests can serve as tireless and accurate code reviewers.
This won't ever replace actual code review, but my experience indicates
that it will help find more bugs more quickly and more easily.

							Thanx, Paul

> Paolo
> 
> > And again, thank you very much!!!
> >
> >                                                         Thanx, Paul
> >
> > > > diff --git a/arch/x86/events/intel/core.c b/arch/x86/events/intel/core.c
> > > > index a08f794a0e79..92d5a3464cb2 100644
> > > > --- a/arch/x86/events/intel/core.c
> > > > +++ b/arch/x86/events/intel/core.c
> > > > @@ -4056,7 +4056,7 @@ static struct perf_guest_switch_msr *intel_guest_get_msrs(int *nr, void *data)
> > > >         arr[global_ctrl] = (struct perf_guest_switch_msr){
> > > >                 .msr = MSR_CORE_PERF_GLOBAL_CTRL,
> > > >                 .host = intel_ctrl & ~cpuc->intel_ctrl_guest_mask,
> > > > -               .guest = intel_ctrl & (~cpuc->intel_ctrl_host_mask | ~pebs_mask),
> > > > +               .guest = intel_ctrl & ~(cpuc->intel_ctrl_host_mask | pebs_mask),
> > > >         };
> > > >
> > > >         if (!x86_pmu.pebs)
> > > >
> >
> 

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ