Message-ID: <20191016155942.GB5866@linux.intel.com>
Date: Wed, 16 Oct 2019 08:59:42 -0700
From: Sean Christopherson <sean.j.christopherson@...el.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: Fenghua Yu <fenghua.yu@...el.com>, Ingo Molnar <mingo@...hat.com>,
Borislav Petkov <bp@...en8.de>, H Peter Anvin <hpa@...or.com>,
Peter Zijlstra <peterz@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Dave Hansen <dave.hansen@...el.com>,
Paolo Bonzini <pbonzini@...hat.com>,
Radim Krcmar <rkrcmar@...hat.com>,
Ashok Raj <ashok.raj@...el.com>,
Tony Luck <tony.luck@...el.com>,
Dan Williams <dan.j.williams@...el.com>,
Xiaoyao Li <xiaoyao.li@...el.com>,
Sai Praneeth Prakhya <sai.praneeth.prakhya@...el.com>,
Ravi V Shankar <ravi.v.shankar@...el.com>,
linux-kernel <linux-kernel@...r.kernel.org>,
x86 <x86@...nel.org>, kvm@...r.kernel.org
Subject: Re: [PATCH v9 09/17] x86/split_lock: Handle #AC exception for split lock

On Wed, Oct 16, 2019 at 11:29:00AM +0200, Thomas Gleixner wrote:
> > - Modify the #AC handler to test/set the same atomic variable as the
> > sysfs knob. This is the "disabled by kernel" flow.
>
> That's the #AC in kernel handler, right?

Yes.

> > - Modify the debugfs/sysfs knob to only allow disabling split-lock
> > detection. This is the "disabled globally" path, i.e. sends IPIs to
> > clear MSR_TEST_CTRL.split_lock on all online CPUs.
>
> Why only disable? What's wrong with reenabling it? The shiny new driver you
> are working on is triggering #AC. So in order to test the fix, you need to
> reboot the machine instead of just unloading the module, reenabling #AC and
> then loading the fixed one?

A re-enabling path adds complexity (though not much) and is undesirable
for a production environment as a split-lock issue in the kernel isn't
going to magically disappear. And I thought that disable-only was also
your preferred implementation based on a previous comment[*], but that
comment may have been purely in the scope of userspace applications.

Anyway, my personal preference would be to keep things simple and not
support a re-enabling path. But then again, I do 99.9% of my development
in VMs so my vote probably shouldn't count regarding the module issue.

[*] https://lkml.kernel.org/r/alpine.DEB.2.21.1904180832290.3174@nanos.tec.linutronix.de
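
To make the disable-only flow concrete, here is a rough userspace model of
it. This is only a sketch, not the patch: the array stands in for the
per-CPU MSR_TEST_CTRL, the loop stands in for the IPIs, and every name
(sld_disabled, sld_kernel_ac, sld_knob_write, sld_cpu_init, NR_CPUS_MODEL)
is made up for illustration. The one detail taken from hardware is that
split-lock detection is bit 29 of MSR_TEST_CTRL.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NR_CPUS_MODEL        4            /* hypothetical CPU count for the model */
#define TEST_CTRL_SPLIT_LOCK (1ULL << 29) /* split-lock detect bit in MSR_TEST_CTRL */

/* Simulated per-CPU MSR_TEST_CTRL values (stand-in for the real MSR). */
static uint64_t test_ctrl[NR_CPUS_MODEL];

/* Single flag shared by the #AC handler and the knob. */
static atomic_bool sld_disabled;

/* Boot: detection enabled everywhere, nothing disabled yet. */
static void sld_init(void)
{
	atomic_store(&sld_disabled, false);
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		test_ctrl[cpu] = TEST_CTRL_SPLIT_LOCK;
}

/*
 * "Disabled by kernel": #AC hit in kernel code on @cpu.  Sets the shared
 * flag and clears the bit on the faulting CPU only.  Returns true if this
 * call was the first to set the flag (e.g. to warn once).
 */
static bool sld_kernel_ac(int cpu)
{
	bool first = !atomic_exchange(&sld_disabled, true);

	test_ctrl[cpu] &= ~TEST_CTRL_SPLIT_LOCK;
	return first;
}

/*
 * "Disabled globally": the knob.  Only disabling is accepted; the loop
 * models the IPIs that clear the bit on all online CPUs.
 */
static int sld_knob_write(bool enable)
{
	if (enable)
		return -1;	/* no re-enabling path in this model */

	atomic_store(&sld_disabled, true);
	for (int cpu = 0; cpu < NR_CPUS_MODEL; cpu++)
		test_ctrl[cpu] &= ~TEST_CTRL_SPLIT_LOCK;
	return 0;
}

/* Resume/init of @cpu: keep the bit clear if anything ever disabled it. */
static void sld_cpu_init(int cpu)
{
	if (atomic_load(&sld_disabled))
		test_ctrl[cpu] &= ~TEST_CTRL_SPLIT_LOCK;
	else
		test_ctrl[cpu] |= TEST_CTRL_SPLIT_LOCK;
}
```

In this model an #AC on CPU 1 leaves the bit set on the other CPUs until
they go through sld_cpu_init() or someone writes the knob, which is the
difference between the "disabled by kernel" and "disabled globally" paths.
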
> > - Modify the resume/init flow to clear MSR_TEST_CTRL.split_lock if it's
> > been disabled on *any* CPU via #AC or via the knob.
>
> Fine.
>
> > - Remove KVM loading of MSR_TEST_CTRL, i.e. KVM *never* writes the CPU's
> > actual MSR_TEST_CTRL. KVM still emulates MSR_TEST_CTRL so that the
> > guest can do WRMSR and handle its own #AC faults, but KVM doesn't
> > change the value in hardware.
> >
> > * Allowing guest to enable split-lock detection can induce #AC on
> > the host after it has been explicitly turned off, e.g. the sibling
> > hyperthread hits an #AC in the host kernel, or worse, causes a
> > different process in the host to SIGBUS.
> >
> > * Allowing guest to disable split-lock detection opens up the host
> > to DoS attacks.
>
> Wasn't this discussed before and agreed on that if the host has AC enabled
> that the guest should not be able to force disable it? I surely lost track
> of this completely so my memory might trick me.
Yes, I was restating that point, or at least attempting to.
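
As a sketch of what "KVM emulates MSR_TEST_CTRL but never writes the
hardware value" could look like, here is a small userspace model. It is
not KVM code: struct vcpu_model, host_test_ctrl and the emulate_* helpers
are hypothetical names, and rejecting writes with any bit other than the
split-lock bit set is an assumption of the model.

```c
#include <stdint.h>

#define TEST_CTRL_SPLIT_LOCK (1ULL << 29) /* split-lock detect bit in MSR_TEST_CTRL */

/* Hypothetical per-vCPU state: the value the guest believes it wrote. */
struct vcpu_model {
	uint64_t guest_test_ctrl;
};

/* Stand-in for the physical CPU's MSR_TEST_CTRL; owned by the host only. */
static uint64_t host_test_ctrl = TEST_CTRL_SPLIT_LOCK;

/*
 * Guest WRMSR(MSR_TEST_CTRL): update the shadow value only.  The host
 * value is deliberately not touched, so the guest can neither enable nor
 * disable detection in hardware.
 */
static int emulate_wrmsr_test_ctrl(struct vcpu_model *v, uint64_t val)
{
	if (val & ~TEST_CTRL_SPLIT_LOCK)
		return -1;	/* assumption: other bits are rejected */

	v->guest_test_ctrl = val;
	return 0;
}

/* Guest RDMSR(MSR_TEST_CTRL): return the shadow, not the host value. */
static uint64_t emulate_rdmsr_test_ctrl(const struct vcpu_model *v)
{
	return v->guest_test_ctrl;
}
```

The guest reads back whatever it wrote, while host_test_ctrl stays under
the control of the host flows described above.
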
> The real question is what you do when the host has #AC enabled and the
> guest 'disabled' it and triggers #AC. Is that going to be silently ignored
> or is the intention to kill the guest in the same way as we kill userspace?
>
> The latter would be the right thing, but given the fact that the current
> kernels easily trigger #AC today, that would cause a major wreckage in
> hosting scenarios. So I fear we need to bite the bullet and have a knob
> which defaults to 'handle silently' and allows to enable the kill mechanics
> on purpose. 'Handle silently' needs some logging of course, at least a per
> guest counter which can be queried and a tracepoint.
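
The knob described in the quoted text could be modeled roughly as below.
This is a sketch under assumptions, not a proposal for the actual
implementation: the enum and struct names are invented, the per-guest
counter is a plain field, and the model does not say how a silently
handled guest would make forward progress, nor does it emit a tracepoint.

```c
#include <stdbool.h>

/* Hypothetical host policy knob; "silent" is the default in the model. */
enum guest_ac_policy {
	GUEST_AC_SILENT,
	GUEST_AC_KILL,
};

/* What the host decides to do with a split-lock #AC taken in the guest. */
enum guest_ac_action {
	AC_INJECT_TO_GUEST,	/* guest enabled detection, let it handle #AC */
	AC_HANDLED_SILENTLY,	/* counted, guest keeps running */
	AC_KILL_GUEST,		/* same treatment as a userspace offender */
};

/* Hypothetical per-guest state. */
struct guest_model {
	bool guest_sld_enabled;	/* guest's emulated MSR_TEST_CTRL bit */
	unsigned long ac_count;	/* per-guest counter that can be queried */
	bool killed;
};

/* Host has #AC enabled and the guest triggered a split lock. */
static enum guest_ac_action handle_guest_ac(struct guest_model *g,
					    enum guest_ac_policy policy)
{
	/* Guest asked for detection: the fault is the guest's to handle. */
	if (g->guest_sld_enabled)
		return AC_INJECT_TO_GUEST;

	/* Guest "disabled" it: always count, then apply the policy. */
	g->ac_count++;
	if (policy == GUEST_AC_KILL) {
		g->killed = true;
		return AC_KILL_GUEST;
	}
	return AC_HANDLED_SILENTLY;
}
```
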