Message-ID: <c0f7e5b96829407d839d9e5f3907943c4c0f960f.camel@redhat.com>
Date: Thu, 21 Nov 2024 22:35:47 -0500
From: Maxim Levitsky <mlevitsk@...hat.com>
To: Sean Christopherson <seanjc@...gle.com>
Cc: kvm@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: vmx_pmu_caps_test fails on Skylake based CPUS due to read only LBRs
On Sun, 2024-11-03 at 18:32 -0500, Maxim Levitsky wrote:
> On Mon, 2024-10-28 at 08:55 -0700, Sean Christopherson wrote:
> > On Fri, Oct 18, 2024, Maxim Levitsky wrote:
> > > Hi,
> > >
> > > Our CI found another issue, this time with vmx_pmu_caps_test.
> > >
> > > On 'Intel(R) Xeon(R) Gold 6328HL CPU' I see that all LBR msrs (from/to and
> > > TOS), are always read only - even when LBR is disabled - once I disable the
> > > feature in DEBUG_CTL, all LBR msrs reset to 0, and you can't change their
> > > value manually. Freeze LBRS on PMI seems not to affect this behavior.
> > >
> > > I don't know if this is how the hardware is supposed to work (Intel's manual
> > > doesn't mention anything about this), or if it is something platform
> > > specific, because this system also was found to have LBRs enabled
> > > (IA32_DEBUGCTL.LBR == 1) after a fresh boot, as if BIOS left them enabled - I
> > > don't have an idea on why.
> > >
> > > The problem is that vmx_pmu_caps_test writes 0 to LBR_TOS via KVM_SET_MSRS,
> > > and KVM actually passes this write to actual hardware msr (this is somewhat
> > > weird),
> >
> > When the "virtual" LBR event is active in host perf, the LBR MSRs are passed
> > through to the guest, and so KVM needs to propagate the guest values into hardware.
>
> Yes, but usually KVM_SET_MSRS doesn't touch hardware directly, even for registers/msrs
> that are passed through, but rather the relevant values are loaded when the guest vCPU
> is loaded and/or when the guest is entered.
> I don't know the details though.
>
>
> > > and since the MSR is not writable and silently drops writes instead,
> > > once the test tries to read it, it gets some random value instead.
> >
> > This just showed up in our testing too (delayed backport on our end). I haven't
> > (yet) tried debugging our setup, but is there any chance Intel PT is interfering?
> >
> > 33.3.1.2 Model Specific Capability Restrictions
> > Some processor generations impose restrictions that prevent use of
> > LBRs/BTS/BTM/LERs when software has enabled tracing with Intel Processor Trace.
> > On these processors, when TraceEn is set, updates of LBR, BTS, BTM, LERs are
> > suspended but the states of the corresponding IA32_DEBUGCTL control fields
> > remained unchanged as if it were still enabled. When TraceEn is cleared, the
> > LBR array is reset, and LBR/BTS/BTM/LERs updates will resume.
> > Further, reads of these registers will return 0, and writes will be dropped.
> >
> > The list of MSRs whose updates/accesses are restricted follows.
> >
> > • MSR_LASTBRANCH_x_TO_IP, MSR_LASTBRANCH_x_FROM_IP, MSR_LBR_INFO_x, MSR_LASTBRANCH_TOS
> > • MSR_LER_FROM_LIP, MSR_LER_TO_LIP
> > • MSR_LBR_SELECT
> >
> > For processors with CPUID DisplayFamily_DisplayModel signatures of 06_3DH,
> > 06_47H, 06_4EH, 06_4FH, 06_56H, and 06_5EH, the use of Intel PT and LBRs are
> > mutually exclusive.
> >
> > If Intel PT is NOT responsible, i.e. the behavior really is due to DEBUG_CTL.LBR=0,
> > then I don't see how KVM can sanely virtualize LBRs.
> >
>
> Hi!
>
>
> I will check PT influence soon, but to me it looks like the hardware implementation has changed.
> It is just too consistent:
>
> When DEBUG_CTL.LBR=1, the LBRs do work, I see all the registers update, although
> TOS does seem to be stuck at one value, but it does change sometimes, and it's non zero.
>
> The FROM/TO do show healthy amount of updates
>
> Note that I read all msrs using 'rdmsr' userspace tool.
>
> However as soon as I disable DEBUG_CTL.LBR, all these MSRs reset to 0, and can't be changed.
Hi,
I tested this on another Skylake-based machine (Intel(R) Xeon(R) Silver 4214) and I see the same behavior:
LBR_TOS is read-only:
it reads 0 when LBRs are disabled in DEBUG_CTL, and it runs (changes all the time, as expected)
when LBRs are enabled in DEBUG_CTL.
IA32_RTIT_CTL.TraceEn is clear (MSR 0x570 reads 0).
Also, on this machine the BIOS didn't leave LBRs running.
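For anyone who wants to repeat the checks described above, here is a minimal
sketch. It assumes the 'msr' kernel module is loaded and that it runs as root;
the helper names (rdmsr, wrmsr, report, tos_write_sticks) are illustrative and
are not from the selftest. The readings reported in this thread were taken with
the 'rdmsr' tool, not with this script.

```python
import os
import struct

# Architectural / Skylake-era MSR indices used in this thread.
IA32_DEBUGCTL = 0x1D9        # bit 0 = LBR enable
MSR_LASTBRANCH_TOS = 0x1C9   # LBR top-of-stack index
IA32_RTIT_CTL = 0x570        # bit 0 = TraceEn (Intel PT)

DEBUGCTL_LBR = 1 << 0
RTIT_CTL_TRACEEN = 1 << 0


def rdmsr(cpu, msr):
    """Read one 64-bit MSR through /dev/cpu/<cpu>/msr (offset == MSR index)."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_RDONLY)
    try:
        return struct.unpack("<Q", os.pread(fd, 8, msr))[0]
    finally:
        os.close(fd)


def wrmsr(cpu, msr, value):
    """Write one 64-bit MSR. This touches live hardware state; use with care."""
    fd = os.open(f"/dev/cpu/{cpu}/msr", os.O_WRONLY)
    try:
        os.pwrite(fd, struct.pack("<Q", value), msr)
    finally:
        os.close(fd)


def lbr_enabled(debugctl):
    """True if IA32_DEBUGCTL.LBR (bit 0) is set in the given value."""
    return bool(debugctl & DEBUGCTL_LBR)


def pt_tracing(rtit_ctl):
    """True if IA32_RTIT_CTL.TraceEn (bit 0) is set in the given value."""
    return bool(rtit_ctl & RTIT_CTL_TRACEEN)


def report(cpu=0):
    """Collect the three values discussed in this thread for one CPU."""
    return {
        "lbr_enabled": lbr_enabled(rdmsr(cpu, IA32_DEBUGCTL)),
        "pt_tracing": pt_tracing(rdmsr(cpu, IA32_RTIT_CTL)),
        "lbr_tos": rdmsr(cpu, MSR_LASTBRANCH_TOS),
    }


def tos_write_sticks(cpu=0, probe=1):
    """Write a probe value to LBR_TOS and check whether it reads back.

    Only meaningful while LBRs are disabled; with LBRs running, the
    hardware updates TOS on its own and the readback proves nothing.
    """
    wrmsr(cpu, MSR_LASTBRANCH_TOS, probe)
    return rdmsr(cpu, MSR_LASTBRANCH_TOS) == probe
```

Calling report(0) with LBRs disabled and then tos_write_sticks(0) should show
whether a write to LBR_TOS is silently dropped on a given machine.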
I guess we need to at least disable the check in the unit test, or
speak with someone from Intel to clarify what is going on.
What do you think?
Best regards,
Maxim Levitsky
>
> I'll check this on another Skylake based machine and see if I see the same thing.
>
> Best regards,
> Maxim Levitsky
>