linux-kernel - Re: [PATCH v2 00/15] Introduce Architectural LBR for vPMU

Message-ID: <74421cf1-30e9-e9a3-cf87-d0e7f12917af@intel.com>
Date:   Mon, 30 Jan 2023 21:38:51 +0800
From:   "Yang, Weijiang" <weijiang.yang@...el.com>
To:     Sean Christopherson <seanjc@...gle.com>
CC:     <pbonzini@...hat.com>, <jmattson@...gle.com>,
        <kvm@...r.kernel.org>, <linux-kernel@...r.kernel.org>,
        <like.xu.linux@...il.com>, <kan.liang@...ux.intel.com>,
        <wei.w.wang@...el.com>
Subject: Re: [PATCH v2 00/15] Introduce Architectural LBR for vPMU


On 1/28/2023 6:46 AM, Sean Christopherson wrote:
> On Thu, Nov 24, 2022, Yang Weijiang wrote:
>> Intel CPU model-specific LBR (Legacy LBR) has evolved into Architectural
>> LBR (Arch LBR [0]), which replaces legacy LBR on new platforms.
>> The native support patches were merged into 5.9 kernel tree, and this
>> patch series is to enable Arch LBR in vPMU so that guest can benefit
>> from the feature.
>>
>> The main advantages of Arch LBR are [1]:
>> - Faster context switching due to XSAVES support and faster reset of
>>    LBR MSRs via the new DEPTH MSR
>> - Faster LBR read for a non-PEBS event due to XSAVES support, which
>>    lowers the overhead of the NMI handler.
>> - Linux kernel can support the LBR features without knowing the model
>>    number of the current CPU.
>>
>> From an end user's point of view, the usage of Arch LBR is the same as
>> that of the Legacy LBR support already merged in the mainline.
>>
>> Note, in this series, there's one restriction for guest Arch LBR, i.e.,
>> the guest can only set its LBR record depth to the same value as the host's. This is due
>> to the special behavior of MSR_ARCH_LBR_DEPTH:
>> 1) On write to the MSR, it'll reset all Arch LBR recording MSRs to 0s.
>> 2) XRSTORS resets all record MSRs to 0s if the saved depth mismatches
>> MSR_ARCH_LBR_DEPTH.
>> Enforcing the restriction keeps the KVM Arch LBR vPMU workflow simple
>> and straightforward.
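
The two MSR_ARCH_LBR_DEPTH behaviors described above, and the resulting depth restriction, can be illustrated with a small user-space model. This is only a sketch under stated assumptions: struct lbr_model, the model_* helpers and MAX_LBR_DEPTH are hypothetical names invented for illustration and are not taken from the series, KVM, or perf.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define MAX_LBR_DEPTH 32 /* illustrative bound only, not from the series */

/* Toy model of Arch LBR state; not a KVM or perf structure. */
struct lbr_model {
	uint64_t depth;                  /* stands in for MSR_ARCH_LBR_DEPTH */
	uint64_t records[MAX_LBR_DEPTH]; /* stands in for the record MSRs */
};

/* Behavior 1: any write to the depth MSR zeroes all record MSRs. */
static void model_write_depth(struct lbr_model *m, uint64_t depth)
{
	assert(depth <= MAX_LBR_DEPTH);
	m->depth = depth;
	memset(m->records, 0, sizeof(m->records));
}

/* Behavior 2: a restore keeps records only if the saved depth matches. */
static void model_xrstors(struct lbr_model *m, const struct lbr_model *saved)
{
	if (saved->depth != m->depth) {
		memset(m->records, 0, sizeof(m->records));
		return;
	}
	memcpy(m->records, saved->records, sizeof(m->records));
}

/* The restriction in this series: guest depth must equal host depth. */
static bool model_guest_depth_valid(uint64_t guest_depth, uint64_t host_depth)
{
	return guest_depth == host_depth;
}
```

In this model, a guest depth that differs from the host's would cause records to be lost on either path, which is the situation the restriction avoids.
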
>>
>> Paolo refactored the old series and the resulting patches became the
>> base of this new series, therefore he's the author of some patches.
> To be very blunt, this series is a mess.  I don't want to point fingers as there
> is plenty of blame to go around.  The existing LBR support is a confusing mess,
> vPMU as a whole has been neglected for too long, review feedback has been relatively
> non-existent, and I'm sure some of the mess is due to Paolo trying to hastily fix
> things up back when this was temporarily queued.
>
> However, for arch LBR support to be merged, things need to change.
>
> First and foremost, the existing LBR support needs to be documented.  Someone,
> I don't care who, needs to provide a detailed writeup of the contract between KVM
> and perf.  Specifically, I want to know:
>
>    1. When exactly is perf allowed to take control of LBR MSRs?  Task switch?  IRQ?
>       NMI?
>
>    2. What is the expected behavior when perf is using LBRs?  Is the guest supposed
>       to be traced?
>
>    3. Why does KVM snapshot DEBUGCTL with IRQs enabled, but disables IRQs when
>       accessing LBR MSRs?
>
> It doesn't have to be polished, e.g. I'll happily wordsmith things into proper
> documentation, but I want to have a very clear understanding of how LBR support
> is _intended_ to function and how it all _actually_ functions without having to
> make guesses.
>
> And depending on the answers, I want to revisit KVM's LBR implementation before
> tackling arch LBRs.  Letting perf usurp LBRs while KVM has the vCPU loaded is
> frankly ridiculous.  Just have perf set a flag telling KVM that it needs to take
> control of LBRs and have KVM service the flag as a request or something.  Stealing
> the LBRs back in IRQ context adds a stupid amount of complexity without much value,
> e.g. waiting a few branches for KVM to get to a safe place isn't going to meaningfully
> change the traces.  If that can't actually happen, then why on earth does KVM need
> to disable IRQs to read MSRs?
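
The "set a flag and have KVM service it as a request" idea above might look roughly like the following user-space sketch. It is an assumption-laden illustration, not the KVM request mechanism or any perf API: struct vcpu_model, perf_request_lbrs() and kvm_service_lbr_request() are hypothetical names, and C11 atomics stand in for whatever synchronization real code would use.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-vCPU state; the names are illustrative, not KVM's. */
struct vcpu_model {
	atomic_bool lbr_reclaim_pending; /* set by "perf", consumed by "KVM" */
	bool guest_owns_lbrs;
};

/* "perf" side: only sets a flag, so it never touches LBR MSRs itself. */
static void perf_request_lbrs(struct vcpu_model *v)
{
	atomic_store(&v->lbr_reclaim_pending, true);
}

/*
 * "KVM" side: called at a safe point of its own choosing, e.g. before
 * the next guest entry.  Returns true if ownership was handed over.
 */
static bool kvm_service_lbr_request(struct vcpu_model *v)
{
	/* Consume the flag atomically so a request is serviced only once. */
	if (!atomic_exchange(&v->lbr_reclaim_pending, false))
		return false;

	/* Real code would save guest LBR state here before releasing. */
	v->guest_owns_lbrs = false;
	return true;
}
```

The point of the shape is that the handover happens only where the "KVM" side decides it is safe, so nothing has to be stolen back in IRQ context.
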
>
> And AFAICT, since KVM unconditionally loads the guest's DEBUGCTL, whether or not
> guest branches show up in the LBRs when the host is tracing is completely up to
> the whims of the guest.  If that's correct, then again, what's the point of the
> dance between KVM and perf?
>
> Beyond the "how does this work" issues, there needs to be tests.  At the absolute
> minimum, there needs to be selftests showing that this stuff actually works, that
> save/restore (migration) works, that the MSRs can/can't be accessed when guest
> CPUID is (in)correctly configured, etc. And I would really, really like to have
> tests that force contention between host and guests, e.g. to make sure that KVM
> isn't leaking host state or outright exploding, but I can understand that those
> types of tests would be very difficult to write.
>
> I've pushed a heavily reworked, but definitely broken, version to
>
>    git@...hub.com:sean-jc/linux.git x86/arch_lbrs
>
> It compiles, but it's otherwise untested and there are known gaps.  E.g. I omitted
> toggling load+clear of ARCH_LBR_CTL because I couldn't figure out the intended
> behavior.

Thank you for the detailed review and comments!

I'll check your reworked version and discuss with stakeholders how to
move the work forward.