Message-ID: <13b55438-d990-6d6d-f1d3-8e8a18027825@amd.com>
Date: Tue, 3 Oct 2023 23:01:56 +0530
From: Manali Shukla <manali.shukla@....com>
To: Peter Zijlstra <peterz@...radead.org>,
Sean Christopherson <seanjc@...gle.com>
Cc: Ingo Molnar <mingo@...nel.org>,
Dapeng Mi <dapeng1.mi@...ux.intel.com>,
Paolo Bonzini <pbonzini@...hat.com>,
Arnaldo Carvalho de Melo <acme@...nel.org>,
Kan Liang <kan.liang@...ux.intel.com>,
Like Xu <likexu@...cent.com>,
Mark Rutland <mark.rutland@....com>,
Alexander Shishkin <alexander.shishkin@...ux.intel.com>,
Jiri Olsa <jolsa@...nel.org>,
Namhyung Kim <namhyung@...nel.org>,
Ian Rogers <irogers@...gle.com>,
Adrian Hunter <adrian.hunter@...el.com>, kvm@...r.kernel.org,
linux-perf-users@...r.kernel.org, linux-kernel@...r.kernel.org,
Zhenyu Wang <zhenyuw@...ux.intel.com>,
Zhang Xiong <xiong.y.zhang@...el.com>,
Lv Zhiyuan <zhiyuan.lv@...el.com>,
Yang Weijiang <weijiang.yang@...el.com>,
Dapeng Mi <dapeng1.mi@...el.com>,
Jim Mattson <jmattson@...gle.com>,
David Dunn <daviddunn@...gle.com>,
Mingwei Zhang <mizhang@...gle.com>,
Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [Patch v4 07/13] perf/x86: Add constraint for guest perf metrics event

On 10/3/2023 1:46 PM, Peter Zijlstra wrote:
> On Mon, Oct 02, 2023 at 05:56:28PM -0700, Sean Christopherson wrote:
>> On Mon, Oct 02, 2023, Peter Zijlstra wrote:
>
>>> I'm not sure what you're suggesting here. It will have to save/restore
>>> all those MSRs anyway. Suppose it switches between vCPUs.
>>
>> The "when" is what's important. If KVM took a literal interpretation of
>> "exclude guest" for pass-through MSRs, then KVM would context switch all those
>> MSRs twice for every VM-Exit=>VM-Enter roundtrip, even when the VM-Exit isn't a
>> reschedule IRQ to schedule in a different task (or vCPU). The overhead to save
>> all the host/guest MSRs and load all of the guest/host MSRs *twice* for every
>> VM-Exit would be a non-starter. E.g. simple VM-Exits are completely handled in
>> <1500 cycles, and "fastpath" exits are something like half that. Switching all
>> the MSRs is likely 1000+ cycles, if not double that.
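
Something like the below is what a literal interpretation would boil
down to (untested sketch; the struct and helper names are made up, not
KVM code):

/*
 * Literal "exclude guest": swap every pass-through PMU MSR on each
 * VM-Exit and again on VM-Enter.  Each rdmsrl()/wrmsrl() costs tens
 * of cycles, so a few dozen MSRs switched twice per roundtrip easily
 * eats 1000+ cycles.
 */
struct pmu_msr {
	u32 index;
	u64 host_val;
	u64 guest_val;
};

static void switch_to_host_pmu(struct pmu_msr *msrs, int nr)
{
	int i;

	for (i = 0; i < nr; i++) {
		rdmsrl(msrs[i].index, msrs[i].guest_val);
		wrmsrl(msrs[i].index, msrs[i].host_val);
	}
}
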
>
> See, you're the virt-nerd and I'm sure you know what you're talking
> about, but I have no clue :-) I didn't know there were different levels
> of vm-exit.
>
>> FWIW, the primary use case we care about is for slice-of-hardware VMs, where each
>> vCPU is pinned 1:1 with a host pCPU.
>
> I've been given to understand that vm-exit is a bad word in this
> scenario; any exit is a fail. They get MWAIT and all the other crap and
> more or less pretend to be real hardware.
>
> So why do you care about those MSRs so much? That should 'never' happen
> in this scenario.
>
>>>> Or at least, that was my reading of things. Maybe it was just a
>>>> misunderstanding because we didn't do a good job of defining the behavior.
>>>
>>> This might be the case. I don't particularly care where the guest
>>> boundary lies -- somewhere in the vCPU thread. Once the thread is gone,
>>> PMU is usable again, etc.
>>
>> Well drat, that there would have saved a wee bit of frustration. Better late
>> than never though, that's for sure.
>>
>> Just to double confirm: keeping guest PMU state loaded until the vCPU is scheduled
>> out or KVM exits to userspace would mean that host perf events won't be active
>> for potentially large swaths of non-KVM code. Any function calls or event/exception
>> handlers that occur within the context of ioctl(KVM_RUN) would run with host
>> perf events disabled.
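
Roughly, the swap points would sit like this (untested outline, all
helper names made up):

static void kvm_pmu_load_guest(struct kvm_vcpu *vcpu);		/* hypothetical */
static void kvm_pmu_put_guest(struct kvm_vcpu *vcpu);		/* hypothetical */
static bool need_exit_to_userspace(struct kvm_vcpu *vcpu);	/* hypothetical */

static int vcpu_run_outline(struct kvm_vcpu *vcpu)
{
	kvm_pmu_load_guest(vcpu);	/* host perf events go quiet here */

	for (;;) {
		/*
		 * Enter the guest and handle exits.  The PMU stays
		 * guest-owned even across simple exits handled in the
		 * kernel; a sched-out would bracket itself with
		 * put/load via the preempt notifiers.
		 */
		if (need_exit_to_userspace(vcpu))
			break;
	}

	kvm_pmu_put_guest(vcpu);	/* host perf events resume here */
	return 0;
}
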
>
> Hurmph, that sounds sub-optimal; earlier you said <1500 cycles, and this
> all sounds like a ton more.
>
> /me frobs around the kvm code some...
>
> Are we talking about exit_fastpath loop in vcpu_enter_guest() ? That
> seems to run with IRQs disabled, so at most you can trigger a #PF or
> something, which will then trip an exception fixup because you can't run
> #PF with IRQs disabled etc.
>
> That seems fine. That is, a theoretical kvm_x86_handle_enter_irqoff()
> coupled with the existing kvm_x86_handle_exit_irqoff() seems like a
> reasonable solution from where I'm sitting. That also more or less
> matches the FPU state save/restore AFAICT.
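
Something like the below, perhaps (untested;
kvm_x86_handle_enter_irqoff() being the theoretical hook named above,
the rest illustrative):

static void vcpu_enter_exit_sketch(struct kvm_vcpu *vcpu)
{
	lockdep_assert_irqs_disabled();

	/* theoretical: swap guest PMU state in, mirroring FPU handling */
	kvm_x86_handle_enter_irqoff(vcpu);

	/* ... VMLAUNCH/VMRUN and the exit_fastpath loop run here ... */

	/* existing hook: swap host state back before IRQs are re-enabled */
	kvm_x86_handle_exit_irqoff(vcpu);
}
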
>
> Or are you talking about the whole of vcpu_run() ? That seems like a
> massive amount of code, and doesn't look like anything I'd call a
> fast-path. Also, much of that loop has preemption enabled...
>
>> Are you ok with that approach, assuming we don't completely botch things, the
>> interfaces are sane, and we can come up with a clean solution for handling NMIs, etc.?
>
> Since you steal the whole PMU, can't you re-route the PMI to something
> that's virt friendly too?
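
One way to read that (untested sketch; the vector value and helper are
made up, not an existing interface):

#define GUEST_PMI_VECTOR	0xf0	/* assumption: some free vector */

static void reroute_pmi(bool to_guest)
{
	/*
	 * While the guest owns the PMU, point the local APIC's
	 * performance-counter LVT entry at a KVM-owned vector instead
	 * of the host's NMI-based PMI, so a counter overflow in the
	 * guest window lands directly in virt-friendly code.
	 */
	if (to_guest)
		apic_write(APIC_LVTPC, GUEST_PMI_VECTOR);
	else
		apic_write(APIC_LVTPC, APIC_DM_NMI);	/* host perf default */
}
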
>
>>> It also means ::exclude_guest should actually work -- it often does not
>>> today -- the IBS thing for example totally ignores it.
>>
>> Is that already in-tree, or are you talking about Manali's proposed series to
>> support virtualizing IBS?
>
> The IBS code, as it stands, totally ignores ::exclude_guest. Manali was going
> to add some of it. But I'm not at all sure about the state of the other
> PMU drivers we have.
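
For reference, honouring ::exclude_guest on the driver side would look
something like this (illustrative sketch, not the actual IBS code):

static int example_pmu_event_init(struct perf_event *event)
{
	/*
	 * A driver that honours ::exclude_guest must either stop the
	 * counter across VM-entry or, if the hardware cannot filter
	 * by guest/host, refuse the event instead of silently
	 * counting guest cycles anyway.
	 */
	if (event->attr.exclude_guest)
		return -EINVAL;		/* assumption: hw can't filter */

	return 0;
}
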
>
> Just for giggles, P4 has VMX support... /me runs like crazy

I am working on Solution 1.1 from the approach proposed in [*].
I will send V2 of the IBS virtualization series based on it shortly.

* https://lore.kernel.org/all/20230908133114.GK19320@noisy.programming.kicks-ass.net/T/#m7389910e577966c93a0b50fbaf9442be80dc730b

- Manali