linux-kernel - Re: [PATCH kvm/queue v2 2/3] perf: x86/core: Add interface to query perfmon_event

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <CABOYuvbPL0DeEgV4gsC+v786xfBAo3T6+7XQr7cVVzbaoFoEAg@mail.gmail.com>
Date:   Wed, 9 Feb 2022 11:24:11 -0800
From:   David Dunn <daviddunn@...gle.com>
To:     Dave Hansen <dave.hansen@...el.com>
Cc:     Jim Mattson <jmattson@...gle.com>,
        Peter Zijlstra <peterz@...radead.org>,
        Like Xu <like.xu.linux@...il.com>,
        Paolo Bonzini <pbonzini@...hat.com>,
        Sean Christopherson <seanjc@...gle.com>,
        Vitaly Kuznetsov <vkuznets@...hat.com>,
        Wanpeng Li <wanpengli@...cent.com>,
        Joerg Roedel <joro@...tes.org>, kvm@...r.kernel.org,
        linux-kernel@...r.kernel.org, Like Xu <likexu@...cent.com>,
        Stephane Eranian <eranian@...gle.com>
Subject: Re: [PATCH kvm/queue v2 2/3] perf: x86/core: Add interface to query
 perfmon_event_map[] directly

Dave,

In my opinion, the right policy depends on what the host owner and
guest owner are trying to achieve.

If the PMU is being used to locate places where performance could be
improved in the system, there are two sub scenarios:
   - The host and guest are owned by same entity that is optimizing
overall system.  In this case, the guest doesn't need PMU access and
better information is provided by profiling the entire system from the
host.
   - The host and guest are owned by different entities.  In this
case, profiling from the host can identify perf issues in the guest.
But what action can be taken?  The host entity must communicate issues
back to the guest owner through some sort of out-of-band information
channel.  On the other hand, preempting the host PMU to give the guest
a fully functional PMU serves this use case well.

TDX and SGX (outside of debug mode) strongly assume different
entities.  And Intel is doing this to reduce insight of the host into
guest operations.  So in my opinion, preemption makes sense.

There are also scenarios where the host owner is trying to identify
systemwide impacts of guest actions.  For example, detecting memory
bandwidth consumption or split locks.  In this case, host control
without preemption is necessary.

To address these various scenarios, it seems like the host needs to be
able to have policy control on whether it is willing to have the PMU
preempted by the guest.

But I don't see what scenario is well served by the current situation
in KVM.  Currently the guest will either be told it has no PMU (which
is fine) or that it has full control of a PMU.  If the guest is told
it has full control of the PMU, it actually doesn't.  But instead of
losing counters on well defined events (from the guest perspective),
they simply stop counting depending on what the host is doing with the
PMU.

On the other hand, if we flip it around the semantics are more clear.
A guest will be told it has no PMU (which is fine) or that it has full
control of the PMU.  If the guest is told that it has full control of
the PMU, it does.  And the host (which is the thing that granted the
full PMU to the guest) knows that events inside the guest are not
being measured.  This results in all entities seeing something that
can be reasoned about from their perspective.

Thanks,

Dave Dunn

On Wed, Feb 9, 2022 at 10:57 AM Dave Hansen <dave.hansen@...el.com> wrote:

> > I was referring to gaps in the collection of data that the host perf
> > subsystem doesn't know about if ATTRIBUTES.PERFMON is set for a TDX
> > guest. This can potentially be a problem if someone is trying to
> > measure events per unit of time.
>
> Ahh, that makes sense.
>
> Does SGX cause problem for these people?  It can create some of the same
> collection gaps:
>
>         performance monitoring activities are suppressed when entering
>         an opt-out (of performance monitoring) enclave.