lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  PHC 
Open Source and information security mailing list archives
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 12 Aug 2020 13:32:58 +0200
From:   Paolo Bonzini <>
Cc:     Like Xu <>, Yao <>,
        Sean Christopherson <>,
        Vitaly Kuznetsov <>,
        Wanpeng Li <>,
        Jim Mattson <>,
        Joerg Roedel <>,,,
        Arnaldo Carvalho de Melo <>,
        Mark Rutland <>
Subject: Re: [PATCH] KVM: x86/pmu: Add '.exclude_hv = 1' for guest perf_event

On 12/08/20 13:11, wrote:
>> x86 does not have a hypervisor privilege level, so it never uses
> Arguably it does when Xen, but I don't think we support that, so *phew*.

Yeah, I suppose you could imagine having paravirtualized perf counters
where the Xen privileged domain could ask Xen to run perf counters on

>> exclude_hv; exclude_host already excludes all root mode activity for
>> both ring0 and ring3.
> Right, but we want to tighten the permission checks and not excluding_hv
> is just sloppy.

I would just document that it's ignored as it doesn't make sense.  ARM64
does that too, for new processors where the kernel is not itself split
between supervisor and hypervisor privilege levels.

There are people that are trying to run Linux-based firmware and have
SMM handlers as part of the kernel.  Perhaps they could use exclude_hv
to exclude the SMM handlers from perf (if including them is possible at

> The thing is, we very much do not want to allow unpriv user to be able
> to create: exclude_host=1, exclude_guest=0 counters (they currently
> can).

That would be the case of an unprivileged user that wants to measure
performance of its guests.  It's a scenario that makes a lot of sense,
are you worried about side channels?  Can perf-events on guests leak
more about the host than perf-events on a random userspace program?

> Also, exclude_host is really poorly defined:
>   "Suppose we have nested virt:
> 	  L0-hv
> 	  |
> 	  G0/L1-hv
> 	     |
> 	     G1
>   And we're running in G0, then:
>   - 'exclude_hv' would exclude L0 events
>   - 'exclude_host' would ... exclude L1-hv events?
>   - 'exclude_guest' would ... exclude G1 events?

>From the point of view of G0, L0 *does not exist at all*.  You just
cannot see L0 events if you're running in G0.

exclude_host/exclude_guest are the right definition.

>   Then the next question is, if G0 is a host, does the L1-hv run in
>   G0 userspace or G0 kernel space?

It's mostly kernel, but sometimes you're interested in events from QEMU
or whoever else has opened /dev/kvm.  In that case you care about G0
userspace too.

> The way it is implemented, you basically have to always set
> exclude_host=0, even if there is no virt at all and you want to measure
> your own userspace thing -- which is just weird.

I understand regretting having exclude_guest that way; include_guest
(defaulting to 0!) would have made more sense.  But defaulting to
exclude_host==0 makes sense: if there is no virt at all, memset(0) does
the right thing so it does not seem weird to me.

> I suppose the 'best' option at this point is something like:
> 	/*
> 	 * comment that explains the trainwreck.
> 	 */
> 	if (!exclude_host && !exclude_guest)
> 		exclude_guest = 1;
> 	if ((!exclude_hv || !exclude_guest) && !perf_allow_kernel())
> 		return -EPERM;
> But that takes away the possibility of actually having:
> 'exclude_host=0, exclude_guest=0' to create an event that measures both,
> which also sucks.

In fact both of the above "if"s suck. :(


Powered by blists - more mailing lists