lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <ec514ab8-cf18-6fa7-d2b5-5f81e1a70983@oracle.com>
Date:   Mon, 9 Jul 2018 18:42:29 -0400
From:   Boris Ostrovsky <boris.ostrovsky@...cle.com>
To:     Mark Rutland <mark.rutland@....com>,
        Peter Zijlstra <peterz@...radead.org>
Cc:     linux-kernel@...r.kernel.org, Ingo Molnar <mingo@...hat.com>,
        Jin Yao <yao.jin@...ux.intel.com>
Subject: Re: [RFC PATCH] perf/core: don't sample kernel regs upon skid

On 07/02/2018 12:02 PM, Mark Rutland wrote:
> On Mon, Jul 02, 2018 at 05:46:55PM +0200, Peter Zijlstra wrote:
>> On Mon, Jul 02, 2018 at 04:12:50PM +0100, Mark Rutland wrote:
>>> Users can request that general purpose registers, instruction pointer,
>>> etc, are sampled when a perf event counter overflows. To try to avoid
>>> this resulting in kernel state being leaked, unprivileged users are
>>> usually forbidden from opening events which count while the kernel is
>>> running.
>>>
>>> Unfortunately, this is not sufficient to avoid leading kernel state.
>> 'leaking' surely.
>>
>>> For various reasons, there can be a delay between the overflow occurring
>>> and the resulting overflow exception (e.g. an NMI) being taken. During
>>> this window, other instructions may be executed, resulting in skid.
>>>
>>> This skid means that a userspace-only event overflowing may result in an
>>> exception being taken *after* entry to the kernel, allowing kernel
>>> registers to be sampled. Depending on the amount of skid, this may only
>>> leak the PC (breaking KASLR), or it may leak secrets which are currently
>>> live in GPRs.
>>>
>>> Let's avoid this by only sampling from the user registers when an event
>>> is supposed to exclude the kernel, providing the illusion that the
>>> overflow exception is taken from userspace.
>>>
>>> We also have similar cases when sampling a guest, where we get the host
>>> regs in some cases. It's not entirely clear to me how we should handle
>>> these.
>> Would not a similar:
>>
>> 	if ((event->attr.exclude_hv || event->attr.exclude_host) /* WTF both !? */ &&
>> 	    perf_guest_cbs && !perf_guest_cbs->is_in_guest())
>> 		return perf_guest_cbs->guest_pt_regs();
>>
>> work there? 
> Mostly. It's fun if the user also passed exclude_guest -- I have no idea
> what should be sampled there, if anything.
>
> It's easy to get stuck in a rabbit hole looking at this.
>
>> Of course, perf_guest_info_callbacks is currently lacking that
>> guest_pt_regs() thingy..
> Yeah; I started looking at implementing it, but ran away since it wasn't
> clear to me how to build that on most architectures.
>
>>> Signed-off-by: Mark Rutland <mark.rutland@....com>
>>> Cc: Ingo Molnar <mingo@...hat.com>
>>> Cc: Jin Yao <yao.jin@...ux.intel.com>
>>> Cc: Peter Zijlstra <peterz@...radead.org>
>>> ---
>>>  kernel/events/core.c | 28 ++++++++++++++++++++++++++++
>>>  1 file changed, 28 insertions(+)
>>>
>>> diff --git a/kernel/events/core.c b/kernel/events/core.c
>>> index 8f0434a9951a..2ab2548b2e66 100644
>>> --- a/kernel/events/core.c
>>> +++ b/kernel/events/core.c
>>> @@ -6361,6 +6361,32 @@ perf_callchain(struct perf_event *event, struct pt_regs *regs)
>>>  	return callchain ?: &__empty_callchain;
>>>  }
>>>  
>>> +static struct pt_regs *perf_get_sample_regs(struct perf_event *event,
>>> +					    struct pt_regs *regs)
>>> +{
>>> +	/*
>>> +	 * Due to interrupt latency (AKA "skid"), we may enter the kernel
>>> +	 * before taking an overflow, even if the PMU is only counting user
>>> +	 * events.
>>> +	 *
>>> +	 * If we're not counting kernel events, always use the user regs when
>>> +	 * sampling.
>>> +	 *
>>> +	 * TODO: what do we do about sampling a guest's registers? The IP is
>>> +	 * special-cased, but for the rest of the regs they'll get the
>>> +	 * user/kernel regs depending on whether exclude_kernel is set, which
>>> +	 * is nonsensical.
>>> +	 *
>>> +	 * We can't get at the full set of regs in all cases (e.g. Xen's PV PMU
>>> +	 * can't provide the GPRs), so should we just zero the GPRs when in a
>>> +	 * guest? Or skip outputting the regs in perf_output_sample?
>> Seems daft Xen cannot provide registers; why is that? Boris?
> The xen_pmu_regs structure simply doesn't have them, so I assume there's
> no API to get them.
>
> Given we don't currently sample the guest regs, I'd be tempted to just
> zero them for now, or skip the sample at output time (if that doesn't
> break some other case).

(Was out on vacation, couldn't respond earlier)

Yes, PV guests only get a limited set of registers passed to the handler
by the hypervisor. GPRs are not part of this set.

I think we need do

diff --git a/arch/x86/xen/pmu.c b/arch/x86/xen/pmu.c
index 7d00d4a..95997e6 100644
--- a/arch/x86/xen/pmu.c
+++ b/arch/x86/xen/pmu.c
@@ -478,7 +478,7 @@ static void xen_convert_regs(const struct
xen_pmu_regs *xen_regs,
 irqreturn_t xen_pmu_irq_handler(int irq, void *dev_id)
 {
        int err, ret = IRQ_NONE;
-       struct pt_regs regs;
+       struct pt_regs regs = {0};
        const struct xen_pmu_data *xenpmu_data = get_xenpmu_data();
        uint8_t xenpmu_flags = get_xenpmu_flags();


Do you want me to submit a separate patch or can you make this part of
yours?

Thanks.
-boris

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ