lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Wed, 15 Feb 2017 10:11:41 +0200
From:   Alexander Shishkin <alexander.shishkin@...ux.intel.com>
To:     Peter Zijlstra <peterz@...radead.org>
Cc:     Ingo Molnar <mingo@...hat.com>, linux-kernel@...r.kernel.org,
        vince@...ter.net, eranian@...gle.com,
        Arnaldo Carvalho de Melo <acme@...radead.org>,
        Borislav Petkov <bp@...e.de>,
        Thomas Gleixner <tglx@...utronix.de>
Subject: Re: [PATCH 1/2] perf/x86/intel/pt: Fail event scheduling on conflict with VMX

Peter Zijlstra <peterz@...radead.org> writes:

> On Tue, Feb 14, 2017 at 07:38:07PM +0100, Peter Zijlstra wrote:
>
>> Right, so I question the whole 'lets not schedule PT when VMX' premise,
>> it leads to inconsistencies all over. How about we treat it like
>> ->add() succeeded and VMX simply results in no output.
>> 
>> Esp. when you then emit 'fake' data into/from a vmlaunch/vmresume
>> instruction.
>
> That is, what about something like the below? (completely untested for
> obvious raisins).
>
> That should schedule PT like normal, and where VMXON will auto-clear
> TraceEn for us, we make VMXOFF set it again.

This makes sense, but this will only make a difference if we're tracing
(the kernel side of) the process that did VMXON in the first place and
we want to see what happens immediately after VMXOFF.

>
> ---
>  arch/x86/events/intel/pt.c | 38 ++++++++++++++++++++++++++------------
>  arch/x86/events/intel/pt.h |  1 +
>  2 files changed, 27 insertions(+), 12 deletions(-)
>
> diff --git a/arch/x86/events/intel/pt.c b/arch/x86/events/intel/pt.c
> index 5900471ee508..a42fa1bef761 100644
> --- a/arch/x86/events/intel/pt.c
> +++ b/arch/x86/events/intel/pt.c
> @@ -411,6 +411,7 @@ static u64 pt_config_filters(struct perf_event *event)
>  
>  static void pt_config(struct perf_event *event)
>  {
> +	struct pt *pt = this_cpu_ptr(&pt_ctx);
>  	u64 reg;
>  
>  	if (!event->hw.itrace_started) {
> @@ -429,11 +430,14 @@ static void pt_config(struct perf_event *event)
>  	reg |= (event->attr.config & PT_CONFIG_MASK);
>  
>  	event->hw.config = reg;
> -	wrmsrl(MSR_IA32_RTIT_CTL, reg);
> +
> +	if (!pt->vmx_on)
> +		wrmsrl(MSR_IA32_RTIT_CTL, reg);
>  }
>  
>  static void pt_config_stop(struct perf_event *event)
>  {
> +	struct pt *pt = this_cpu_ptr(&pt_ctx);
>  	u64 ctl = READ_ONCE(event->hw.config);
>  
>  	/* may be already stopped by a PMI */
> @@ -441,7 +445,9 @@ static void pt_config_stop(struct perf_event *event)
>  		return;
>  
>  	ctl &= ~RTIT_CTL_TRACEEN;
> -	wrmsrl(MSR_IA32_RTIT_CTL, ctl);
> +
> +	if (!pt->vmx_on)
> +		wrmsrl(MSR_IA32_RTIT_CTL, ctl);
>  
>  	WRITE_ONCE(event->hw.config, ctl);
>  
> @@ -1174,10 +1180,12 @@ void intel_pt_interrupt(void)
>  	/*
>  	 * If VMX is on and PT does not support it, don't touch anything.
>  	 */
> -	if (READ_ONCE(pt->vmx_on))
> +	if (READ_ONCE(pt->vmx_on)) {
> +		WRITE_ONCE(pt->vmx_pmi_pending, 1);
>  		return;
> +	}

This is even simpler: we actually need to carry out the first part of
the interrupt function anyway, which deals with updating buffer pointers
etc, thus "handling" the PMI, but we don't restart the event, which will
be then done by the intel_pt_handle_vmx(0), so we don't need the
pending_pmi thingy.

Now the fake data is worrying me much more. Consider this: we start an
event while ->vmx_on==1, which means that before we write a fake VMCS
packet, we need to write a whole bunch of other packets to establish the
context with synchronization point and kitchen sink (PSB..PSBEND), then
fake a trace start TIP.PGE, then VMCS, then TIP.PGD. Then, we should
remember that we did this once to not do it again on every sched-in.

Another corner case is when there's not enough room in the buffer and we
need to postpone the fake VMCS until there is room again. Let me see if
there's more.

Regards,
--
Alex

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ