lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:	Wed, 3 Mar 2010 22:50:21 +0100
From:	Stephane Eranian <eranian@...gle.com>
To:	Masami Hiramatsu <mhiramat@...hat.com>
Cc:	Peter Zijlstra <peterz@...radead.org>, mingo@...e.hu,
	linux-kernel@...r.kernel.org, paulus@...ba.org,
	robert.richter@....com, fweisbec@...il.com
Subject: Re: [RFC][PATCH 10/11] perf, x86: use LBR for PEBS IP+1 fixup

I think systematically and transparently using LBR to correct PEBS off-by-one
problem is not such a good idea. You've basically highjacked LBR and user
cannot use it in a different way.

There are PEBS+LBR measurements where you care about extracting the LBR data.
There are PEBS measurements where you don't care about getting the correct IP.
I don't necessarily want to pay the price, especially when this could
be done offline
in the tool.


On Wed, Mar 3, 2010 at 10:11 PM, Masami Hiramatsu <mhiramat@...hat.com> wrote:
> Peter Zijlstra wrote:
>> On Wed, 2010-03-03 at 13:05 -0500, Masami Hiramatsu wrote:
>>> Peter Zijlstra wrote:
>>>> PEBS always reports the IP+1, that is the instruction after the one
>>>> that got sampled, cure this by using the LBR to reliably rewind the
>>>> instruction stream.
>>>
>>> Hmm, does PEBS always report one byte after the end address of the
>>> sampled instruction? Or the instruction which will be executed next
>>> step?
>>
>> The next instruction, its trap like.
>>
>>> [...]
>>>> +#include <asm/insn.h>
>>>> +
>>>> +#define MAX_INSN_SIZE      16
>>>
>>> Hmm, we'd better integrate these kinds of definitions into
>>> asm/insn.h... (several features define it)
>>
>> Agreed, I'll look at doing a patch to collect them all into asm/insn.h
>> if nobody beats me to it :-)
>
> At least kprobes doesn't :)
>
>>>> +
>>>> +static void intel_pmu_pebs_fixup_ip(struct pt_regs *regs)
>>>> +{
>>>> +#if 0
>>>> +   /*
>>>> +    * Borken, makes the machine expode at times trying to
>>>> +    * derefence funny userspace addresses.
>>>> +    *
>>>> +    * Should we always fwd decode from @to, instead of trying
>>>> +    * to rewind as implemented?
>>>> +    */
>>>> +
>>>> +   struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
>>>> +   unsigned long from = cpuc->lbr_entries[0].from;
>>>> +   unsigned long to = cpuc->lbr_entries[0].to;
>>>
>>> Ah, I see. For branch instruction case, we can use LBR to
>>> find previous IP...
>>
>> Right, we use the LBR to find the basic block.
>
> Hm, that's a good idea :)
>
>>>> +   unsigned long ip = regs->ip;
>>>> +   u8 buf[2*MAX_INSN_SIZE];
>>>> +   u8 *kaddr;
>>>> +   int i;
>>>> +
>>>> +   if (from && to) {
>>>> +           /*
>>>> +            * We sampled a branch insn, rewind using the LBR stack
>>>> +            */
>>>> +           if (ip == to) {
>>>> +                   regs->ip = from;
>>>> +                   return;
>>>> +           }
>>>> +   }
>>>> +
>>>> +   if (user_mode(regs)) {
>>>> +           int bytes = copy_from_user_nmi(buf,
>>>> +                           (void __user *)(ip - MAX_INSN_SIZE),
>>>> +                           2*MAX_INSN_SIZE);
>>>> +
>>>
>>> maybe, you'd better check the source address range is within
>>> the user address range. e.g. ip < MAX_INSN_SIZE.
>>
>> Not only that, I realized user_mode() checks regs->cs, which is not set
>> by the PEBS code, so I added some helpers.
>>
>>>> +
>>>> +   /*
>>>> +    * Try to find the longest insn ending up at the given IP
>>>> +    */
>>>> +   for (i = MAX_INSN_SIZE; i > 0; i--) {
>>>> +           struct insn insn;
>>>> +
>>>> +           kernel_insn_init(&insn, kaddr + MAX_INSN_SIZE - i);
>>>> +           insn_get_length(&insn);
>>>> +           if (insn.length == i) {
>>>> +                   regs->ip -= i;
>>>> +                   return;
>>>> +           }
>>>> +   }
>>>
>>> Hmm, this will not work correctly on x86, since the decoder can
>>> miss-decode the tail bytes of previous instruction as prefix bytes. :(
>>>
>>> Thus, if you want to rewind instruction stream, you need to decode
>>> a function (or basic block) entirely.
>>
>> Something like the below?
>
> Great! it looks good to me.
> Yeah, LBR.to may always smaller than current ip (if no one disabled LBR).
>
> Thank you,
>
>>
>> #ifdef CONFIG_X86_32
>> static bool kernel_ip(unsigned long ip)
>> {
>>         return ip > TASK_SIZE;
>> }
>> #else
>> static bool kernel_ip(unsigned long ip)
>> {
>>         return (long)ip < 0;
>> }
>> #endif
>>
>> static int intel_pmu_pebs_fixup_ip(unsigned long *ipp)
>> {
>>         struct cpu_hw_events *cpuc = &__get_cpu_var(cpu_hw_events);
>>         unsigned long from = cpuc->lbr_entries[0].from;
>>         unsigned long old_to, to = cpuc->lbr_entries[0].to;
>>         unsigned long ip = *ipp;
>>         int i;
>>
>>         /*
>>          * We don't need to fixup if the PEBS assist is fault like
>>          */
>>         if (!x86_pmu.intel_perf_capabilities.pebs_trap)
>>                 return 0;
>>
>>         if (!cpuc->lbr_stack.nr || !from || !to)
>>                 return 0;
>>
>>         if (ip < to)
>>                 return 0;
>>
>>         /*
>>          * We sampled a branch insn, rewind using the LBR stack
>>          */
>>         if (ip == to) {
>>                 *ipp = from;
>>                 return 1;
>>         }
>>
>>         do {
>>                 struct insn insn;
>>                 u8 buf[MAX_INSN_SIZE];
>>                 void *kaddr;
>>
>>                 old_to = to;
>>                 if (!kernel_ip(ip)) {
>>                         int bytes = copy_from_user_nmi(buf, (void __user *)to,
>>                                         MAX_INSN_SIZE);
>>
>>                         if (bytes != MAX_INSN_SIZE)
>>                                 return 0;
>>
>>                         kaddr = buf;
>>                 } else kaddr = (void *)to;
>>
>>                 kernel_insn_init(&insn, kaddr);
>>                 insn_get_length(&insn);
>>                 to += insn.length;
>>         } while (to < ip);
>>
>>         if (to == ip) {
>>                 *ipp = old_to;
>>                 return 1;
>>         }
>>
>>         return 0;
>> }
>>
>> I thought about exposing the success of this fixup as a PERF_RECORD_MISC
>> bit.
>>
>
> --
> Masami Hiramatsu
> e-mail: mhiramat@...hat.com
>



-- 
Stephane Eranian  | EMEA Software Engineering
Google France | 38 avenue de l'Opéra | 75002 Paris
Tel : +33 (0) 1 42 68 53 00
This email may be confidential or privileged. If you received this
communication by mistake, please
don't forward it to anyone else, please erase all copies and
attachments, and please let me know that
it went to the wrong person. Thanks
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ