Message-ID: <65CD3FC07F3BF942ABE211646D72D770356EC658@IRSMSX110.ger.corp.intel.com>
Date: Tue, 2 Dec 2014 19:09:40 +0000
From: "Berthier, Emmanuel" <emmanuel.berthier@...el.com>
To: Andy Lutomirski <luto@...capital.net>
CC: Thomas Gleixner <tglx@...utronix.de>,
"H. Peter Anvin" <hpa@...or.com>, X86 ML <x86@...nel.org>,
"Jarzmik, Robert" <robert.jarzmik@...el.com>,
LKML <linux-kernel@...r.kernel.org>
Subject: RE: [PATCH v2] [LBR] Dump LBRs on Exception
> From: Andy Lutomirski [mailto:luto@...capital.net]
> Sent: Friday, November 28, 2014 4:15 PM
> To: Berthier, Emmanuel
> Cc: Thomas Gleixner; H. Peter Anvin; X86 ML; Jarzmik, Robert; LKML
> Subject: Re: [PATCH v2] [LBR] Dump LBRs on Exception
>
> On Fri, Nov 28, 2014 at 12:44 AM, Berthier, Emmanuel
> <emmanuel.berthier@...el.com> wrote:
> > diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
> > index df088bb..f39cded 100644
> > --- a/arch/x86/kernel/entry_64.S
> > +++ b/arch/x86/kernel/entry_64.S
> > @@ -1035,6 +1035,46 @@ apicinterrupt IRQ_WORK_VECTOR \
> > irq_work_interrupt smp_irq_work_interrupt
> > #endif
> >
> > +.macro STOP_LBR
> > +#ifdef CONFIG_LBR_DUMP_ON_EXCEPTION
> > + testl $3,CS+8(%rsp) /* Kernel Space? */
> > + jz 1f
> > + testl $1, lbr_dump_on_exception
>
> Is there a guarantee that, if lbr_dump_on_exception is true, then LBR is on?
> What happens if you schedule between stopping and resuming LBR?
Good point. The current assumption is to rely on the numerous exceptions to "re-arm" LBR recording.
Even if we bypass user-space page faults, we can still rely on kernel vmalloc page faults to re-arm the recording.
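For illustration, a rough sketch of what the matching re-enable counterpart could look like (RESUME_LBR is a name I made up here; the flag, the MSR and the CS+8(%rsp) offset simply mirror the quoted STOP_LBR macro and would have to match wherever the resume is actually placed in the exception exit path):

.macro RESUME_LBR
#ifdef CONFIG_LBR_DUMP_ON_EXCEPTION
	testl $3,CS+8(%rsp)		/* Same check as in STOP_LBR */
	jz 1f
	testl $1, lbr_dump_on_exception
	jz 1f
	push %rax
	push %rcx
	push %rdx
	movl $MSR_IA32_DEBUGCTLMSR, %ecx
	rdmsr
	or $1, %eax			/* Re-enable LBR recording */
	wrmsr
	pop %rdx
	pop %rcx
	pop %rax
1:
#endif
.endm

With something like that in place, even if a task is scheduled while LBR is off, the resume path of the next kernel exception turns recording back on.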
> > + jz 1f
> > + push %rax
> > + push %rcx
> > + push %rdx
> > + movl $MSR_IA32_DEBUGCTLMSR, %ecx
> > + rdmsr
> > + and $~1, %eax /* Disable LBR recording */
> > + wrmsr
>
> wrmsr is rather slow. Have you checked whether this is faster than just
> saving the LBR trace on exception entry?
The figures I have show that for common MSRs, rdmsr and wrmsr have roughly the same cost, around 100 cycles (it greatly depends on the arch).
The cost of stop/start is: 2 rdmsr + 2 wrmsr = 4 MSR accesses.
The cost of reading the LBR stack is: 1 rdmsr for the TOS + 2 rdmsr per record, with 8 to 32 records (arch specific) = between 17 and 65 MSR accesses.
I've measured on Atom (8 records): reading the LBRs takes around 3x longer than stop/start.
As the LBR stack size is arch dependent, it's not easy to implement the record reading in asm without any branch, and it would create a maintenance dependency.
I prefer to let perf_event_lbr deal with all that stuff.
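For comparison, here is a hypothetical, simplified sketch of what dumping the LBR stack on exception entry would look like. MSR_LBR_TOS and MSR_LBR_NHM_FROM/TO come from msr-index.h, but LBR_NR, lbr_dump_tos and lbr_dump_buf are made-up names for the sketch, and the FROM/TO bases and stack depth differ between models, which is exactly the per-model knowledge perf_event_lbr already carries:

.macro SAVE_LBR_SKETCH
#ifdef CONFIG_LBR_DUMP_ON_EXCEPTION
	push	%rax
	push	%rbx
	push	%rcx
	push	%rdx
	push	%rdi
	movl	$MSR_LBR_TOS, %ecx
	rdmsr				/* 1 rdmsr for the TOS */
	movl	%eax, lbr_dump_tos(%rip)
	leaq	lbr_dump_buf(%rip), %rdi
	movl	$(LBR_NR - 1), %ebx	/* 8..32 records, model specific */
1:
	movl	$MSR_LBR_NHM_FROM, %ecx
	addl	%ebx, %ecx
	rdmsr				/* FROM_IP of one record */
	shlq	$32, %rdx
	orq	%rdx, %rax
	movq	%rax, (%rdi)
	movl	$MSR_LBR_NHM_TO, %ecx
	addl	%ebx, %ecx
	rdmsr				/* TO_IP of the same record */
	shlq	$32, %rdx
	orq	%rdx, %rax
	movq	%rax, 8(%rdi)
	addq	$16, %rdi
	decl	%ebx
	jns	1b			/* 2 rdmsr per record */
	pop	%rdi
	pop	%rdx
	pop	%rcx
	pop	%rbx
	pop	%rax
#endif
.endm

Even this naive version needs a loop (so a branch) plus per-model constants, or a fully unrolled model-specific sequence, on top of costing 17 to 65 MSR accesses instead of 4.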
Thx,
Emmanuel.