[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <CALCETrWr=6BnQ7r4KYDVJzeNPO-pxk9wRmkkLy+mM1T_85Y=jw@mail.gmail.com>
Date: Thu, 4 Dec 2014 10:09:53 -0800
From: Andy Lutomirski <luto@...capital.net>
To: "Berthier, Emmanuel" <emmanuel.berthier@...el.com>
Cc: Thomas Gleixner <tglx@...utronix.de>,
"H. Peter Anvin" <hpa@...or.com>, X86 ML <x86@...nel.org>,
"Jarzmik, Robert" <robert.jarzmik@...el.com>,
LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] [LBR] Dump LBRs on Exception
On Thu, Dec 4, 2014 at 8:01 AM, Berthier, Emmanuel
<emmanuel.berthier@...el.com> wrote:
>> From: Andy Lutomirski [mailto:luto@...capital.net]
>> Sent: Wednesday, December 3, 2014 8:30 PM
>> To: Berthier, Emmanuel
>> Cc: Thomas Gleixner; H. Peter Anvin; X86 ML; Jarzmik, Robert; LKML
>> Subject: Re: [PATCH v2] [LBR] Dump LBRs on Exception
>> > The final patch will bypass the new code in case of UserSpace page fault, so
>> performance impact will be very low.
>> > LBRs copy takes much more time than LBR stop/start.
>> >
>> > The simple is the better:
>> >
>> > .macro STOP_LBR
>> > #ifdef CONFIG_LBR_DUMP_ON_EXCEPTION
>> > testl $3,CS(%rsp) /* Kernel Space? */
>> > jnz 1f
>> > testl $3, PER_CPU_VAR(lbr_dump_state) /* Disabled? */
>> > jnz 1f
>>
>> But that just wasted two of your LBR slots.
>
> No: false test does not generate Branch record, ex:
>
> Last Branch Records:
> to: [<ffffffff828122a0>] page_fault+0x0/0x90
> from: [<ffffffff823c0e06>] sysrq_handle_crash+0x16/0x20
> to: [<ffffffff823c0df0>] sysrq_handle_crash+0x0/0x20
> from: [<ffffffff823c156c>] __handle_sysrq+0x9c/0x170
> to: [<ffffffff823c1562>] __handle_sysrq+0x92/0x170
>
>> > push %rax
>> > push %rcx
>> > push %rdx
>> > movl $MSR_IA32_DEBUGCTLMSR, %ecx
>> > rdmsr
>> > and $~1, %eax /* Disable LBR recording */
>> > wrmsr
>> > pop %rdx
>> > pop %rcx
>> > pop %rax
>>
>> And the general problem with this approach (even ignoring the performance
>> hit, and kernel faults on user addresses really do happen in real workloads) is
>> that you're not saving and restoring MSR_IA32_DEBUGCTL.
>> It may be that
>> the rest of your patch does whatever magic is needed to make this work, but
>> from just this code it's not at all obvious that this is correct.
>
> The algorithm is quite simple:
> When I enter in Exception handler, I stop LBR recording, and dump its content later if needed.
> When I leave Exception Handler, I restart LBR recording.
> So, after the first exception, LBR in On.
> In case of nested Exceptions and crash, you're right, LBR will probably not be relevant.
> But your proposal does not solve this issue: If we save registers during 1rst exception, and then overwrite them during 2nd level,
> we will lose relevant info if crash is due to the 1rst exception.
>
>> Hence my suggestion for rdmsr -- if you're willing to enable this and take the
>> performance hit, you can simplify it a lot and save some branch slots by
>> unconditionally doing the rdmsrs if you've enabled the LBR tracing IDT entry.
>> The simplification from using rdmsr isn't that the save code is simplified -- it's
>> that there's no state change on exception entry, so you don't need to worry
>> about restoring state correctly on the way out or during a context switch.
>> And you can enable/disable the whole thing just by writing to the IDT, so
>> there's no performance hit at all in the disabled case.
>
> Concerning performances: if it's really matter, the better is to disable the CONFIG switch.
> But if we enable it, it's for using it I guess, and in that case, bypassing UserSpace page faults is better.
> You're proposal of "unconditionally doing the rdmsrs" is not good in that case.
> The only small gain is when CONFIG is enable and feature is disabled by cmdline:
> - with my proposal, we get 1 test and 1 jmp more (if I switch Kernel test with LBR state test): for an exception treatment, does it really matter?
>
> We can mix our proposals: keep my STOP/START code, and replace the dynamic disabling test by IDT change.
> I hope the code will stay readable.
> Do we really want to save 2 instructions?
I don't really care about the number of instructions. But there are
still all the nasty cases:
- Context switch during exception processing (both in the C handler
and in the retint code).
- PMI during exception processing.
- Exception while perf is poking at LBR msrs.
Where are you planning on saving the start/stop previous state?
--Andy
>
> Thanks,
>
> Emmanuel.
--
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists