linux-kernel - Re: [PATCH v2] [LBR] Dump LBRs on Exception

Message-ID: <CALCETrWr=6BnQ7r4KYDVJzeNPO-pxk9wRmkkLy+mM1T_85Y=jw@mail.gmail.com>
Date:	Thu, 4 Dec 2014 10:09:53 -0800
From:	Andy Lutomirski <luto@...capital.net>
To:	"Berthier, Emmanuel" <emmanuel.berthier@...el.com>
Cc:	Thomas Gleixner <tglx@...utronix.de>,
	"H. Peter Anvin" <hpa@...or.com>, X86 ML <x86@...nel.org>,
	"Jarzmik, Robert" <robert.jarzmik@...el.com>,
	LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v2] [LBR] Dump LBRs on Exception

On Thu, Dec 4, 2014 at 8:01 AM, Berthier, Emmanuel
<emmanuel.berthier@...el.com> wrote:
>> From: Andy Lutomirski [mailto:luto@...capital.net]
>> Sent: Wednesday, December 3, 2014 8:30 PM
>> To: Berthier, Emmanuel
>> Cc: Thomas Gleixner; H. Peter Anvin; X86 ML; Jarzmik, Robert; LKML
>> Subject: Re: [PATCH v2] [LBR] Dump LBRs on Exception
>> > The final patch will bypass the new code for user-space page faults, so the
>> > performance impact will be very low.
>> > Copying the LBRs takes much more time than stopping and restarting LBR recording.
>> >
>> > The simpler, the better:
>> >
>> > .macro STOP_LBR
>> > #ifdef CONFIG_LBR_DUMP_ON_EXCEPTION
>> >         testl $3,CS(%rsp)               /* Kernel Space? */
>> >         jnz 1f
>> >         testl $3, PER_CPU_VAR(lbr_dump_state) /* Disabled? */
>> >         jnz 1f
>>
>> But that just wasted two of your LBR slots.
>
> No: a conditional branch that is not taken does not generate a branch record. For example:
>
>   Last Branch Records:
>     to: [<ffffffff828122a0>] page_fault+0x0/0x90
>   from: [<ffffffff823c0e06>] sysrq_handle_crash+0x16/0x20
>     to: [<ffffffff823c0df0>] sysrq_handle_crash+0x0/0x20
>   from: [<ffffffff823c156c>] __handle_sysrq+0x9c/0x170
>     to: [<ffffffff823c1562>] __handle_sysrq+0x92/0x170
>
>> >         push %rax
>> >         push %rcx
>> >         push %rdx
>> >         movl $MSR_IA32_DEBUGCTLMSR, %ecx
>> >         rdmsr
>> >         and $~1, %eax   /* Disable LBR recording */
>> >         wrmsr
>> >         pop %rdx
>> >         pop %rcx
>> >         pop %rax
>>
>> And the general problem with this approach (even ignoring the performance
>> hit, and kernel faults on user addresses really do happen in real workloads) is
>> that you're not saving and restoring MSR_IA32_DEBUGCTL.  It may be that
>> the rest of your patch does whatever magic is needed to make this work, but
>> from just this code it's not at all obvious that this is correct.
>
> The algorithm is quite simple:
> When I enter the exception handler, I stop LBR recording, and dump its contents later if needed.
> When I leave the exception handler, I restart LBR recording.
> So, after the first exception, LBR is on.
> In the case of nested exceptions and a crash, you're right, the LBRs will probably not be relevant.
> But your proposal does not solve this issue: if we save the registers during the first exception, and then overwrite them during the second level,
> we will lose relevant info if the crash is due to the first exception.
>
>> Hence my suggestion for rdmsr -- if you're willing to enable this and take the
>> performance hit, you can simplify it a lot and save some branch slots by
>> unconditionally doing the rdmsrs if you've enabled the LBR tracing IDT entry.
>> The simplification from using rdmsr isn't that the save code is simplified -- it's
>> that there's no state change on exception entry, so you don't need to worry
>> about restoring state correctly on the way out or during a context switch.
>> And you can enable/disable the whole thing just by writing to the IDT, so
>> there's no performance hit at all in the disabled case.
>
> Concerning performance: if it really matters, the best option is to disable the CONFIG switch.
> But if we enable it, it is presumably in order to use it, and in that case bypassing user-space page faults is better.
> Your proposal of "unconditionally doing the rdmsrs" is not good in that case.
> The only small gain is when the CONFIG option is enabled and the feature is disabled on the command line:
> - with my proposal, we get one more test and one more jump (if I swap the kernel-space test with the LBR state test): for exception handling, does it really matter?
>
> We can mix our proposals: keep my STOP/START code, and replace the dynamic disabling test with an IDT change.
> I hope the code will stay readable.
> Do we really want to save 2 instructions?

I don't really care about the number of instructions.  But there are
still all the nasty cases:

 - Context switch during exception processing (both in the C handler
and in the retint code).
 - PMI during exception processing.
 - Exception while perf is poking at LBR MSRs.

Where are you planning on saving the start/stop previous state?

--Andy

>
> Thanks,
>
> Emmanuel.



-- 
Andy Lutomirski
AMA Capital Management, LLC