Message-ID: <CALCETrUxiT5Y9GMf0pRFXoh2wMACMdGbbBeT2zoy38idS-fC5g@mail.gmail.com>
Date:	Wed, 20 Apr 2016 08:40:23 -0700
From:	Andy Lutomirski <luto@...capital.net>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Dmitry Safonov <dsafonov@...tuozzo.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Shuah Khan <shuahkh@....samsung.com>,
	Ingo Molnar <mingo@...hat.com>,
	Dave Hansen <dave.hansen@...ux.intel.com>,
	Borislav Petkov <bp@...en8.de>, khorenko@...tuozzo.com,
	X86 ML <x86@...nel.org>,
	Andrew Morton <akpm@...ux-foundation.org>, xemul@...tuozzo.com,
	linux-kselftest@...r.kernel.org,
	Cyrill Gorcunov <gorcunov@...nvz.org>,
	Dmitry Safonov <0x7f454c46@...il.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH 1/2] x86/arch_prctl: add ARCH_SET_{COMPAT,NATIVE} to
 change compatible mode

On Wed, Apr 20, 2016 at 4:04 AM, Peter Zijlstra <peterz@...radead.org> wrote:
> On Thu, Apr 14, 2016 at 11:27:35AM -0700, Andy Lutomirski wrote:
>> On Wed, Apr 13, 2016 at 9:55 AM, Dmitry Safonov <dsafonov@...tuozzo.com> wrote:
>> > On 04/08/2016 11:44 PM, Andy Lutomirski wrote:
>> >>
>> >> Feel free to ask for help on some of these details.  user_64bit_mode
>> >> will be helpful too.
>> >
>> > Hello again,
>> >
>> > here are some questions on TIF_IA32 removal:
>> > - in function intel_pmu_pebs_fixup_ip: we need to know
>> > whether the process was in native or compat mode so that the
>> > instruction decoder can do the IP + one instruction fixup.
>> > There are registers, but they are from PEBS, which does not
>> > contain segment descriptors (even for PEBSv3). Other values
>> > are from interrupt regs (look at setup_pebs_sample_data).
>> > So, I guess, we may use user_64bit_mode on the interrupt
>> > register set, which will be racy with changing the task's
>> > mode, but is probably OK?
>>
>> Here's my understanding:
>>
>> We don't actually know the mode, and there's no way we could get it
>> exactly.  User code could have changed the mode between when the PEBS
>> event was written and when we got the interrupt, and there's no way
>> for us to tell.
>>
>> The regs passed to the interrupt aren't particularly helpful -- if we
>> get the overflow event from kernel mode, the regs will be kernel regs,
>> not user regs.
>>
>> What we can do is use the regs returned by perf_get_regs_user,
>> which I imagine perf is already doing.  Peter, is this the case?
>
> *confused*, how is perf_get_regs_user() connected to the PEBS fixup?
>
> Ah, you want to use perf_get_regs_user() instead of task_pt_regs()
> because of how an NMI during interrupt entry would mess up the
> task_pt_regs() contents.
>
> At that point you can use regs_user->abi, right?

Yes, exactly.

Do LBR, PEBS, and similar report user regs or do they merely want to
know the instruction format?  If the latter, I could whip up a tiny
function to do just that (like perf_get_regs_user but just for ABI --
it would be simpler).
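
A rough userspace sketch of what such an ABI-only helper could look like, purely as an illustration and not kernel code. Everything prefixed sketch_/SKETCH_ is a made-up stand-in: the enum mirrors the PERF_SAMPLE_REGS_ABI_{NONE,32,64} values, the register struct is reduced to a code segment selector, and the selector constant is an assumed value for the 64-bit user code segment. The check follows the idea behind user_64bit_mode: compare the saved CS against the 64-bit selector and treat anything else as compat.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for PERF_SAMPLE_REGS_ABI_{NONE,32,64}. */
enum sketch_abi {
	SKETCH_ABI_NONE = 0,
	SKETCH_ABI_32   = 1,
	SKETCH_ABI_64   = 2,
};

/* Assumed selector of the 64-bit user code segment (illustrative). */
#define SKETCH_USER_CS 0x33

/* Hypothetical, reduced view of the saved user registers. */
struct sketch_user_regs {
	bool     valid;	/* false when there is no user context at all */
	uint16_t cs;	/* saved code segment selector */
};

/* Report only the ABI; no register contents are copied out. */
static enum sketch_abi sketch_user_abi(const struct sketch_user_regs *regs)
{
	if (regs == NULL || !regs->valid)
		return SKETCH_ABI_NONE;
	return regs->cs == SKETCH_USER_CS ? SKETCH_ABI_64 : SKETCH_ABI_32;
}

/* Map the ABI to the operand mode an instruction decoder would need. */
static int sketch_decoder_bits(enum sketch_abi abi)
{
	assert(abi != SKETCH_ABI_NONE);	/* nothing to decode without user context */
	return abi == SKETCH_ABI_64 ? 64 : 32;
}
```

A caller that only needs the instruction format would ask for sketch_user_abi() once and pass the result to the decoder, never touching the rest of the register set.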

[merging some emails]

>> Peter, I got lost in the code that calls this.  Are regs coming from
>> the overflow interrupt's regs, current_pt_regs(), or
>> perf_get_regs_user?
>
> So get_perf_callchain() will get regs from:
>
>  - interrupt/NMI regs
>  - perf_arch_fetch_caller_regs()
>
> And when user && !user_mode(), we'll use:
>
>  - task_pt_regs() (which arguably should maybe be perf_get_regs_user())

Could you point me to this bit of the code?

>
> to call perf_callchain_user(), which then ends up calling
> perf_callchain_user32() which is expected to NO-OP for 64bit userspace.
>
>> If it's the perf_get_regs_user, then this should be okay, but passing
>> in the ABI field directly would be even nicer.  If they're coming from
>> the overflow interrupt's regs or current_pt_regs(), could we change
>> that?
>>
>> It might also be nice to make sure that we call perf_get_regs_user
>> exactly once per overflow interrupt -- i.e. we could push it into the
>> main code rather than the regs sampling code.
>
> The risk there is that we might not need the user regs at all to handle
> the overflow thingy, so doing it unconditionally would be unwanted.

One call to perf_get_regs_user per interrupt shouldn't be too bad --
certainly much better than one per PEBS record.  One call to get user
ABI per overflow would be even less bad, but at that point, folding it
in to the PEBS code wouldn't be so bad either.

If I'm understanding this right (a big, big if), if we get a PEBS
overflow while running in user mode, we'll dump out the user regs (and
call perf_get_regs_user) and all the PEBS entries (subject to
exclude_kernel and with all the decoding magic).  So, in that case, we
call perf_get_regs_user.

If we get a PEBS overflow while running in kernel mode, we'll report
the kernel regs (if !exclude_kernel) and report the PEBS data as well.
If any of those records are in user mode, then, ideally, we'd invoke
perf_get_regs_user or similar *once* to get the ABI.  Although, if we
can get the user ABI efficiently enough, then maybe we don't care if
we call it once per PEBS record.
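
One way to get "at most once per overflow, and not at all if no record needs it" is to look the ABI up lazily and cache it for the duration of the interrupt. The following is a userspace model of that idea only, not the actual PEBS drain code; every sketch_ name is hypothetical, and the lookup function is a stand-in that returns a fixed answer and counts how often it runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-ins for PERF_SAMPLE_REGS_ABI_{NONE,32,64}. */
enum sketch_abi {
	SKETCH_ABI_NONE = 0,
	SKETCH_ABI_32   = 1,
	SKETCH_ABI_64   = 2,
};

/* Hypothetical PEBS record: only "was this sample in user mode?". */
struct sketch_pebs_record {
	bool user;
};

/* Per-overflow state: the cached ABI plus a counter for illustration. */
struct sketch_overflow_ctx {
	bool            abi_cached;
	enum sketch_abi abi;
	unsigned int    lookups;	/* how often the expensive path ran */
};

/* Stand-in for the expensive lookup (perf_get_regs_user or similar). */
static enum sketch_abi sketch_lookup_user_abi(struct sketch_overflow_ctx *ctx)
{
	ctx->lookups++;
	return SKETCH_ABI_64;	/* fixed answer, for the sketch only */
}

/* Return the user ABI, doing the expensive lookup on first use only. */
static enum sketch_abi sketch_get_abi(struct sketch_overflow_ctx *ctx)
{
	if (!ctx->abi_cached) {
		ctx->abi = sketch_lookup_user_abi(ctx);
		ctx->abi_cached = true;
	}
	return ctx->abi;
}

/* Walk all records of one overflow; returns how many needed the ABI. */
static size_t sketch_drain(struct sketch_overflow_ctx *ctx,
			   const struct sketch_pebs_record *recs, size_t n)
{
	size_t needed = 0;

	assert(ctx != NULL);
	for (size_t i = 0; i < n; i++) {
		if (!recs[i].user)
			continue;	/* kernel-mode record: ABI not needed */
		(void)sketch_get_abi(ctx);
		needed++;
	}
	return needed;
}
```

With this shape, a buffer holding only kernel-mode records never triggers the lookup, and a buffer with many user-mode records triggers it exactly once.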

On x86, the only weird cases are NMIs or MCEs that land in the
syscall, syscall32, and sysenter prologues (easy to handle fully
correctly if we care because the IP that we interrupted tells us the
ABI) and the bullshit SYSENTER+TF thing.  Even the latter isn't so
hard to get right.
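
To illustrate the "interrupted IP tells us the ABI" point, here is a small userspace sketch that classifies an IP against a table of entry-prologue ranges. The address ranges are invented for the example, and the mapping (syscall prologue means 64-bit caller, syscall32 and sysenter prologues mean 32-bit caller) is my reading of the paragraph above, not something taken from the kernel source. It does not attempt the SYSENTER+TF case.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for PERF_SAMPLE_REGS_ABI_{NONE,32,64}. */
enum sketch_abi {
	SKETCH_ABI_NONE = 0,
	SKETCH_ABI_32   = 1,
	SKETCH_ABI_64   = 2,
};

/* Half-open address range [start, end) of one entry prologue. */
struct sketch_range {
	uintptr_t       start;
	uintptr_t       end;
	enum sketch_abi abi;	/* ABI of the user code that entered here */
};

/* Invented addresses; a real table would come from linker symbols. */
static const struct sketch_range sketch_prologues[] = {
	{ 0x1000, 0x1040, SKETCH_ABI_64 },	/* "syscall" prologue   */
	{ 0x2000, 0x2040, SKETCH_ABI_32 },	/* "syscall32" prologue */
	{ 0x3000, 0x3040, SKETCH_ABI_32 },	/* "sysenter" prologue  */
};

/*
 * If the interrupted IP lies inside an entry prologue, return the ABI
 * implied by that prologue; otherwise return SKETCH_ABI_NONE so the
 * caller falls back to the normal saved-register path.
 */
static enum sketch_abi sketch_abi_from_entry_ip(uintptr_t ip)
{
	size_t n = sizeof(sketch_prologues) / sizeof(sketch_prologues[0]);

	for (size_t i = 0; i < n; i++) {
		const struct sketch_range *r = &sketch_prologues[i];

		assert(r->start < r->end);	/* table sanity check */
		if (ip >= r->start && ip < r->end)
			return r->abi;
	}
	return SKETCH_ABI_NONE;
}
```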

--Andy
