Message-ID: <57067A4F.9090101@virtuozzo.com>
Date:	Thu, 7 Apr 2016 18:18:39 +0300
From:	Dmitry Safonov <dsafonov@...tuozzo.com>
To:	Andy Lutomirski <luto@...capital.net>
CC:	Thomas Gleixner <tglx@...utronix.de>,
	Dmitry Safonov <0x7f454c46@...il.com>,
	Dave Hansen <dave.hansen@...ux.intel.com>,
	Ingo Molnar <mingo@...hat.com>,
	Shuah Khan <shuahkh@....samsung.com>,
	Borislav Petkov <bp@...en8.de>, X86 ML <x86@...nel.org>,
	<khorenko@...tuozzo.com>,
	Andrew Morton <akpm@...ux-foundation.org>,
	<xemul@...tuozzo.com>, <linux-kselftest@...r.kernel.org>,
	Cyrill Gorcunov <gorcunov@...nvz.org>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	"H. Peter Anvin" <hpa@...or.com>
Subject: Re: [PATCH 1/2] x86/arch_prctl: add ARCH_SET_{COMPAT,NATIVE} to
 change compatible mode

On 04/07/2016 05:39 PM, Andy Lutomirski wrote:
> On Apr 7, 2016 5:12 AM, "Dmitry Safonov" <dsafonov@...tuozzo.com> wrote:
>> On 04/06/2016 09:04 PM, Andy Lutomirski wrote:
>>> [cc Dave Hansen for MPX]
>>>
>>> On Apr 6, 2016 9:30 AM, "Dmitry Safonov" <dsafonov@...tuozzo.com> wrote:
>>>> Now each process that runs natively on x86_64 may execute 32-bit code
>>>> by properly setting its CS selector: either from the LDT or by reusing
>>>> Linux's USER32_CS. The reverse is also valid: running 64-bit code in a
>>>> compatible task is also possible by choosing USER_CS.
>>>> So we may switch between 32- and 64-bit code execution in any process.
>>>> Linux will choose the right syscall numbers in entries for those
>>>> processes. But it will still consider them native/compat by the
>>>> personality that the ELF loader set on launch. This affects, e.g., the
>>>> ptrace syscall on those tasks: PTRACE_GETREGSET will return a 64/32-bit
>>>> regset according to the process's mode (that's how strace detects a
>>>> task's personality since version 4.8).
>>>>
>>>> This patch adds arch_prctl calls for x86 that make it possible to tell
>>>> the Linux kernel in which mode the application is currently running.
>>>> Mainly, this is needed for CRIU: restoring both compatible and native
>>>> applications from a 64-bit restorer. For that reason I wrapped all
>>>> the code in CONFIG_CHECKPOINT_RESTORE.
>>>> This patch also solves a problem with running 64-bit code in a 32-bit
>>>> ELF (and the reverse): you only have the 32-bit ELF's vdso for fast
>>>> syscalls. When switching between native <-> compat mode by arch_prctl,
>>>> it will remap the vdso binary blob needed for the target mode.
>>> General comments first:
>> Thanks for your comments.
>>> You forgot about x32.
>> Will add x32 support for v2.
>>
>>> I think that you should separate vdso remapping from "personality".
>>> vdso remapping should be available even on native 32-bit builds, which
>>> means that either you can't use arch_prctl for it or you'll have to
>>> wire up arch_prctl as a 32-bit syscall.
>> I can't say I got your point. By vdso remapping, do you mean
>> mremap for the vdso/vvar pages? I think it should work now.
> For 32-bit, the vdso *must* exist in memory at the address that the
> kernel thinks it's at.  Even if you had a pure 32-bit restore stub,
> you would still need vdso remap, because there's a chance the vdso
> could land at an unusable address, say one page off from where you
> want it.  You couldn't map a wrapper because there wouldn't be any
> space for it without moving the real vdso out of the way.
>
> Remember, you *cannot* mremap() the 32-bit vdso because you will
> crash.  It works by luck for 64-bit, but it's plausible that we'd want
> to change that some day.  (I have awful patches that speed a bunch of
> things up at the cost of a vdso trampoline for 64-bit code and a bunch
> of other hacks.  Those patches will never go in for real, but
> something else might want the ability to use 64-bit vdso trampolines.)
Thanks for the elaboration, now I see. Signals and fast syscalls
expect mm->context.vdso to be correct.
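
For illustration, a minimal userspace sketch of where a process can see the
vdso address the kernel advertised. getauxval() and AT_SYSINFO_EHDR are the
real glibc/Linux interfaces; the helper names vdso_base_from_auxv() and
vdso_elf_class() are made up for this example. Note the assumption: the auxv
entry is only a snapshot written at exec time, it is not mm->context.vdso
itself, so it would not follow a later remap.

```c
#include <assert.h>
#include <elf.h>
#include <stdint.h>
#include <string.h>
#include <sys/auxv.h>

/* Base address of the vdso as advertised in the auxiliary vector at exec
 * time, or 0 if the kernel did not provide one. */
static uintptr_t vdso_base_from_auxv(void)
{
    return (uintptr_t)getauxval(AT_SYSINFO_EHDR);
}

/* ELFCLASS32 or ELFCLASS64 for an ELF image at base, ELFCLASSNONE otherwise.
 * Hypothetical helper: it only inspects the ELF identification bytes. */
static int vdso_elf_class(uintptr_t base)
{
    const unsigned char *ident = (const unsigned char *)base;

    if (base == 0 || memcmp(ident, ELFMAG, SELFMAG) != 0)
        return ELFCLASSNONE;
    /* A valid ELF header carries one of the two defined classes. */
    assert(ident[EI_CLASS] == ELFCLASS32 || ident[EI_CLASS] == ELFCLASS64);
    return ident[EI_CLASS];
}
```

Calling vdso_elf_class(vdso_base_from_auxv()) in a native task and in a
compat task should show which of the two blobs was mapped at exec.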
>
>> I did the vdso remapping because the blob for a native x86_64 task
>> differs from the one for a compatible task. So it's just changing blobs;
>> the address value is there for convenience - I may omit it and just map
>> the different vdso blob at the same place where the previous vdso was.
>> I'm not sure why we need the ability to map a 64-bit vdso blob
>> on native 32-bit builds?
> That would fail, but I think the API should exist.  But a native
> 32-bit program should be able to remap the 32-bit vdso.
>
> IOW, I think you should be able to do, roughly:
>
> map_new_vdso(VDSO_32BIT, addr);
>
> on any kernel.
>
> Am I making sense?
Yes. I will rework it into an API like that.
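
A rough userspace model of the semantics discussed above, only to pin down
the expected return values: the call exists on every kernel, a 32-bit vdso
can always be requested, and a 64-bit vdso request fails on a native 32-bit
build. Everything here is hypothetical: struct kernel_model,
map_new_vdso_check(), VDSO_64BIT and the page-alignment rule are assumptions
made for the sketch, and nothing is actually mapped.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* VDSO_32BIT comes from the example call above; VDSO_64BIT is assumed. */
enum vdso_type { VDSO_32BIT, VDSO_64BIT };

/* Hypothetical stand-in for the kernel build configuration. */
struct kernel_model {
    bool is_64bit;
};

#define MODEL_PAGE_SIZE 4096u

/* Models only the argument checks of a map_new_vdso(type, addr) call.
 * Returns 0 if the request would be accepted, -EINVAL otherwise. */
static int map_new_vdso_check(const struct kernel_model *k,
                              enum vdso_type type, uintptr_t addr)
{
    assert(k != NULL);

    /* Assumption: the target address must be page aligned. */
    if (addr % MODEL_PAGE_SIZE != 0)
        return -EINVAL;

    switch (type) {
    case VDSO_32BIT:
        return 0;                           /* available on any kernel */
    case VDSO_64BIT:
        return k->is_64bit ? 0 : -EINVAL;   /* exists, but fails on 32-bit */
    }
    return -EINVAL;
}
```

The model ignores x32 and whether a 64-bit kernel was built with 32-bit
support at all; both would need their own checks.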
>
>>> For "personality", someone needs to enumerate all of the various things
>>> that try to track bitness and see how many of them even make sense.
>>> On brief inspection:
>>>
>>>    - TIF_IA32: affects signal format and does something to ptrace.  I
>>> suspect that whatever it does to ptrace is nonsensical, and I don't
>>> know whether we're stuck with it.
>>>
>>>    - TIF_ADDR32 affects TASK_SIZE and mmap behavior (and the latter
>>> isn't even done in a sensible way).
>>>
>>>    - is_64bit_mm affects MPX and uprobes.
>>>
>>> On even more brief inspection:
>>>
>>>    - uprobes using is_64bit_mm is buggy.
>>>
>>>    - I doubt that having TASK_SIZE vary serves any purpose.  Does anyone
>>> know why TASK_SIZE is different for different tasks?  It would save
>>> code size and speed things up if TASK_SIZE were always TASK_SIZE_MAX.
>>>
>>>    - Using TIF_IA32 for signal processing is IMO suboptimal.  Instead,
>>> we should record which syscall installed the signal handler and use
>>> the corresponding frame format.
>> Oh, I like it, will do.
>>
>>>    - Using TIF_IA32 of the *target* for ptrace is nonsense.  Having
>>> strace figure out syscall type using that is actively buggy, and I ran
>>> into that bug a few days ago and cursed at it.  strace should inspect
>>> TS_COMPAT (I don't know how, but that's what should happen).  We may
>>> be stuck with this for ABI reasons.
>> ptrace may check seg_32bit for code selector, what do you think?
> Not sure.  I have never fully wrapped my head around ptrace.
Hm, after some thinking, I guess it's better to check TS_COMPAT:
it's set on compat syscall entry, so there is no need to
check seg_32bit anyway.
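
To illustrate why the two checks can disagree, here is a small userspace
model (not kernel code). The case it covers is a task running with a 64-bit
code segment that enters the kernel through the compat entry, e.g. via
int $0x80. struct task_model, the helper functions and the MODEL_TS_COMPAT
value are all assumptions made for the sketch; the model simply sets the flag
on compat entry and clears it on exit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_TS_COMPAT 0x0002u   /* model flag: compat syscall in progress */

/* Hypothetical, heavily reduced view of a task. */
struct task_model {
    unsigned int status;   /* carries MODEL_TS_COMPAT */
    bool cs_is_32bit;      /* what a seg_32bit check on CS would report */
};

/* Syscall entry: the flag depends on the entry path, not on CS. */
static void model_syscall_enter(struct task_model *t, bool via_compat_entry)
{
    assert(t != NULL);
    if (via_compat_entry)
        t->status |= MODEL_TS_COMPAT;
}

/* Syscall exit: the flag is dropped again. */
static void model_syscall_exit(struct task_model *t)
{
    assert(t != NULL);
    t->status &= ~MODEL_TS_COMPAT;
}

static bool syscall_is_compat_by_ts(const struct task_model *t)
{
    return (t->status & MODEL_TS_COMPAT) != 0;
}

static bool syscall_is_compat_by_cs(const struct task_model *t)
{
    return t->cs_is_32bit;
}
```

For a task with cs_is_32bit == false that enters through the compat path,
syscall_is_compat_by_ts() reports a compat syscall while
syscall_is_compat_by_cs() does not, which is the mismatch a tracer would hit.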

Huge thanks, will work on v2 according to your comments.
