lists.openwall.net | lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC | |
Open Source and information security mailing list archives
| ||
|
Message-ID: <CALCETrWtK4G7-o=pQoeyzLFTpiR9giGy2jYTU10xt6S8FsMk+Q@mail.gmail.com> Date: Tue, 23 Apr 2019 10:21:59 -0700 From: Andy Lutomirski <luto@...nel.org> To: Ingo Molnar <mingo@...nel.org> Cc: Andy Lutomirski <luto@...nel.org>, Alexey Dobriyan <adobriyan@...il.com>, "H. Peter Anvin" <hpa@...or.com>, Thomas Gleixner <tglx@...utronix.de>, Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>, LKML <linux-kernel@...r.kernel.org>, X86 ML <x86@...nel.org>, Peter Zijlstra <a.p.zijlstra@...llo.nl>, Linus Torvalds <torvalds@...ux-foundation.org>, Al Viro <viro@...iv.linux.org.uk> Subject: Re: [PATCH] x86_64: uninline TASK_SIZE On Tue, Apr 23, 2019 at 4:13 AM Ingo Molnar <mingo@...nel.org> wrote: > > > * Andy Lutomirski <luto@...nel.org> wrote: > > > > Saving 2KB on a defconfig is quite a lot. > > > > Saving 2kB of text by adding 8 bytes to thread_info seems rather > > dubious to me. You only need 256 tasks before you lose. My > > not-particularly-loaded laptop has 865 tasks right now. > > I was suggesting current->task_size or thread_info->task_size as a way to > 100% avoid the function call overhead. Worth a tiny amount of RAM - even > with 1 million tasks it's only 4MB of RAM. ;-) > > Some TASK_SIZE users are prominent syscalls: mmap(), > > > As a general principle, the mere existence of TIF_ADDR32 is a bug. The > > value of that flag is *wrong* under the 32-bit variant of CRIU. How > > about instead making some more progress toward getting rid of dubious > > TASK_SIZE users? I'm working on a little series to get rid of most of > > them. Meanwhile: it sure looks like a large fraction of the users are > > confused as to whether TASK_SIZE is the highest user address or the > > lowest non-user address. > > I really like that, replacing TASK_SIZE with *nothing* would be even > faster. > > In fact instead of just reducing its usage I'd suggest removing TASK_SIZE > altogether and renaming TASK_SIZE_MAX back to TASK_SIZE, or something > like that - the confusion from the deceptively macro-ish naming of > TASK_SIZE is real. > > The original commit of making TASK_SIZE dynamic on the task's compat flag > was done in: > > 84929801e14d: [PATCH] x86_64: TASK_SIZE fixes for compatibility mode processes > > Here's the justification given: > > Appended patch will setup compatibility mode TASK_SIZE properly. This will > fix atleast three known bugs that can be encountered while running > compatibility mode apps. > > a) A malicious 32bit app can have an elf section at 0xffffe000. During > exec of this app, we will have a memory leak as insert_vm_struct() is > not checking for return value in syscall32_setup_pages() and thus not > freeing the vma allocated for the vsyscall page. And instead of exec > failing (as it has addresses > TASK_SIZE), we were allowing it to > succeed previously. > > b) With a 32bit app, hugetlb_get_unmapped_area/arch_get_unmapped_area > may return addresses beyond 32bits, ultimately causing corruption > because of wrap-around and resulting in SEGFAULT, instead of returning > ENOMEM. > > c) 32bit app doing this below mmap will now fail. > > mmap((void *)(0xFFFFE000UL), 0x10000UL, PROT_READ|PROT_WRITE, > MAP_FIXED|MAP_PRIVATE|MAP_ANON, 0, 0); > > I believe a) is addressed by getting rid of the vsyscall page - but it > might also not be a current problem because the vsyscall page has its own > gate-vma now. > I suspect that whatever issue this is predates my involvement in Linux :) The "gate" VMA, aka vsyscall, is at 0xffffffffff600000, and the vDSO setup code shouldn't anywhere near that fragile. Also, if this really a bug, we have it for the 64-bit case, too, and TASK_SIZE isn't going to help. > b) shouldn't be an issue if the mmap allocator correctly treats the > compat bit - this doesn't require generic TASK_SIZE variations though, as > the mmap allocator is already specific to arch/x86/. This was mostly fixed by the CRIU folks a while back, I think. > > c) is a variant of a) I believe, which should be fixed by now. > > I just looked through some of the current TASK_SIZE users, and *ALL* of > them seem dubious to me, with the exception of the mmap allocators. In Even the munmap one seems to be a bug. > fact some of them seem to be active bugs: > > kernel/: > > - PR_SET_MM_MAP, PR_SET_MM_MAP_SIZE, prctl_set_mm() et al. Ugh, what a > nasty prctl()! But besides that, the TASK_SIZE restriction to the ABI > looks questionable: if we offer this CRIU functionality then why > should it be impossible for a 32-bit CRIU task to switch to 64-bit? > > - kernel/events/core.c: TASK_SIZE_MAX should be a fine filter here, in > fact it's probably *wrong* to restrict the profiling data here just > because the task happens to be in 32-bit compat mode. Yep. I have a patch fo rthis. > > - kernel/rseq.c: is this TASK_SIZE restriction even required, wouldn't > TASK_SIZE_MAX be sufficient? Presumably. > So I concur 100% that most TASK_SIZE uses are questionable. In fact think > 84929801e14d was a mistake, and we should effectively revert it > carefully, by: > > - First by moving almost all TASK_SIZE users over to TASK_SIZE_MAX, > analyzing and justifying the impact case by case. > > - Then making the mmap allocators compat compatible (ha) without relying > on TASK_SIZE. I have a somewhat hacky patch for this right now. > > - Renaming TASK_SIZE back to TASK_SIZE_MAX and getting rid of the > TASK_SIZE and TASK_SIZE_MAX differentiation. > > Or am I missing some complication? Seems like a great idea to me. BTW, what the heck is up with get_gate_page()? I'm struggling to understand what it's even trying to do. If there's an architecture that allows a user program to mremap() or otherwise position its gate VMA between TASK_SIZE and TASK_SIZE_MAX, then that code is going to explode horribly. A whole bunch of work in this direction is here: https://git.kernel.org/pub/scm/linux/kernel/git/luto/linux.git/log/?h=x86/fixes It's almost entirely untested.
Powered by blists - more mailing lists