[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20190423111258.GA23410@gmail.com>
Date: Tue, 23 Apr 2019 13:12:58 +0200
From: Ingo Molnar <mingo@...nel.org>
To: Andy Lutomirski <luto@...nel.org>
Cc: Alexey Dobriyan <adobriyan@...il.com>,
"H. Peter Anvin" <hpa@...or.com>,
Thomas Gleixner <tglx@...utronix.de>,
Ingo Molnar <mingo@...hat.com>, Borislav Petkov <bp@...en8.de>,
LKML <linux-kernel@...r.kernel.org>, X86 ML <x86@...nel.org>,
Peter Zijlstra <a.p.zijlstra@...llo.nl>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Al Viro <viro@...iv.linux.org.uk>
Subject: Re: [PATCH] x86_64: uninline TASK_SIZE
* Andy Lutomirski <luto@...nel.org> wrote:
> > Saving 2KB on a defconfig is quite a lot.
>
> Saving 2kB of text by adding 8 bytes to thread_info seems rather
> dubious to me. You only need 256 tasks before you lose. My
> not-particularly-loaded laptop has 865 tasks right now.
I was suggesting current->task_size or thread_info->task_size as a way to
100% avoid the function call overhead. Worth a tiny amount of RAM - even
with 1 million tasks it's only 4MB of RAM. ;-)
Some TASK_SIZE users are prominent syscalls: mmap(),
> As a general principle, the mere existence of TIF_ADDR32 is a bug. The
> value of that flag is *wrong* under the 32-bit variant of CRIU. How
> about instead making some more progress toward getting rid of dubious
> TASK_SIZE users? I'm working on a little series to get rid of most of
> them. Meanwhile: it sure looks like a large fraction of the users are
> confused as to whether TASK_SIZE is the highest user address or the
> lowest non-user address.
I really like that, replacing TASK_SIZE with *nothing* would be even
faster.
In fact instead of just reducing its usage I'd suggest removing TASK_SIZE
altogether and renaming TASK_SIZE_MAX back to TASK_SIZE, or something
like that - the confusion from the deceptively macro-ish naming of
TASK_SIZE is real.
The original commit of making TASK_SIZE dynamic on the task's compat flag
was done in:
84929801e14d: [PATCH] x86_64: TASK_SIZE fixes for compatibility mode processes
Here's the justification given:
Appended patch will setup compatibility mode TASK_SIZE properly. This will
fix atleast three known bugs that can be encountered while running
compatibility mode apps.
a) A malicious 32bit app can have an elf section at 0xffffe000. During
exec of this app, we will have a memory leak as insert_vm_struct() is
not checking for return value in syscall32_setup_pages() and thus not
freeing the vma allocated for the vsyscall page. And instead of exec
failing (as it has addresses > TASK_SIZE), we were allowing it to
succeed previously.
b) With a 32bit app, hugetlb_get_unmapped_area/arch_get_unmapped_area
may return addresses beyond 32bits, ultimately causing corruption
because of wrap-around and resulting in SEGFAULT, instead of returning
ENOMEM.
c) 32bit app doing this below mmap will now fail.
mmap((void *)(0xFFFFE000UL), 0x10000UL, PROT_READ|PROT_WRITE,
MAP_FIXED|MAP_PRIVATE|MAP_ANON, 0, 0);
I believe a) is addressed by getting rid of the vsyscall page - but it
might also not be a current problem because the vsyscall page has its own
gate-vma now.
b) shouldn't be an issue if the mmap allocator correctly treats the
compat bit - this doesn't require generic TASK_SIZE variations though, as
the mmap allocator is already specific to arch/x86/.
c) is a variant of a) I believe, which should be fixed by now.
I just looked through some of the current TASK_SIZE users, and *ALL* of
them seem dubious to me, with the exception of the mmap allocators. In
fact some of them seem to be active bugs:
kernel/:
- PR_SET_MM_MAP, PR_SET_MM_MAP_SIZE, prctl_set_mm() et al. Ugh, what a
nasty prctl()! But besides that, the TASK_SIZE restriction to the ABI
looks questionable: if we offer this CRIU functionality then why
should it be impossible for a 32-bit CRIU task to switch to 64-bit?
- kernel/events/core.c: TASK_SIZE_MAX should be a fine filter here, in
fact it's probably *wrong* to restrict the profiling data here just
because the task happens to be in 32-bit compat mode.
- kernel/rseq.c: is this TASK_SIZE restriction even required, wouldn't
TASK_SIZE_MAX be sufficient?
mm/:
- GUP's get_get_area() et al looks really weird - why do we special-case
vsyscalls:
- Can we get rid of the vsyscall page in modern kernels?
- I don't think anyone runs those ancient glibc versions with a
fresh kernel anymore - can we start generating a WARN()ing
perhaps to see whether there's any complaints?
- Or at least pretend it doesn't exist in terms of a GUP target page?
- mm/kasan/generic_report.c:get_wild_bug_type() - this can use
TASK_SIZE_MAX just fine IMHO.
- mm/mempolicy.c:mpol_shared_policy_init() - unsure, but I suspect we
can just create the pseudo-vma with a TASK_SIZE_MAX vm_end just fine.
- mm/mlock.c:mlockall() - I believe it could be considered an outright
*bug* if there any pages outside the 32-bit area and don't get mlocked
by mlockall, just because this is a compat task. Especially with the
CRIU prctl() having 64-bit vmas outside the 32-bit mappings is a real
possibility, right? I.e. TASK_SIZE_MAX would be the right solution
here.
To turn the argument around: beyond the memory allocators, which includes
the mmap and huge-mmap variants plus the SysV shmem allocator, can we
list all the places that absolutely *rely* on TASK_SIZE being TIF_ADDR32
restricted on compat tasks? I couldn't find any.
So I concur 100% that most TASK_SIZE uses are questionable. In fact think
84929801e14d was a mistake, and we should effectively revert it
carefully, by:
- First by moving almost all TASK_SIZE users over to TASK_SIZE_MAX,
analyzing and justifying the impact case by case.
- Then making the mmap allocators compat compatible (ha) without relying
on TASK_SIZE.
- Renaming TASK_SIZE back to TASK_SIZE_MAX and getting rid of the
TASK_SIZE and TASK_SIZE_MAX differentiation.
Or am I missing some complication?
Thanks,
Ingo
Powered by blists - more mailing lists