lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Mon, 17 Jul 2017 15:51:23 -0700
From:   Kees Cook <keescook@...omium.org>
To:     Andy Lutomirski <luto@...nel.org>
Cc:     "security@...nel.org" <security@...nel.org>,
        Andrew Morton <akpm@...ux-foundation.org>,
        LKML <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH] exec: Check stack space more strictly

On Mon, Jul 17, 2017 at 3:22 PM, Andy Lutomirski <luto@...nel.org> wrote:
> We can currently blow past the stack rlimit and cause odd behavior
> if there are accounting bugs, rounding issues, or races.  It's not
> clear that the odd behavior is actually a problem, but it's nicer to
> fail the exec instead of getting out of sync with stack limits.
>
> Improve the code to more carefully check for space and to abort if
> our stack mm gets too large in setup_arg_pages().
>
> Signed-off-by: Andy Lutomirski <luto@...nel.org>

This is a nice check to have in case binfmts do insane things to the
vma before setup_arg_pages(). However, we've still got
create_elf_tables() which is a fixed size _except_ that it adds all
the argv/envp pointers. Those are being accounted for in the
get_arg_page() calls from copy_strings(), though.

Does anything use stack between copy_strings() and create_elf_tables()?

> ---
>  fs/exec.c | 44 ++++++++++++++++++++++++++++++++++----------
>  1 file changed, 34 insertions(+), 10 deletions(-)
>
> diff --git a/fs/exec.c b/fs/exec.c
> index 62175cbcc801..0c60c0495269 100644
> --- a/fs/exec.c
> +++ b/fs/exec.c
> @@ -764,23 +764,47 @@ int setup_arg_pages(struct linux_binprm *bprm,
>         /* mprotect_fixup is overkill to remove the temporary stack flags */
>         vma->vm_flags &= ~VM_STACK_INCOMPLETE_SETUP;
>
> -       stack_expand = 131072UL; /* randomly 32*4k (or 2*64k) pages */
> -       stack_size = vma->vm_end - vma->vm_start;
>         /*
>          * Align this down to a page boundary as expand_stack
>          * will align it up.
>          */
>         rlim_stack = rlimit(RLIMIT_STACK) & PAGE_MASK;
> +       stack_size = vma->vm_end - vma->vm_start;
> +
> +       if (stack_size > rlim_stack) {
> +               /*
> +                * If we've already used too much space (due to accounting
> +                * bugs, alignment, races, or any other cause), bail.
> +                */
> +               ret = -ENOMEM;
> +               goto out_unlock;
> +       }
> +
> +       /*
> +        * stack_expand is the amount of space beyond the space already used
> +        * that we're going to pre-allocate in our stack.  For historical
> +        * reasons, it's 128kB, unless we have less space than that available
> +        * in our rlimit.
> +        *
> +        * This particular historical wart is wrong-headed, though, since
> +        * we haven't finished binfmt-specific setup, and the binfmt code
> +        * is going to eat up some or all of this space.
> +        */
> +       stack_expand = min(rlim_stack - stack_size, 131072UL);

Yeah, this needs to account for argv/envp pointers, IMO. But it's
still a magic number not tied to the actual results of
create_elf_tables() (which is effectively fixed in stack usage except
for argv/envp pointers).

I think create_elf_tables() needs checking added, or the magic number
should be tied to the size of the AUX strings (platform,
base_platform, rand_bytes), and elf_info words.

Hmm, does anything use the stack _after_ create_elf_tables()? Could we
move create_elf_tables() right after setup_arg_pages() and then
perform the check?

> +
>  #ifdef CONFIG_STACK_GROWSUP
> -       if (stack_size + stack_expand > rlim_stack)
> -               stack_base = vma->vm_start + rlim_stack;
> -       else
> -               stack_base = vma->vm_end + stack_expand;
> +       if (TASK_SIZE_MAX - vma->vm_end < stack_expand) {
> +               ret = -ENOMEM;
> +               goto out_unlock;
> +       }
> +       stack_base = vma->vm_end + stack_expand;
>  #else
> -       if (stack_size + stack_expand > rlim_stack)
> -               stack_base = vma->vm_end - rlim_stack;
> -       else
> -               stack_base = vma->vm_start - stack_expand;
> +       if (vma->vm_start < mmap_min_addr ||
> +           vma->vm_start - mmap_min_addr < stack_expand) {
> +               ret = -ENOMEM;
> +               goto out_unlock;
> +       }
> +       stack_base = vma->vm_start - stack_expand;
>  #endif
>         current->mm->start_stack = bprm->p;
>         ret = expand_stack(vma, stack_base);
> --
> 2.9.4
>

FWIW, here are my current notes on the whole existing flow (though
note my other series which is trying to move the secureexec callback
earlier):

do_execvat_common:

check RLIMIT_NPROC
unshare_files() (restored on failure)
allocate bprm
prepare_bprm_creds() (allocate bprm->cred)
    mutex_lock_interruptible(&current->signal->cred_guard_mutex)
    prepare_exec_creds()
        prepare_creds()
            memcpy(new, old, ...)
            security_prepare_creds() (unimplemented by commoncap)
check_unsafe_exec()
do_open_execat()
sched_exec()
if closed fd, set BINPRM_FLAGS_PATH_INACCESSIBLE
bprm_mm_init() (allocate bprm->mm)
prepare_binprm()
    bprm_fill_uid() (fill bprm->cred)
        resets euid/egid to current euid/egid
        sets euid/egid on bprm based on set*id
    security_bprm_set_creds() (handles all euid etc logic)
    kernel_read(bprm->file, ...)
copy_strings() (argv)
    get_arg_page() (could fail with E2BIG for args size)
copy_strings() (envp)
    get_arg_page() (could fail with E2BIG for args size)
would_dump() (examine inode perms for MAY_READ)
    maybe set BINPRM_FLAGS_ENFORCE_NONDUMP
exec_binprm()
    search_binary_handler() (may go recursive for script and misc)
        security_bprm_check()
        for each format
            load_binary() (e.g. load_elf_binary)
                load_elf_phdrs()
                kernel_read(bprm->file, ..., elf_interpreter)
                would_dump() (interpreter)
                    maybe set BINPRM_FLAGS_ENFORCE_NONDUMP
                PT_GNU_STACK
                arch_check_elf()
                flush_old_exec()
                    de_thread()
                    exec_mmap()
                        current->mm = bprm->mm
                    bprm->mm = NULL; (failures after here will SEGV current)
                    flush_thread()
                current->personality
                setup_new_exec()
                    arch_pick_mmap_layout() (uses current RLIMIT_STACK)
                    dumpable (using current uid/euid?!)
                    arch_setup_new_exec()
                    perf_event_exec()
                    bprm->cred uid/euid
                        pdeath_signal = 0
                    dumpable (based on BINPRM_FLAGS_ENFORCE_NONDUMP)
                    current->self_exec_id++
                    flush_signal_handlers()
                install_exec_creds()
                    security_bprm_committing_creds()
                    commit_creds()
                    bprm->cred = NULL
                    if !dumpable
                        perf_event_exit_task()
                    security_bprm_committed_creds()
                    mutex_unlock(&current->signal->cred_guard_mutex)
                setup_arg_pages()
                    uses RLIMIT_STACK
                load ELF image (ET_EXEC vs ET_DYN, load_bias, etc)
                set_brk()
                load_elf_interp() (if elf_interpreter)
                create_elf_tables() (eats fixed stack except argv/envp ptrs)
                arch_randomize_brk()
                start_thread() (everything is up and running now)
            if (retval < 0 && !bprm->mm)
                SIGSEGV (can we send signals to my setuid parent?)
    audit_bprm()
    trace_sched_process_exec()
    ptrace_event(PTRACE_EVENT_EXEC)
    proc_exec_connector()
clean up


-Kees

-- 
Kees Cook
Pixel Security

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ