lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Date:   Thu, 1 Aug 2019 14:39:51 -0700
From:   Andy Lutomirski <luto@...nel.org>
To:     "H. Peter Anvin" <hpa@...or.com>
Cc:     Andy Lutomirski <luto@...nel.org>,
        Dmitry Safonov <dima@...sta.com>,
        LKML <linux-kernel@...r.kernel.org>,
        Dmitry Safonov <0x7f454c46@...il.com>,
        Adrian Reber <adrian@...as.de>,
        Andrei Vagin <avagin@...nvz.org>,
        Arnd Bergmann <arnd@...db.de>,
        Christian Brauner <christian.brauner@...ntu.com>,
        Cyrill Gorcunov <gorcunov@...nvz.org>,
        "Eric W. Biederman" <ebiederm@...ssion.com>,
        Ingo Molnar <mingo@...hat.com>, Jann Horn <jannh@...gle.com>,
        Jeff Dike <jdike@...toit.com>, Oleg Nesterov <oleg@...hat.com>,
        Pavel Emelyanov <xemul@...tuozzo.com>,
        Shuah Khan <shuah@...nel.org>,
        Thomas Gleixner <tglx@...utronix.de>,
        Vincenzo Frascino <vincenzo.frascino@....com>,
        Linux Containers <containers@...ts.linux-foundation.org>,
        criu@...nvz.org, Linux API <linux-api@...r.kernel.org>,
        X86 ML <x86@...nel.org>, Andrei Vagin <avagin@...il.com>
Subject: Re: [PATCHv5 25/37] x86/vdso: Switch image on setns()/clone()

On Wed, Jul 31, 2019 at 11:09 PM <hpa@...or.com> wrote:
>
> On July 31, 2019 10:34:26 PM PDT, Andy Lutomirski <luto@...nel.org> wrote:
> >On Mon, Jul 29, 2019 at 2:58 PM Dmitry Safonov <dima@...sta.com> wrote:
> >>
> >> As it has been discussed on timens RFC, adding a new conditional
> >branch
> >> `if (inside_time_ns)` on VDSO for all processes is undesirable.
> >> It will add a penalty for everybody as branch predictor may
> >mispredict
> >> the jump. Also there are instruction cache lines wasted on cmp/jmp.
> >
> >
> >>
> >> +#ifdef CONFIG_TIME_NS
> >> +int vdso_join_timens(struct task_struct *task)
> >> +{
> >> +       struct mm_struct *mm = task->mm;
> >> +       struct vm_area_struct *vma;
> >> +
> >> +       if (down_write_killable(&mm->mmap_sem))
> >> +               return -EINTR;
> >> +
> >> +       for (vma = mm->mmap; vma; vma = vma->vm_next) {
> >> +               unsigned long size = vma->vm_end - vma->vm_start;
> >> +
> >> +               if (vma_is_special_mapping(vma, &vvar_mapping) ||
> >> +                   vma_is_special_mapping(vma, &vdso_mapping))
> >> +                       zap_page_range(vma, vma->vm_start, size);
> >> +       }
> >
> >This is, unfortunately, fundamentally buggy.  If any thread is in the
> >vDSO or has the vDSO on the stack (due to a signal, for example), this
> >will crash it.  I can think of three solutions:
> >
> >1. Say that you can't setns() if you have other mms and ignore the
> >signal issue.  Anything with green threads will disapprove.  It's also
> >rather gross.
> >
> >2. Make it so that you can flip the static branch safely.  As in my
> >other email, you'll need to deal with CoW somehow,
> >
> >3. Make it so that you can't change timens, or at least that you can't
> >turn timens on or off, without execve() or fork().
> >
> >BTW, that static branch probably needs to be aligned to a cache line
> >or something similar to avoid all the nastiness with trying to poke
> >text that might be concurrently executing.  This will be a mess.
>
> Since we are talking about different physical addresses I believe we should be okay as long as they don't cross page boundaries, and even if they do it can be managed with proper page invalidation sequencing – it's not like the problems of having to deal with XMC on live pages like in the kernel.
>
> Still, you really need each instruction sequence to be present, with the only difference being specific patch sites.
>
> Any fundamental reason this can't be strictly data driven? Seems odd to me if it couldn't, but I might be missing something obvious.

I think it can be.  There are at least two places where vDSO slow
paths could hook without affecting fast paths: vclock_mode and the low
bit of the sequence number.

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ