Message-ID: <201907221012.41504DCD@keescook>
Date:   Mon, 22 Jul 2019 10:16:14 -0700
From:   Kees Cook <keescook@...omium.org>
To:     Andy Lutomirski <luto@...capital.net>
Cc:     Sean Christopherson <sean.j.christopherson@...el.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        Andy Lutomirski <luto@...nel.org>,
        Vincenzo Frascino <vincenzo.frascino@....com>, x86@...nel.org,
        linux-kernel@...r.kernel.org
Subject: Re: [5.2 REGRESSION] Generic vDSO breaks seccomp-enabled userspace
 on i386

On Fri, Jul 19, 2019 at 01:40:13PM -0400, Andy Lutomirski wrote:
> > On Jul 19, 2019, at 1:03 PM, Sean Christopherson <sean.j.christopherson@...el.com> wrote:
> > 
> > The generic vDSO implementation, starting with commit
> > 
> >   7ac870747988 ("x86/vdso: Switch to generic vDSO implementation")
> > 
> > breaks seccomp-enabled userspace on 32-bit x86 (i386) kernels.  Prior to
> > the generic implementation, the x86 vDSO used identical code for both
> > x86_64 and i386 kernels, which worked because it did all calculations using
> > structs with naturally sized variables, i.e. didn't use __kernel_timespec.
> > 
> > The generic vDSO does its internal calculations using __kernel_timespec,
> > which in turn requires the i386 fallback syscall to use the 64-bit
> > variation, __NR_clock_gettime64.
> 
> This is basically doomed to break eventually, right?

Just so I'm understanding: the vDSO change introduced code to make an
actual syscall on i386, which for most seccomp filters would be rejected?

> I’ve occasionally considered adding a concept of “seccomp aliases”.  The idea is that, if a filter returns anything other than ALLOW, we re-run it with a different nr that we dig out of a small list of such cases. This would be limited to cases where the new syscall does the same thing with the same arguments.

Would that help here? The kernel just sees this as a direct syscall. I
guess it could whitelist it by checking the return address?
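
To make the alias idea concrete, here is a rough userspace model of the
re-run logic, not kernel code. Everything in it is hypothetical: the
names (run_with_aliases, legacy_filter, struct alias) do not exist in the
kernel, and a real filter is a BPF program rather than a C callback. The
only real values are the i386 syscall numbers. The clock_gettime pair is
used only because it is the case at hand; clock_gettime64 takes a 64-bit
timespec, so whether it meets the "same arguments" condition is an open
question.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified filter verdicts (real seccomp has more). */
enum action { ACT_KILL = 0, ACT_ALLOW = 1 };

/* i386 syscall numbers. */
#define NR_CLOCK_GETTIME   265
#define NR_CLOCK_GETTIME64 403

/* Stand-in for a BPF filter: maps a syscall nr to a verdict. */
typedef enum action (*filter_fn)(int nr);

/* Hypothetical alias table: new syscall nr -> older equivalent nr. */
struct alias {
	int new_nr;
	int old_nr;
};

static const struct alias aliases[] = {
	{ NR_CLOCK_GETTIME64, NR_CLOCK_GETTIME },
};

/* An allow-list written before clock_gettime64 existed. */
static enum action legacy_filter(int nr)
{
	return nr == NR_CLOCK_GETTIME ? ACT_ALLOW : ACT_KILL;
}

/*
 * Run the filter; if the verdict is anything other than ALLOW and the
 * nr has an alias, re-run the filter with the older nr and use that
 * verdict instead.
 */
static enum action run_with_aliases(filter_fn filter, int nr)
{
	assert(filter != NULL);

	enum action act = filter(nr);
	if (act == ACT_ALLOW)
		return act;

	for (size_t i = 0; i < sizeof(aliases) / sizeof(aliases[0]); i++) {
		if (aliases[i].new_nr == nr)
			return filter(aliases[i].old_nr);
	}
	return act;
}
```

In this model, legacy_filter(NR_CLOCK_GETTIME64) on its own returns
ACT_KILL, which is the regression described above, while
run_with_aliases(legacy_filter, NR_CLOCK_GETTIME64) returns ACT_ALLOW
because the filter already permits the older number.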

> I want this for restart_syscall: I want to renumber it.

Oh man, don't get me started on restart_syscall. Some architectures make
it invisible to seccomp and others don't. ugh.

-- 
Kees Cook
