Message-ID: <20150225092043.GB16165@gmail.com>
Date: Wed, 25 Feb 2015 10:20:43 +0100
From: Ingo Molnar <mingo@...nel.org>
To: "H. Peter Anvin" <hpa@...or.com>
Cc: Andy Lutomirski <luto@...capital.net>,
Denys Vlasenko <dvlasenk@...hat.com>,
Linus Torvalds <torvalds@...ux-foundation.org>,
Steven Rostedt <rostedt@...dmis.org>,
Borislav Petkov <bp@...en8.de>,
Oleg Nesterov <oleg@...hat.com>,
Frederic Weisbecker <fweisbec@...il.com>,
Alexei Starovoitov <ast@...mgrid.com>,
Will Drewry <wad@...omium.org>,
Kees Cook <keescook@...omium.org>, X86 ML <x86@...nel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 1/4] x86: entry.S: tidy up several suboptimal insns
* H. Peter Anvin <hpa@...or.com> wrote:
> On 02/24/2015 02:25 PM, Andy Lutomirski wrote:
> > On Tue, Feb 24, 2015 at 10:51 AM, Denys Vlasenko <dvlasenk@...hat.com> wrote:
> >>
> >> In all three 32-bit entry points, %eax is
> >> zero-extended to %rax. It is safe to do 32-bit compare
> >> when checking that syscall# is not too large.
> >
> > Applied. Thanks!
> >
>
> NAK NAK NAK NAK NAK!!!!
>
> We have already had this turn into a security issue not
> just once but TWICE, because someone decided to
> "optimize" the path by taking out the zero extend.
>
> The use of a 64-bit compare here is an intentional "belts
> and suspenders" safety issue.
I think the fundamental fragility is that we allow the high
32 bits of RAX to be nonzero.
So could we just zap the high 32 bits of RAX early in the
entry code? From that point on we could use 32-bit ops and
would no longer have to remember that possibility.
That's arguably one more (cheap) instruction in the 32-bit
entry paths but then we could use the shorter 32-bit
instructions for compares and other uses and could always
be certain that we get what we want.
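
To make the concern concrete, here is a minimal C model of the two compares. It is only a sketch: DEMO_NR_SYSCALLS and the helper names are hypothetical and not taken from entry.S. It assumes the "zap" is a 32-bit register write such as `movl %eax, %eax`, which on x86-64 clears bits 63:32 of the destination register.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical table size, chosen only for this illustration. */
#define DEMO_NR_SYSCALLS 360u

/* Models a 32-bit register write such as `movl %eax, %eax`,
 * which clears bits 63:32 of the full 64-bit register. */
static inline uint64_t zap_high32(uint64_t rax)
{
    return (uint32_t)rax;
}

/* 64-bit range check: looks at all of RAX. */
static inline bool nr_ok_cmp64(uint64_t rax)
{
    return rax < DEMO_NR_SYSCALLS;
}

/* 32-bit range check: looks only at EAX, ignoring the high half. */
static inline bool nr_ok_cmp32(uint64_t rax)
{
    return (uint32_t)rax < DEMO_NR_SYSCALLS;
}
```

With garbage in the high half (say 0xdeadbeef00000001), the 32-bit check passes while the 64-bit check fails, so a later 64-bit indexed load would run off the table. After zap_high32() both checks agree, which is the invariant the early zap is meant to establish.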
But if we do that, we can do even better and optimize the
64-bit entry path as well: we could simply mask RAX with
0x3ff and not do a compare. Pad the syscall table up to
0x400 (1024) entries and fill the unused slots with
sys_ni_syscall entries.
This is valid on 64-bit and 32-bit kernels as well, and it
allows the removal of a compare from the syscall entry
path, at the cost of a couple of kilobytes of unused
syscall table.
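
The mask-and-pad idea can be sketched in C as follows. This is an illustrative model, not the kernel's dispatch code: every demo_* name is hypothetical, the two "real" handlers are placeholders, and the ENOSYS value is hard-coded as an assumption.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_TABLE_SIZE 0x400u                 /* 1024 slots, a power of two */
#define DEMO_NR_MASK    (DEMO_TABLE_SIZE - 1)  /* 0x3ff */
#define DEMO_ENOSYS     38                     /* assumed errno value */

typedef long (*demo_syscall_fn)(void);

/* Stand-in for the "not implemented" handler that pads the table. */
static long demo_ni_syscall(void) { return -DEMO_ENOSYS; }

/* Two placeholder handlers so the table has something real in it. */
static long demo_sys_zero(void) { return 100; }
static long demo_sys_one(void)  { return 101; }

static demo_syscall_fn demo_table[DEMO_TABLE_SIZE];

/* Fill every slot with the "not implemented" handler, then install
 * the implemented entries on top. */
static void demo_table_init(void)
{
    for (size_t i = 0; i < DEMO_TABLE_SIZE; i++)
        demo_table[i] = demo_ni_syscall;
    demo_table[0] = demo_sys_zero;
    demo_table[1] = demo_sys_one;
}

/* No compare and no branch on the number: the mask alone keeps the
 * index inside the table, whatever the upper bits of rax contain. */
static long demo_dispatch(uint64_t rax)
{
    return demo_table[rax & DEMO_NR_MASK]();
}
```

After demo_table_init(), demo_dispatch(500) lands on the padding handler, and a value with a dirty high half still indexes safely. The trade-off is that numbers at or above the table size wrap around modulo 1024 instead of being rejected.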
The downside would be that if we ever grow past 1024
syscall entries we'll be in trouble: new userspace calling
syscall 1025 on an old kernel would get syscall 1.
I doubt we'll ever get so many syscalls, and user-space
will be able to be smart in any case, so it's not a
showstopper.
This would arguably be the safest and quickest implementation as well.
Thoughts?
Thanks,
Ingo