linux-kernel - Re: [PATCH] x86/entry/64: Context-track syscalls before enabling interrupts

Message-ID: <CALCETrVQCi_RZqRSTy9bs0V+RB6cLHVfYq4Ouq_JLMoJePg1zA@mail.gmail.com>
Date:	Tue, 18 Aug 2015 15:35:30 -0700
From:	Andy Lutomirski <luto@...capital.net>
To:	Frederic Weisbecker <fweisbec@...il.com>
Cc:	Andy Lutomirski <luto@...nel.org>, X86 ML <x86@...nel.org>,
	Sasha Levin <sasha.levin@...cle.com>,
	Brian Gerst <brgerst@...il.com>,
	Denys Vlasenko <dvlasenk@...hat.com>,
	"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
	Oleg Nesterov <oleg@...hat.com>,
	Borislav Petkov <bp@...en8.de>, Rik van Riel <riel@...hat.com>
Subject: Re: [PATCH] x86/entry/64: Context-track syscalls before enabling interrupts

On Tue, Aug 18, 2015 at 3:16 PM, Frederic Weisbecker <fweisbec@...il.com> wrote:
> On Tue, Aug 18, 2015 at 12:11:59PM -0700, Andy Lutomirski wrote:
>> This fixes a couple minor holes if we took an IRQ very early in syscall
>> processing:
>>
>>  - We could enter the IRQ with CONTEXT_USER.  Everything worked (RCU
>>    was fine), but we could warn if all the debugging options were
>>    set.
>
> So this is fixing issues after your changes that call user_exit() from
> IRQs, right?

Yes.  Here's an example splat, courtesy of Sasha:

https://gist.github.com/sashalevin/a006a44989312f6835e7

>
> But the IRQs aren't supposed to call user_exit(), they have their own hooks.
> That's where the real issue is.

In -tip, the assumption is that we *always* switch to CONTEXT_KERNEL
when entering the kernel for a non-NMI reason.  That means that we can
avoid all of the (expensive!) checks for what context we're in.  It
also means that (other than IRQs, which need further cleanup), we only
switch once per user/kernel switch.

The cost of doing so should be essentially zero, modulo artifacts from
poor inlining.  IMO the code is much more straightforward than it used
to be, and it has the potential to be quite fast.  For one thing, we
never invoke context tracking with IRQs on, and Rik had some profiles
suggesting that a bunch of the overhead involved dealing with repeated
irq flag manipulation.

One way or another, IRQs need to switch from RCU-not-watching to
RCU-watching, and I don't see what's wrong with user_exit for this
purpose.  Of course, if user_exit is slow, we should fix that.

Also, this isn't really related to IRQs calling user_exit.  It's that
IRQs can recurse into other entries (#GP in Sasha's case) which also
validate the context.
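
To make the ordering issue concrete, here is a minimal user-space sketch
of the window described above. It is a toy model, not kernel code: every
name in it (cpu_model, model_user_exit, model_irq, model_nested_entry and
the two syscall_entry_* functions) is hypothetical and only stands in for
the corresponding kernel behavior. It assumes that an IRQ taken after the
CPU has already entered the kernel does not itself switch context, and
that a nested entry (such as the #GP in Sasha's case) asserts that the
context is already the kernel one.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: these names are illustrative, not the kernel's API. */
enum ctx_state { CTX_USER, CTX_KERNEL };

struct cpu_model {
    enum ctx_state ctx;   /* what context tracking currently believes */
    bool irqs_on;         /* whether an IRQ can be delivered */
    int warnings;         /* number of failed context assertions */
};

/* Stands in for user_exit(): record that we are now in the kernel. */
static void model_user_exit(struct cpu_model *cpu)
{
    cpu->ctx = CTX_KERNEL;
}

/* Stands in for a nested entry (e.g. an exception) that validates context. */
static void model_nested_entry(struct cpu_model *cpu)
{
    if (cpu->ctx != CTX_KERNEL)
        cpu->warnings++;
}

/* An IRQ whose handler happens to trigger a nested entry. */
static void model_irq(struct cpu_model *cpu)
{
    if (!cpu->irqs_on)
        return;                 /* masked: cannot be delivered yet */
    model_nested_entry(cpu);
}

/* State right after the syscall instruction: user context, IRQs off. */
static void model_arrive_from_user(struct cpu_model *cpu)
{
    cpu->ctx = CTX_USER;
    cpu->irqs_on = false;
    cpu->warnings = 0;
}

/* Ordering with the hole: IRQs are enabled before context tracking runs. */
int syscall_entry_irqs_first(struct cpu_model *cpu, bool irq_pending)
{
    model_arrive_from_user(cpu);
    cpu->irqs_on = true;
    if (irq_pending)
        model_irq(cpu);         /* runs while ctx is still CTX_USER */
    model_user_exit(cpu);
    return cpu->warnings;
}

/* Ordering from the patch subject: context-track, then enable IRQs. */
int syscall_entry_track_first(struct cpu_model *cpu, bool irq_pending)
{
    model_arrive_from_user(cpu);
    model_user_exit(cpu);
    cpu->irqs_on = true;
    if (irq_pending)
        model_irq(cpu);         /* ctx is already CTX_KERNEL */
    return cpu->warnings;
}
```

In this model, only the first ordering can record a warning, and only
when an IRQ lands in the window before the context switch. The second
ordering closes that window because the context is updated while IRQs
are still masked.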

None of the speedups that will be enabled are written yet, but I
strongly suspect they will be soon :)

In my book, the fact that we now have context tracking assertions all
over the place is a good thing.  It means we're much less likely to
break it.

--Andy

-- 
Andy Lutomirski
AMA Capital Management, LLC