[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <5609B7C0.3010807@ezchip.com>
Date: Mon, 28 Sep 2015 17:57:20 -0400
From: Chris Metcalf <cmetcalf@...hip.com>
To: Andy Lutomirski <luto@...capital.net>
CC: Gilad Ben Yossef <giladb@...hip.com>,
Steven Rostedt <rostedt@...dmis.org>,
Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Rik van Riel <riel@...hat.com>, Tejun Heo <tj@...nel.org>,
Frederic Weisbecker <fweisbec@...il.com>,
Thomas Gleixner <tglx@...utronix.de>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Christoph Lameter <cl@...ux.com>,
Viresh Kumar <viresh.kumar@...aro.org>,
Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will.deacon@....com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"H. Peter Anvin" <hpa@...or.com>, X86 ML <x86@...nel.org>
Subject: Re: [PATCH v7 07/11] arch/x86: enable task isolation functionality
On 09/28/2015 04:59 PM, Andy Lutomirski wrote:
> On Mon, Sep 28, 2015 at 11:17 AM, Chris Metcalf <cmetcalf@...hip.com> wrote:
>> In prepare_exit_to_usermode(), we would like to call
>> task_isolation_enter() on every return to userspace, and like
>> other work items, we would like to recheck for more work after
>> calling it, since it will enable interrupts internally.
>>
>> However, if task_isolation_enter() is the only work item,
>> and it has already been called once, we don't want to continue
>> calling it in a loop. We don't have a dedicated TIF flag for
>> task isolation, and it wouldn't make sense to have one, since
>> we'd want to set it before starting exit every time, and then
>> clear it the first time around the loop.
>>
>> Instead, we change the loop structure somewhat, so that we
>> have a more inclusive set of flags that are tested for on the
>> first entry to the function (including TIF_NOHZ), and if any
>> of those flags are set, we enter the loop. And, we do the
>> task_isolation() test unconditionally at the bottom of the loop,
>> but then when making the decision to loop back, we just use the
>> set of flags that doesn't include TIF_NOHZ. That way we only
>> loop if there is other work to do, but then if that work
>> is done, we again unconditionally call task_isolation_enter().
>>
>> In syscall_trace_enter_phase1(), we try to add the necessary
>> support for strict-mode detection of syscalls in an optimized
>> way, by letting the code remain unchanged if we are not using
>> TASK_ISOLATION, but otherwise calling enter_from_user_mode()
>> under the first time we see _TIF_NOHZ, and then waiting until
>> after we do the secure computing work to actually clear the bit
>> from the "work" variable and call task_isolation_syscall().
>>
>> Signed-off-by: Chris Metcalf <cmetcalf@...hip.com>
>> ---
>> arch/x86/entry/common.c | 47 ++++++++++++++++++++++++++++++++++++-----------
>> 1 file changed, 36 insertions(+), 11 deletions(-)
>>
>> diff --git a/arch/x86/entry/common.c b/arch/x86/entry/common.c
>> index 80dcc9261ca3..0f74389c6f3b 100644
>> --- a/arch/x86/entry/common.c
>> +++ b/arch/x86/entry/common.c
>> @@ -21,6 +21,7 @@
>> #include <linux/context_tracking.h>
>> #include <linux/user-return-notifier.h>
>> #include <linux/uprobes.h>
>> +#include <linux/isolation.h>
>>
>> #include <asm/desc.h>
>> #include <asm/traps.h>
>> @@ -81,7 +82,8 @@ unsigned long syscall_trace_enter_phase1(struct pt_regs *regs, u32 arch)
>> */
>> if (work & _TIF_NOHZ) {
>> enter_from_user_mode();
>> - work &= ~_TIF_NOHZ;
>> + if (!IS_ENABLED(CONFIG_TASK_ISOLATION))
>> + work &= ~_TIF_NOHZ;
>> }
>> #endif
>>
>> @@ -131,6 +133,13 @@ unsigned long syscall_trace_enter_phase1(struct pt_regs *regs, u32 arch)
>> }
>> #endif
>>
>> + /* Now check task isolation, if needed. */
>> + if (IS_ENABLED(CONFIG_TASK_ISOLATION) && (work & _TIF_NOHZ)) {
>> + work &= ~_TIF_NOHZ;
>> + if (task_isolation_strict())
>> + task_isolation_syscall(regs->orig_ax);
>> + }
>> +
> This is IMO rather nasty. Can you try to find a way to do this
> without making the control flow depend on config options?
Well, I suppose this is the best argument for testing for task
isolation before seccomp :-)
Honestly, if not, it's tricky to see how to do better; I did spend
some time looking at it. One possibility is to just unconditionally
clear _TIF_NOHZ before testing "work == 0", so that we can
test (work & TIF_NOHZ) once early and once after seccomp.
This presumably costs a cycle in the no-nohz-full case.
So maybe just do it before seccomp...
> What guarantees that TIF_NOHZ is an acceptable thing to check?
Well, TIF_NOHZ is set on all tasks whenever we are running with
nohz_full enabled anywhere, so testing it lets us do stuff on
the fastpath without slowing down the fastpath much.
See context_tracking_cpu_set().
>> /* Do our best to finish without phase 2. */
>> if (work == 0)
>> return ret; /* seccomp and/or nohz only (ret == 0 here) */
>> @@ -217,10 +226,26 @@ static struct thread_info *pt_regs_to_thread_info(struct pt_regs *regs)
>> /* Called with IRQs disabled. */
>> __visible void prepare_exit_to_usermode(struct pt_regs *regs)
>> {
>> + u32 cached_flags;
>> +
>> if (WARN_ON(!irqs_disabled()))
>> local_irq_disable();
>>
>> /*
>> + * We may want to enter the loop here unconditionally to make
>> + * sure to do some work at least once. Test here for all
>> + * possible conditions that might make us enter the loop,
>> + * and return immediately if none of them are set.
>> + */
>> + cached_flags = READ_ONCE(pt_regs_to_thread_info(regs)->flags);
>> + if (!(cached_flags & (TIF_SIGPENDING | _TIF_NOTIFY_RESUME |
>> + _TIF_UPROBE | _TIF_NEED_RESCHED |
>> + _TIF_USER_RETURN_NOTIFY | _TIF_NOHZ))) {
>> + user_enter();
>> + return;
>> + }
>> +
> Too complicated and too error prone.
>
> In any event, I don't think that the property you actually want is for
> the loop to be entered once. I think the property you want is that
> we're isolated by the time we're finished. Why not just check that
> directly in the loop condition?
So something like this (roughly):
if (!(cached_flags & (_TIF_SIGPENDING | _TIF_NOTIFY_RESUME |
_TIF_UPROBE | _TIF_NEED_RESCHED |
_TIF_USER_RETURN_NOTIFY)) &&
+ task_isolation_done())
break;
i.e. just add the one extra call? That could work, I suppose.
In the body we would then keep the proposed logic that unconditionally
calls task_isolation_enter().
>> + /*
>> * In order to return to user mode, we need to have IRQs off with
>> * none of _TIF_SIGPENDING, _TIF_NOTIFY_RESUME, _TIF_USER_RETURN_NOTIFY,
>> * _TIF_UPROBE, or _TIF_NEED_RESCHED set. Several of these flags
>> @@ -228,15 +253,7 @@ __visible void prepare_exit_to_usermode(struct pt_regs *regs)
>> * so we need to loop. Disabling preemption wouldn't help: doing the
>> * work to clear some of the flags can sleep.
>> */
>> - while (true) {
>> - u32 cached_flags =
>> - READ_ONCE(pt_regs_to_thread_info(regs)->flags);
>> -
>> - if (!(cached_flags & (_TIF_SIGPENDING | _TIF_NOTIFY_RESUME |
>> - _TIF_UPROBE | _TIF_NEED_RESCHED |
>> - _TIF_USER_RETURN_NOTIFY)))
>> - break;
>> -
>> + do {
>> /* We have work to do. */
>> local_irq_enable();
>>
>> @@ -258,9 +275,17 @@ __visible void prepare_exit_to_usermode(struct pt_regs *regs)
>> if (cached_flags & _TIF_USER_RETURN_NOTIFY)
>> fire_user_return_notifiers();
>>
>> + if (task_isolation_enabled())
>> + task_isolation_enter();
>> +
> Does anything here guarantee forward progress or at least give
> reasonable confidence that we'll make forward progress?
A given task can get stuck in the kernel if it has a lengthy far-future
alarm() type situation, or if there are multiple task-isolated tasks
scheduled onto the same core, but that only affects those tasks;
other tasks on the same core, and the system as a whole, are OK.
--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Powered by blists - more mailing lists