Date:   Mon, 7 Sep 2020 17:34:17 +0200
From:   peterz@...radead.org
To:     Frederic Weisbecker <frederic@...nel.org>
Cc:     Marcelo Tosatti <mtosatti@...hat.com>,
        Phil Auld <pauld@...hat.com>,
        Thomas Gleixner <tglx@...utronix.de>,
        "Paul E. McKenney" <paulmck@...nel.org>,
        Joel Fernandes <joel@...lfernandes.org>,
        linux-kernel@...r.kernel.org
Subject: Re: Requirements to control kernel isolation/nohz_full at runtime


(your mailer broke and forgot to keep lines shorter than 78 chars)

On Tue, Sep 01, 2020 at 12:46:41PM +0200, Frederic Weisbecker wrote:

> == TIF_NOHZ ==
> 
> Need to get rid of that in order not to trigger the syscall slow path
> on CPUs that don't want nohz_full.  Also we don't want to iterate all
> threads and clear the flag when the last nohz_full CPU exits nohz_full
> mode. Archs should prefer static keys for deciding whether to call
> context tracking. x86 does that well.

Build on the common entry code, I suppose. Then any arch that uses it
gets the new features.
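
As a rough sketch of the static-key scheme (names modeled on the
kernel's context tracking code; treat the details as illustrative
rather than the exact upstream implementation):

	#include <linux/context_tracking_state.h>
	#include <linux/static_key.h>

	DEFINE_STATIC_KEY_FALSE(context_tracking_key);

	static __always_inline bool context_tracking_enabled(void)
	{
		/* Compiles to a plain NOP until the key is enabled. */
		return static_branch_unlikely(&context_tracking_key);
	}

	static __always_inline void user_enter(void)
	{
		/*
		 * No per-task TIF_NOHZ test: the branch is patched in
		 * only when some CPU actually runs nohz_full, so there
		 * is no thread flag to set or clear when nohz_full
		 * comes and goes.
		 */
		if (context_tracking_enabled())
			context_tracking_enter(CONTEXT_USER);
	}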

> == Proper entry code ==
> 
> We must make sure that a given arch never calls exception_enter() /
> exception_exit().  This pair saves the previous context tracking
> state and temporarily switches to kernel mode (from the context
> tracking POV).  Since this state is saved on the stack, it prevents
> us from turning off context tracking entirely on a CPU: the tracking
> must be done on all CPUs, and that costs some cycles.
> 
> This means that, considering early entry code (before the call to
> context tracking upon kernel entry, and after the call to context
> tracking upon kernel exit), we must take care of a few things:
> 
> 1) Make sure early entry code can't trigger exceptions, or, if it
> does, that the resulting exception can't schedule or use RCU (unless
> it calls rcu_nmi_enter()). Otherwise the exception would have to
> call exception_enter()/exception_exit(), which we don't want.

I think this is true for x86. Early entry has interrupts disabled;
any exception that can still happen is NMI-like and will thus use
rcu_nmi_enter().

On x86 that now includes #DB (which is also excluded due to us refusing
to set execution breakpoints on entry code), #BP, NMI and MCE.
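
For reference, the pattern being avoided looks roughly like this (a
sketch of the classic exception_enter()/exception_exit() usage; the
handler name do_some_trap() is made up, and the exact upstream form
differs):

	dotraplinkage void do_some_trap(struct pt_regs *regs,
					long error_code)
	{
		/*
		 * The previous context tracking state lives in a stack
		 * local across the handler, which is why the tracking
		 * can't simply be switched off per-CPU.
		 */
		enum ctx_state prev_state = exception_enter();

		/* ... handle it; may schedule, may use RCU ... */

		exception_exit(prev_state);
	}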

> 2) No call to schedule_user().

I'm not sure what that is supposed to do, but x86 doesn't appear to have
it, so all good :-)

> 3) Make sure early entry code is not interruptible, or
> preempt_schedule_irq() would have to rely on
> exception_enter()/exception_exit().

This is so for x86.
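
To see why this matters: the interrupt-return path reschedules with
something like the following (sketch only; details vary per arch), so
an interrupt landing in early entry code would end up scheduling
before context tracking has been entered:

	/* On return from interrupt, with IRQs still disabled: */
	if (!user_mode(regs) && need_resched()) {
		/*
		 * Re-enables interrupts around __schedule().  Fine
		 * once context tracking is set up, not before.
		 */
		preempt_schedule_irq();
	}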

> 4) Make sure early entry code can't be traced (no call to
> preempt_schedule_notrace()), or, if it is traced, that the trace
> callbacks can't schedule.

noinstr is your friend.
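
For example, the common entry code tags its early pieces noinstr and
only opens explicit instrumentable windows (a sketch modeled on
kernel/entry/common.c, not verbatim):

	/*
	 * noinstr keeps tracing, kprobes and the sanitizers out of
	 * this function entirely, so nothing here can wander into
	 * preempt_schedule_notrace().
	 */
	noinstr void enter_from_user_mode(struct pt_regs *regs)
	{
		lockdep_hardirqs_off(CALLER_ADDR0);
		user_exit_irqoff();	/* context tracking: user -> kernel */

		instrumentation_begin();
		trace_hardirqs_off_finish();	/* tracing OK from here */
		instrumentation_end();
	}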

> I believe x86 does most of that well.

It does now.
