linux-kernel - Re: [RFC][PATCH v2 11/11] context_tracking,x86: Fix text_poke_sync() vs NOHZ

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20211021201859.GX174703@worktop.programming.kicks-ass.net>
Date:   Thu, 21 Oct 2021 22:18:59 +0200
From:   Peter Zijlstra <peterz@...radead.org>
To:     Marcelo Tosatti <mtosatti@...hat.com>
Cc:     gor@...ux.ibm.com, jpoimboe@...hat.com, jikos@...nel.org,
        mbenes@...e.cz, pmladek@...e.com, mingo@...nel.org,
        linux-kernel@...r.kernel.org, joe.lawrence@...hat.com,
        fweisbec@...il.com, tglx@...utronix.de, hca@...ux.ibm.com,
        svens@...ux.ibm.com, sumanthk@...ux.ibm.com,
        live-patching@...r.kernel.org, paulmck@...nel.org,
        rostedt@...dmis.org, x86@...nel.org
Subject: Re: [RFC][PATCH v2 11/11] context_tracking,x86: Fix text_poke_sync()
 vs NOHZ_FULL

On Thu, Oct 21, 2021 at 04:57:09PM -0300, Marcelo Tosatti wrote:
> > Pretty much everything in noinstr is magical, we just have to think
> > harder there (and possibly start writing more comments there).
> 
> mds_user_clear_cpu_buffers happens after sync_core, in your patchset, 
> if i am not mistaken.

Of course it does, mds_user_clear_cpu_buffers() is on exit, the
sync_core() is on entry.

> > > > +             /* NMI happens here and must still do/finish CT_WORK_n */
> > > > +             sync_core();
> > > 
> > > But after the discussion with you, it seems doing the TLB checking 
> > > and (also sync_core) checking very late/very early on exit/entry 
> > > makes things easier to review.
> > 
> > I don't know about late, it must happen *very* early in entry. The
> > sync_core() must happen before any self-modifying code gets called
> > (static_branch, static_call, etc..) with possible exception of the
> > context_tracking static_branch.
> > 
> > The TLBi must also happen super early, possibly while still on the
> > entry stack (since the task stack is vmap'ed).
> 
> But will it be ever be freed/remapped from other CPUs while the task
> is running?

Probably not, still something we need to be really careful with.
> 
> > We currently don't run C
> > code on the entry stack, that needs quite a bit of careful work to make
> > happen.
> 
> Was thinking of coding in ASM after (as early as possible) the write to 
> switch to kernel CR3:

No, we're not going to add new feature to ASM. You'll just have to wait
until all that gets lifted to C.

>  Kernel entry:
>  -------------
> 
>        cpu = smp_processor_id();
> 
>        if (isolation_enabled(cpu)) {
>                reqs = atomic_xchg(&percpudata->user_kernel_state, IN_KERNEL_MODE);
>                if (reqs & CPU_REQ_FLUSH_TLB)
> 			flush_tlb_all();
>                if (reqs & CPU_REQ_SYNC_CORE)
> 			sync_core();
>        }                           
> 
> Exit to userspace (as close to write to CR3 with user pagetable
> pointer):
>  -----------------
> 
>        cpu = smp_processor_id();
> 
>        if (isolation_enabled(cpu)) {
>                atomic_or(IN_USER_MODE, &percpudata->user_kernel_state);
>        }
> 
> You think that is a bad idea (in ASM, not C) ? 

Those atomics are a bad idea and not goig to happen.

> > We're not going to add an atomic to context tracking. There is one, we
> > just got to extract/share it with RCU.
> 
> Again, to avoid kernel TLB flushes you'd have to ensure:

I know how it works, but we're not going to add a second atomic to
entry/exit. RCU has one in there, that's going to be it. Again, we just
got to extract/share.