[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20211021195709.GA22422@fuller.cnet>
Date: Thu, 21 Oct 2021 16:57:09 -0300
From: Marcelo Tosatti <mtosatti@...hat.com>
To: Peter Zijlstra <peterz@...radead.org>
Cc: gor@...ux.ibm.com, jpoimboe@...hat.com, jikos@...nel.org,
mbenes@...e.cz, pmladek@...e.com, mingo@...nel.org,
linux-kernel@...r.kernel.org, joe.lawrence@...hat.com,
fweisbec@...il.com, tglx@...utronix.de, hca@...ux.ibm.com,
svens@...ux.ibm.com, sumanthk@...ux.ibm.com,
live-patching@...r.kernel.org, paulmck@...nel.org,
rostedt@...dmis.org, x86@...nel.org
Subject: Re: [RFC][PATCH v2 11/11] context_tracking,x86: Fix text_poke_sync()
vs NOHZ_FULL
On Thu, Oct 21, 2021 at 09:25:43PM +0200, Peter Zijlstra wrote:
> On Thu, Oct 21, 2021 at 03:39:35PM -0300, Marcelo Tosatti wrote:
> > Peter,
> >
> > static __always_inline void arch_exit_to_user_mode(void)
> > {
> > mds_user_clear_cpu_buffers();
> > }
> >
> > /**
> > * mds_user_clear_cpu_buffers - Mitigation for MDS and TAA vulnerability
> > *
> > * Clear CPU buffers if the corresponding static key is enabled
> > */
> > static __always_inline void mds_user_clear_cpu_buffers(void)
> > {
> > if (static_branch_likely(&mds_user_clear))
> > mds_clear_cpu_buffers();
> > }
> >
> > We were discussing how to perform objtool style validation
> > that no code after the check for
>
> I'm not sure what the point of the above is... Were you trying to ask
> for validation that nothing runs after the mds_user_clear_cpu_buffer()?
>
> That isn't strictly true today, there's lockdep code after it. I can't
> recall why that order is as it is though.
>
> Pretty much everything in noinstr is magical, we just have to think
> harder there (and possibly start writing more comments there).
mds_user_clear_cpu_buffers happens after sync_core, in your patchset,
if i am not mistaken.
> > > + /* NMI happens here and must still do/finish CT_WORK_n */
> > > + sync_core();
> >
> > But after the discussion with you, it seems doing the TLB checking
> > and (also sync_core) checking very late/very early on exit/entry
> > makes things easier to review.
>
> I don't know about late, it must happen *very* early in entry. The
> sync_core() must happen before any self-modifying code gets called
> (static_branch, static_call, etc..) with possible exception of the
> context_tracking static_branch.
>
> The TLBi must also happen super early, possibly while still on the
> entry stack (since the task stack is vmap'ed).
But will it be ever be freed/remapped from other CPUs while the task
is running?
> We currently don't run C
> code on the entry stack, that needs quite a bit of careful work to make
> happen.
Was thinking of coding in ASM after (as early as possible) the write to
switch to kernel CR3:
Kernel entry:
-------------
cpu = smp_processor_id();
if (isolation_enabled(cpu)) {
reqs = atomic_xchg(&percpudata->user_kernel_state, IN_KERNEL_MODE);
if (reqs & CPU_REQ_FLUSH_TLB)
flush_tlb_all();
if (reqs & CPU_REQ_SYNC_CORE)
sync_core();
}
Exit to userspace (as close to write to CR3 with user pagetable
pointer):
-----------------
cpu = smp_processor_id();
if (isolation_enabled(cpu)) {
atomic_or(IN_USER_MODE, &percpudata->user_kernel_state);
}
You think that is a bad idea (in ASM, not C) ?
And request side can be in C:
Request side:
-------------
int targetcpu;
do {
struct percpudata *pcpudata = per_cpu(&percpudata, targetcpu);
old_state = pcpudata->user_kernel_state;
/* in kernel mode ? */
if (!(old_state & IN_USER_MODE)) {
smp_call_function_single(request_fn, targetcpu, 1);
break;
}
new_state = remote_state | CPU_REQ_FLUSH_TLB; // (or CPU_REQ_INV_ICACHE)
} while (atomic_cmpxchg(&pcpudata->user_kernel_state, old_state, new_state) != old_state);
(need logic to protect from atomic_cmpxchg always failing, but shouldnt
be difficult).
> > Can then use a single atomic variable with USER/KERNEL state and cmpxchg
> > loops.
>
> We're not going to add an atomic to context tracking. There is one, we
> just got to extract/share it with RCU.
Again, to avoid kernel TLB flushes you'd have to ensure:
kernel entry:
instrA addr1,addr2,addr3
instrB addr2,addr3,addr4 <--- that no address here has TLBs
modified and flushed
instrC addr5,addr6,addr7
reqs = atomic_xchg(&percpudata->user_kernel_state, IN_KERNEL_MODE);
if (reqs & CPU_REQ_FLUSH_TLB)
flush_tlb_all();
kernel exit:
atomic_or(IN_USER_MODE, &percpudata->user_kernel_state);
instrA addr1,addr2,addr3
instrB addr2,addr3,addr4 <--- that no address here has TLBs
modified and flushed
This could be conditional on "task isolated mode" enabled (would be
better if it didnt, though).
Powered by blists - more mailing lists