linux-kernel - Re: [PATCH 1/3] sched: add sched_task

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <20150219101607.GG5029@twins.programming.kicks-ass.net>
Date:	Thu, 19 Feb 2015 11:16:07 +0100
From:	Peter Zijlstra <peterz@...radead.org>
To:	Josh Poimboeuf <jpoimboe@...hat.com>
Cc:	Andrew Morton <akpm@...ux-foundation.org>,
	Ingo Molnar <mingo@...hat.com>, Jiri Kosina <jkosina@...e.cz>,
	Seth Jennings <sjenning@...hat.com>,
	linux-kernel@...r.kernel.org, Vojtech Pavlik <vojtech@...e.cz>
Subject: Re: [PATCH 1/3] sched: add sched_task_call()

On Wed, Feb 18, 2015 at 10:17:53PM -0600, Josh Poimboeuf wrote:
> On Thu, Feb 19, 2015 at 01:20:58AM +0100, Peter Zijlstra wrote:
> > On Wed, Feb 18, 2015 at 11:12:56AM -0600, Josh Poimboeuf wrote:

> > > The next line of attack is patching tasks when exiting the kernel to
> > > user space (system calls, interrupts, signals), to catch all CPU-bound
> > > and some I/O-bound tasks.  That's done in patch 9 [1] of the consistency
> > > model patch set.
> > 
> > So the HPC people are really into userspace that does for (;;) ; and
> > isolate that on CPUs and have the tick interrupt stopped and all that.
> > 
> > You'll not catch those threads on the sysexit path.
> > 
> > And I'm fairly sure they'll not want to SIGSTOP/CONT their stuff either.
> > 
> > Now its fairly easy to also handle this; just mark those tasks with a
> > _TIF_WORK_SYSCALL_ENTRY flag, have that slowpath wait for the flag to
> > go-away, then flip their state and clear the flag.
> 
> I guess you mean patch the task when it makes a syscall?  I'm doing that
> already on syscall exit with a bit in _TIF_ALLWORK_MASK and
> _TIF_DO_NOTIFY_MASK.

No, these tasks will _never_ make syscalls. So you need to guarantee
they don't accidentally enter the kernel while you flip them. Something
like so should do.

You set TIF_ENTER_WAIT on them, check they're still in userspace, flip
them then clear TIF_ENTER_WAIT.

---
 arch/x86/include/asm/thread_info.h | 4 +++-
 arch/x86/kernel/entry_64.S         | 2 ++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/x86/include/asm/thread_info.h b/arch/x86/include/asm/thread_info.h
index e82e95abc92b..baa836f13536 100644
--- a/arch/x86/include/asm/thread_info.h
+++ b/arch/x86/include/asm/thread_info.h
@@ -90,6 +90,7 @@ struct thread_info {
 #define TIF_SYSCALL_TRACEPOINT	28	/* syscall tracepoint instrumentation */
 #define TIF_ADDR32		29	/* 32-bit address space on 64 bits */
 #define TIF_X32			30	/* 32-bit native x86-64 binary */
+#define TIF_ENTER_WAIT		31
 
 #define _TIF_SYSCALL_TRACE	(1 << TIF_SYSCALL_TRACE)
 #define _TIF_NOTIFY_RESUME	(1 << TIF_NOTIFY_RESUME)
@@ -113,12 +114,13 @@ struct thread_info {
 #define _TIF_SYSCALL_TRACEPOINT	(1 << TIF_SYSCALL_TRACEPOINT)
 #define _TIF_ADDR32		(1 << TIF_ADDR32)
 #define _TIF_X32		(1 << TIF_X32)
+#define _TIF_ENTER_WAIT		(1 << TIF_ENTER_WAIT)
 
 /* work to do in syscall_trace_enter() */
 #define _TIF_WORK_SYSCALL_ENTRY	\
 	(_TIF_SYSCALL_TRACE | _TIF_SYSCALL_EMU | _TIF_SYSCALL_AUDIT |	\
 	 _TIF_SECCOMP | _TIF_SINGLESTEP | _TIF_SYSCALL_TRACEPOINT |	\
-	 _TIF_NOHZ)
+	 _TIF_NOHZ | _TIF_ENTER_WAIT)
 
 /* work to do in syscall_trace_leave() */
 #define _TIF_WORK_SYSCALL_EXIT	\
diff --git a/arch/x86/kernel/entry_64.S b/arch/x86/kernel/entry_64.S
index db13655c3a2a..735566b35903 100644
--- a/arch/x86/kernel/entry_64.S
+++ b/arch/x86/kernel/entry_64.S
@@ -387,6 +387,8 @@ GLOBAL(system_call_after_swapgs)
 
 	/* Do syscall tracing */
 tracesys:
+	andl $_TIF_ENTER_WAIT,TI_flags+THREAD_INFO(%rsp,RIP-ARGOFFSET)
+	jnz tracesys;
 	leaq -REST_SKIP(%rsp), %rdi
 	movq $AUDIT_ARCH_X86_64, %rsi
 	call syscall_trace_enter_phase1

> > > As a last resort, if there are still any tasks which are sleeping on a
> > > to-be-patched function, the user can send them SIGSTOP and SIGCONT to
> > > force them to be patched.
> > 
> > You typically cannot SIGSTOP/SIGCONT kernel threads. Also
> > TASK_UNINTERRUPTIBLE sleeps are unaffected by signals.
> > 
> > Bit pesky that.. needs pondering.

I still absolutely hate you need to disturb userspace like that. Signals
are quite visible and perturb userspace state.

Also, you cannot SIGCONT a task that was SIGSTOP'ed by userspace for
what they thought was a good reason. You'd wreck their state.

> But now I'm thinking that kthreads will almost never be a problem.  Most
> kthreads are basically this:

You guys are way too optimistic; maybe its because I've worked on
realtime stuff too much, but I'm always looking at worst cases. If you
can't handle those, I feel you might as well not bother :-)

> Patching thread_fn wouldn't be possible unless we killed the thread.

It is, see kthread_park().
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/