lists.openwall.net   lists  /  announce  owl-users  owl-dev  john-users  john-dev  passwdqc-users  yescrypt  popa3d-users  /  oss-security  kernel-hardening  musl  sabotage  tlsify  passwords  /  crypt-dev  xvendor  /  Bugtraq  Full-Disclosure  linux-kernel  linux-netdev  linux-ext4  linux-hardening  linux-cve-announce  PHC 
Open Source and information security mailing list archives
 
Hash Suite: Windows password security audit tool. GUI, reports in PDF.
[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-ID: <20150812160020.GG21542@lerouge>
Date:	Wed, 12 Aug 2015 18:00:21 +0200
From:	Frederic Weisbecker <fweisbec@...il.com>
To:	Chris Metcalf <cmetcalf@...hip.com>
Cc:	Gilad Ben Yossef <giladb@...hip.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ingo Molnar <mingo@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Rik van Riel <riel@...hat.com>, Tejun Heo <tj@...nel.org>,
	Thomas Gleixner <tglx@...utronix.de>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Christoph Lameter <cl@...ux.com>,
	Viresh Kumar <viresh.kumar@...aro.org>,
	Catalin Marinas <catalin.marinas@....com>,
	Will Deacon <will.deacon@....com>, linux-doc@...r.kernel.org,
	linux-api@...r.kernel.org, linux-kernel@...r.kernel.org
Subject: Re: [PATCH v5 2/6] cpu_isolated: add initial support

On Tue, Jul 28, 2015 at 03:49:36PM -0400, Chris Metcalf wrote:
> The existing nohz_full mode is designed as a "soft" isolation mode
> that makes tradeoffs to minimize userspace interruptions while
> still attempting to avoid overheads in the kernel entry/exit path,
> to provide 100% kernel semantics, etc.
> 
> However, some applications require a "hard" commitment from the
> kernel to avoid interruptions, in particular userspace device
> driver style applications, such as high-speed networking code.
> 
> This change introduces a framework to allow applications
> to elect to have the "hard" semantics as needed, specifying
> prctl(PR_SET_CPU_ISOLATED, PR_CPU_ISOLATED_ENABLE) to do so.
> Subsequent commits will add additional flags and additional
> semantics.

We are doing this at the process level but the isolation works on
the CPU scope... Now I wonder if prctl is the right interface.

That said the user is rather interested in isolating a task. The CPU
being the backend eventually.

For example if the task is migrated by accident, we want it to be
warned about that. And if the isolation is done on the CPU level
instead of the task level, this won't happen.

I'm also afraid that the naming clashes with cpu_isolated_map,
although it could be a subset of it.

So probably in this case we should consider talking about task rather
than CPU isolation and change naming accordingly (sorry, I know I
suggested cpu_isolation.c, I guess I had to see the result to realize).

We must sort that out first. Either we consider isolation on the task
level (and thus the underlying CPU by backend effect) and we use prctl().
Or we do this on the CPU level and we use a specific syscall or sysfs
which takes effect on any task in the relevant isolated CPUs.

What do you think?

It would be nice to hear others opinions as well.

> The kernel must be built with the new CPU_ISOLATED Kconfig flag
> to enable this mode, and the kernel booted with an appropriate
> nohz_full=CPULIST boot argument.  The "cpu_isolated" state is then
> indicated by setting a new task struct field, cpu_isolated_flags,
> to the value passed by prctl().  When the _ENABLE bit is set for a
> task, and it is returning to userspace on a nohz_full core, it calls
> the new cpu_isolated_enter() routine to take additional actions
> to help the task avoid being interrupted in the future.
> 
> Initially, there are only three actions taken.  First, the
> task calls lru_add_drain() to prevent being interrupted by a
> subsequent lru_add_drain_all() call on another core.  Then, it calls
> quiet_vmstat() to quieten the vmstat worker to avoid a follow-on
> interrupt.  Finally, the code checks for pending timer interrupts
> and quiesces until they are no longer pending.  As a result, sys
> calls (and page faults, etc.) can be inordinately slow.  However,
> this quiescing guarantees that no unexpected interrupts will occur,
> even if the application intentionally calls into the kernel.
> 
> Signed-off-by: Chris Metcalf <cmetcalf@...hip.com>
> ---
>  arch/tile/kernel/process.c   |  9 ++++++
>  include/linux/cpu_isolated.h | 24 +++++++++++++++
>  include/linux/sched.h        |  3 ++
>  include/uapi/linux/prctl.h   |  5 ++++
>  kernel/context_tracking.c    |  3 ++
>  kernel/sys.c                 |  8 +++++
>  kernel/time/Kconfig          | 20 +++++++++++++
>  kernel/time/Makefile         |  1 +
>  kernel/time/cpu_isolated.c   | 71 ++++++++++++++++++++++++++++++++++++++++++++

It's not about time :-)

The timer is only a part of the isolation.

Moreover "isolatED" is a state. The filename should reflect the process. "isolatION" would
better fit.

kernel/task_isolation.c maybe or just kernel/isolation.c

I think I prefer the latter because I'm not only interested in that task
hard isolation feature, I would like to also drive all the general isolation
operations from there (workqueue affinity, rcu nocb, ...).

>  9 files changed, 144 insertions(+)
>  create mode 100644 include/linux/cpu_isolated.h
>  create mode 100644 kernel/time/cpu_isolated.c
> 
> diff --git a/arch/tile/kernel/process.c b/arch/tile/kernel/process.c
> index e036c0aa9792..7db6f8386417 100644
> --- a/arch/tile/kernel/process.c
> +++ b/arch/tile/kernel/process.c
> @@ -70,6 +70,15 @@ void arch_cpu_idle(void)
>  	_cpu_idle();
>  }
>  
> +#ifdef CONFIG_CPU_ISOLATED
> +void cpu_isolated_wait(void)
> +{
> +	set_current_state(TASK_INTERRUPTIBLE);
> +	_cpu_idle();
> +	set_current_state(TASK_RUNNING);
> +}

I'm still uncomfortable with that. A wake up model could work?

> +#endif
> +
>  /*
>   * Release a thread_info structure
>   */
> diff --git a/include/linux/cpu_isolated.h b/include/linux/cpu_isolated.h
> new file mode 100644
> index 000000000000..a3d17360f7ae
> --- /dev/null
> +++ b/include/linux/cpu_isolated.h
> @@ -0,0 +1,24 @@
> +/*
> + * CPU isolation related global functions
> + */
> +#ifndef _LINUX_CPU_ISOLATED_H
> +#define _LINUX_CPU_ISOLATED_H
> +
> +#include <linux/tick.h>
> +#include <linux/prctl.h>
> +
> +#ifdef CONFIG_CPU_ISOLATED
> +static inline bool is_cpu_isolated(void)
> +{
> +	return tick_nohz_full_cpu(smp_processor_id()) &&
> +		(current->cpu_isolated_flags & PR_CPU_ISOLATED_ENABLE);
> +}
> +
> +extern void cpu_isolated_enter(void);
> +extern void cpu_isolated_wait(void);
> +#else
> +static inline bool is_cpu_isolated(void) { return false; }
> +static inline void cpu_isolated_enter(void) { }
> +#endif

And all the naming should be about task as well, if we take that task direction.

> +
> +#endif
> diff --git a/include/linux/sched.h b/include/linux/sched.h
> index 04b5ada460b4..0bb248385d88 100644
> --- a/include/linux/sched.h
> +++ b/include/linux/sched.h
> @@ -1776,6 +1776,9 @@ struct task_struct {
>  	unsigned long	task_state_change;
>  #endif
>  	int pagefault_disabled;
> +#ifdef CONFIG_CPU_ISOLATED
> +	unsigned int	cpu_isolated_flags;
> +#endif

Can't we add a new flag to tsk->flags? There seem to be some values remaining.

Thanks.
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@...r.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/

Powered by blists - more mailing lists

Powered by Openwall GNU/*/Linux Powered by OpenVZ