Message-ID: <56941B86.9090009@ezchip.com>
Date: Mon, 11 Jan 2016 16:15:50 -0500
From: Chris Metcalf <cmetcalf@...hip.com>
To: Gilad Ben Yossef <giladb@...hip.com>,
	Steven Rostedt <rostedt@...dmis.org>,
	Ingo Molnar <mingo@...nel.org>,
	Peter Zijlstra <peterz@...radead.org>,
	Andrew Morton <akpm@...ux-foundation.org>,
	Rik van Riel <riel@...hat.com>, Tejun Heo <tj@...nel.org>,
	Frederic Weisbecker <fweisbec@...il.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
	Christoph Lameter <cl@...ux.com>,
	Viresh Kumar <viresh.kumar@...aro.org>,
	Catalin Marinas <catalin.marinas@....com>,
	Will Deacon <will.deacon@....com>,
	Andy Lutomirski <luto@...capital.net>,
	Daniel Lezcano <daniel.lezcano@...aro.org>,
	<linux-doc@...r.kernel.org>, <linux-api@...r.kernel.org>,
	<linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v9 00/13] support "task_isolation" mode for nohz_full

Ping! There has been no substantive feedback to this version of
the patch in the week since I posted it, which optimistically suggests
to me that people may be satisfied with it. If that's true, Frederic,
I assume this would be pulled into your tree?

I have slightly updated the v9 patch series since this posting:

- Incorporated a fix to initialize cpu_isolation_mask early if no
  cpu_isolation= boot argument was given, to avoid crashing on
  CPUMASK_OFFSTACK platforms.

- Incorporated Mark Rutland's changes to convert arm64
  assembly to C code instead of using my own version.

The updated patch series is available in the branch at

  git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile.git dataplane

I will post a v10 with those couple of small changes if I don't hear
any other feedback, or of course feel free to pull from the git repo.

On 01/04/2016 02:34 PM, Chris Metcalf wrote:
> It has been a couple of months since the v8 version of this patch,
> as various other priorities came up at work. Since it's been
> a while I will try to summarize where I think we got to on the
> various issues that were raised with v8.
>
> 1. Andy Lutomirski raised the issue of whether it really made sense to
> only attempt to set up the conditions for task isolation, ask the kernel
> nicely for it, and then wait until it happened. He wondered if a
> SCHED_ISOLATED class might be a helpful abstraction. Steven Rostedt
> also suggested having an interface that would force everything else
> off a core to enable SCHED_ISOLATED to succeed. Frederic added
> some concerns about enforcing the test that the process was in a
> good state to enter task isolation.
>
> I tried to address the different design philosophies for what I called
> the original "polite" mode and the reviewers' suggestions for an
> "aggressive" mode in this email:
>
> https://lkml.org/lkml/2015/10/26/625
>
> As I said there, on balance I think the "polite" option is still
> better. Obviously folks are welcome to disagree and I'm happy to
> continue that conversation (or perhaps I convinced everyone).
>
> 2. Andy didn't like the idea of having a "STRICT" mode which
> delivered a signal to a process for violating its promise to stay
> out of the kernel. Gilad Ben Yossef argued that
> it made sense to have a way for the kernel to enforce the requested
> correctness guarantee of never being interrupted. Andy pointed out
> that we should then really deliver such a signal when the kernel
> delivers an asynchronous interrupt to the core as well. In particular
> this is a concern for the application-error case of a process that
> calls munmap() on one core while a thread on another core is running
> STRICT, and thus gets an unexpected TLB flush.
>
> This patch series addresses that concern by including support for
> IRQs, IPIs, and similar asynchronous interrupts to also send the
> STRICT signal to the process. We don't try to send the signal if
> we are in an NMI, and instead just force a console backtrace like
> you would get in task_isolation_debug mode.
>
> 3. Frederic nack'ed my patch for a boot flag to disable the 1Hz
> periodic scheduler tick.
>
> I'm still hoping he's open to changing his mind about that, but in
> this patch series I have removed that boot flag.
>
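
For readers who have not followed the earlier versions, here is a
minimal userspace sketch of how a task might request STRICT mode with a
custom signal. Only PR_TASK_ISOLATION_STRICT is named in this thread;
the other constant names and all numeric values below are assumptions
modeled on the proposed series, are not in any released kernel header,
and may not match the final patch. The helper names isolation_flags()
and request_isolation() are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <sys/prctl.h>

/*
 * ASSUMPTION: these definitions are illustrative placeholders for the
 * proposed interface. They are not part of mainline <linux/prctl.h>.
 */
#ifndef PR_SET_TASK_ISOLATION
#define PR_SET_TASK_ISOLATION          48
#define PR_TASK_ISOLATION_ENABLE       (1 << 0)
#define PR_TASK_ISOLATION_STRICT       (1 << 1)
#define PR_TASK_ISOLATION_SET_SIG(sig) (((sig) & 0x7f) << 8)
#endif

/*
 * Build the flag word: always enable isolation; optionally add STRICT,
 * and, in STRICT mode only, an optional signal number (0 = default).
 */
static unsigned long isolation_flags(int strict, int sig)
{
	unsigned long flags = PR_TASK_ISOLATION_ENABLE;

	assert(sig >= 0 && sig < 128);
	if (strict) {
		flags |= PR_TASK_ISOLATION_STRICT;
		if (sig > 0)
			flags |= PR_TASK_ISOLATION_SET_SIG(sig);
	}
	return flags;
}

/*
 * Ask the kernel for task isolation. Returns 0 on success or -errno,
 * e.g. -EINVAL on a kernel that does not carry this series.
 */
static int request_isolation(int strict, int sig)
{
	if (prctl(PR_SET_TASK_ISOLATION, isolation_flags(strict, sig),
		  0, 0, 0) < 0)
		return -errno;
	return 0;
}
```

A task would typically pin itself to one of the isolated cores, call
something like request_isolation(1, SIGUSR1), and then stay in
userspace; a negative return means the kernel refused the request.
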
> Various other changes have been introduced since v8:
>
> https://lkml.kernel.org/r/1445373372-6567-1-git-send-email-cmetcalf@ezchip.com
>
> - Rebased to Linux 4.4-rc5.
>
> - Since nohz_full and isolcpus have been separated back out again in
> 4.4, I introduced a new task_isolation=MASK boot argument that sets
> both of them. The task isolation support now requires that this
> boot flag have been used; it intentionally doesn't work if you've
> just enabled nohz_full and isolcpus separately. I could be
> convinced that doing it the other way around makes sense, though.
>
> - I folded the two STRICT mode patches together since there didn't
> seem to be much value in having the second patch that just enabled
> having a settable signal. I also refactored the various routines
> that report on interrupts/exceptions/etc to make it easier to hook
> in from the case where we are interrupted asynchronously.
>
> - For the debug support, I moved most of the functionality into
> kernel/isolation.c and out of kernel/sched/core.c, leaving only a
> small hook to handle mapping a remote cpu to a task struct safely.
> In addition to implementing Andy's suggestion of signalling a task
> when it is interrupted asynchronously, I also added a ratelimit
> hook so we won't spam the console if (for example) a timer interrupt
> runs amok - particularly since when this happens without ratelimit,
> it can end up self-perpetuating the timer interrupt.
>
> - I added a task_isolation_debug_cpumask() helper function to check
> all the cpus in a mask to see if they are being interrupted
> inappropriately.
>
> - I made the check for irq_enter() robust to architectures that
> have already entered user mode context_tracking before calling
> irq_enter() by testing user_mode(get_irq_regs()) instead of
> context_tracking_in_user(), and split out the code to a separate
> inlined function so I could comment it better.
>
> - For arm64, I added a task_isolation_debug_cpumask() hook for
> smp_cross_call(), which I had missed in the earlier versions.
>
> - I generalized the fix for tile to set up a clockevents hook for
> set_state_oneshot_stopped() to also apply to the arm_arch_timer,
> which I realized was showing the same problem. For both cases,
> this seems to be what Viresh had in mind with commit 8fff52fd509345
> ("clockevents: Introduce CLOCK_EVT_STATE_ONESHOT_STOPPED state").
>
> - For tile, I adopted the arm model of doing user_exit() calls in the
> early assembly code (a new patch in this series). I also added a
> missing task_isolation_debug hook for tile's IPI and remote cache
> flush code.
>
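
The ratelimit idea mentioned in the debug-support item above can be
illustrated with a small userspace model. This is only a sketch of
window-based rate limiting (at most "burst" reports per "interval"),
not the hook from the patch, which is not shown in this message; the
struct and function names are hypothetical, and time is passed in as a
plain tick count so the behavior is easy to follow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical userspace model of a report ratelimiter; not kernel code. */
struct rl_state {
	uint64_t interval;	/* window length, in arbitrary ticks */
	unsigned int burst;	/* reports allowed per window */
	uint64_t begin;		/* tick at which the current window began */
	unsigned int printed;	/* reports emitted in the current window */
	unsigned int missed;	/* reports suppressed in the current window */
};

static void rl_init(struct rl_state *rl, uint64_t interval, unsigned int burst)
{
	assert(interval > 0 && burst > 0);
	rl->interval = interval;
	rl->burst = burst;
	rl->begin = 0;
	rl->printed = 0;
	rl->missed = 0;
}

/* Return true if a report at tick "now" may be emitted. */
static bool rl_allow(struct rl_state *rl, uint64_t now)
{
	/* Start a fresh window once the current one has expired. */
	if (now - rl->begin >= rl->interval) {
		rl->begin = now;
		rl->printed = 0;
		rl->missed = 0;
	}
	if (rl->printed < rl->burst) {
		rl->printed++;
		return true;
	}
	/* Over budget: suppress the report and count it. */
	rl->missed++;
	return false;
}
```

With rl_init(&rl, 100, 2), the first two calls to rl_allow() inside a
100-tick window return true and later ones return false until the
window rolls over, so a misbehaving interrupt source produces a bounded
amount of console output instead of feeding on its own reports.
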
> Chris Metcalf (12):
> vmstat: add vmstat_idle function
> lru_add_drain_all: factor out lru_add_drain_needed
> task_isolation: add initial support
> task_isolation: support PR_TASK_ISOLATION_STRICT mode
> task_isolation: add debug boot flag
> arch/x86: enable task isolation functionality
> arch/arm64: adopt prepare_exit_to_usermode() model from x86
> arch/arm64: enable task isolation functionality
> arch/tile: adopt prepare_exit_to_usermode() model from x86
> arch/tile: move user_exit() to early kernel entry sequence
> arch/tile: enable task isolation functionality
> arm, tile: turn off timer tick for oneshot_stopped state
>
> Christoph Lameter (1):
> vmstat: provide a function to quiet down the diff processing
>
> Documentation/kernel-parameters.txt | 16 +++
> arch/arm64/include/asm/thread_info.h | 18 ++-
> arch/arm64/kernel/entry.S | 6 +-
> arch/arm64/kernel/ptrace.c | 12 +-
> arch/arm64/kernel/signal.c | 35 ++++--
> arch/arm64/kernel/smp.c | 2 +
> arch/arm64/mm/fault.c | 4 +
> arch/tile/include/asm/processor.h | 2 +-
> arch/tile/include/asm/thread_info.h | 8 +-
> arch/tile/kernel/intvec_32.S | 51 +++-----
> arch/tile/kernel/intvec_64.S | 54 +++------
> arch/tile/kernel/process.c | 83 +++++++------
> arch/tile/kernel/ptrace.c | 19 +--
> arch/tile/kernel/single_step.c | 8 +-
> arch/tile/kernel/smp.c | 26 ++--
> arch/tile/kernel/time.c | 1 +
> arch/tile/kernel/traps.c | 13 +-
> arch/tile/kernel/unaligned.c | 16 ++-
> arch/tile/mm/fault.c | 6 +-
> arch/tile/mm/homecache.c | 2 +
> arch/x86/entry/common.c | 10 +-
> arch/x86/kernel/traps.c | 2 +
> arch/x86/mm/fault.c | 2 +
> drivers/clocksource/arm_arch_timer.c | 2 +
> include/linux/isolation.h | 80 +++++++++++++
> include/linux/sched.h | 3 +
> include/linux/swap.h | 1 +
> include/linux/vmstat.h | 4 +
> include/uapi/linux/prctl.h | 8 ++
> init/Kconfig | 20 ++++
> kernel/Makefile | 1 +
> kernel/irq_work.c | 5 +-
> kernel/isolation.c | 225 +++++++++++++++++++++++++++++++++++
> kernel/sched/core.c | 18 +++
> kernel/signal.c | 5 +
> kernel/smp.c | 6 +-
> kernel/softirq.c | 33 +++++
> kernel/sys.c | 9 ++
> mm/swap.c | 13 +-
> mm/vmstat.c | 24 ++++
> 40 files changed, 665 insertions(+), 188 deletions(-)
> create mode 100644 include/linux/isolation.h
> create mode 100644 kernel/isolation.c
>
--
Chris Metcalf, EZChip Semiconductor
http://www.ezchip.com