[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]
Message-Id: <915489BC-B2C9-4D47-A205-FC597FC68B98@amacapital.net>
Date: Thu, 9 Apr 2020 11:00:16 -0700
From: Andy Lutomirski <luto@...capital.net>
To: Alex Belits <abelits@...vell.com>
Cc: "frederic@...nel.org" <frederic@...nel.org>,
"rostedt@...dmis.org" <rostedt@...dmis.org>,
Prasun Kapoor <pkapoor@...vell.com>,
"mingo@...nel.org" <mingo@...nel.org>,
"davem@...emloft.net" <davem@...emloft.net>,
"linux-api@...r.kernel.org" <linux-api@...r.kernel.org>,
"peterz@...radead.org" <peterz@...radead.org>,
"linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>,
"catalin.marinas@....com" <catalin.marinas@....com>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"will@...nel.org" <will@...nel.org>,
"linux-arm-kernel@...ts.infradead.org"
<linux-arm-kernel@...ts.infradead.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: [PATCH v3 04/13] task_isolation: userspace hard isolation from kernel
> On Apr 9, 2020, at 8:21 AM, Alex Belits <abelits@...vell.com> wrote:
>
> The existing nohz_full mode is designed as a "soft" isolation mode
> that makes tradeoffs to minimize userspace interruptions while
> still attempting to avoid overheads in the kernel entry/exit path,
> to provide 100% kernel semantics, etc.
>
> However, some applications require a "hard" commitment from the
> kernel to avoid interruptions, in particular userspace device driver
> style applications, such as high-speed networking code.
>
> This change introduces a framework to allow applications
> to elect to have the "hard" semantics as needed, specifying
> prctl(PR_TASK_ISOLATION, PR_TASK_ISOLATION_ENABLE) to do so.
>
> The kernel must be built with the new TASK_ISOLATION Kconfig flag
> to enable this mode, and the kernel booted with an appropriate
> "isolcpus=nohz,domain,CPULIST" boot argument to enable
> nohz_full and isolcpus. The "task_isolation" state is then indicated
> by setting a new task struct field, task_isolation_flag, to the
> value passed by prctl(), and also setting a TIF_TASK_ISOLATION
> bit in the thread_info flags. When the kernel is returning to
> userspace from the prctl() call and sees TIF_TASK_ISOLATION set,
> it calls the new task_isolation_start() routine to arrange for
> the task to avoid being interrupted in the future.
>
> With interrupts disabled, task_isolation_start() ensures that kernel
> subsystems that might cause a future interrupt are quiesced. If it
> doesn't succeed, it adjusts the syscall return value to indicate that
> fact, and userspace can retry as desired. In addition to stopping
> the scheduler tick, the code takes any actions that might avoid
> a future interrupt to the core, such as a worker thread being
> scheduled that could be quiesced now (e.g. the vmstat worker)
> or a future IPI to the core to clean up some state that could be
> cleaned up now (e.g. the mm lru per-cpu cache).
>
> Once the task has returned to userspace after issuing the prctl(),
> if it enters the kernel again via system call, page fault, or any
> other exception or irq, the kernel will kill it with SIGKILL.
I could easily imagine myself using task isolation, but not with the SIGKILL semantics. SIGKILL causes data loss. Please at least let users choose what signal to send.
Powered by blists - more mailing lists