Date: Thu, 9 Apr 2020 11:00:16 -0700
From: Andy Lutomirski <luto@...capital.net>
To: Alex Belits <abelits@...vell.com>
Cc: "frederic@...nel.org" <frederic@...nel.org>,
 "rostedt@...dmis.org" <rostedt@...dmis.org>,
 Prasun Kapoor <pkapoor@...vell.com>,
 "mingo@...nel.org" <mingo@...nel.org>,
 "davem@...emloft.net" <davem@...emloft.net>,
 "linux-api@...r.kernel.org" <linux-api@...r.kernel.org>,
 "peterz@...radead.org" <peterz@...radead.org>,
 "linux-arch@...r.kernel.org" <linux-arch@...r.kernel.org>,
 "catalin.marinas@....com" <catalin.marinas@....com>,
 "tglx@...utronix.de" <tglx@...utronix.de>,
 "will@...nel.org" <will@...nel.org>,
 "linux-arm-kernel@...ts.infradead.org" <linux-arm-kernel@...ts.infradead.org>,
 "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
 "netdev@...r.kernel.org" <netdev@...r.kernel.org>
Subject: Re: [PATCH v3 04/13] task_isolation: userspace hard isolation from kernel

> On Apr 9, 2020, at 8:21 AM, Alex Belits <abelits@...vell.com> wrote:
>
> The existing nohz_full mode is designed as a "soft" isolation mode
> that makes tradeoffs to minimize userspace interruptions while
> still attempting to avoid overheads in the kernel entry/exit path,
> to provide 100% kernel semantics, etc.
>
> However, some applications require a "hard" commitment from the
> kernel to avoid interruptions, in particular userspace device driver
> style applications, such as high-speed networking code.
>
> This change introduces a framework to allow applications
> to elect to have the "hard" semantics as needed, specifying
> prctl(PR_TASK_ISOLATION, PR_TASK_ISOLATION_ENABLE) to do so.
>
> The kernel must be built with the new TASK_ISOLATION Kconfig flag
> to enable this mode, and the kernel booted with an appropriate
> "isolcpus=nohz,domain,CPULIST" boot argument to enable
> nohz_full and isolcpus. The "task_isolation" state is then indicated
> by setting a new task struct field, task_isolation_flag, to the
> value passed by prctl(), and also setting a TIF_TASK_ISOLATION
> bit in the thread_info flags. When the kernel is returning to
> userspace from the prctl() call and sees TIF_TASK_ISOLATION set,
> it calls the new task_isolation_start() routine to arrange for
> the task to avoid being interrupted in the future.
>
> With interrupts disabled, task_isolation_start() ensures that kernel
> subsystems that might cause a future interrupt are quiesced. If it
> doesn't succeed, it adjusts the syscall return value to indicate that
> fact, and userspace can retry as desired. In addition to stopping
> the scheduler tick, the code takes any actions that might avoid
> a future interrupt to the core, such as a worker thread being
> scheduled that could be quiesced now (e.g. the vmstat worker)
> or a future IPI to the core to clean up some state that could be
> cleaned up now (e.g. the mm lru per-cpu cache).
>
> Once the task has returned to userspace after issuing the prctl(),
> if it enters the kernel again via system call, page fault, or any
> other exception or irq, the kernel will kill it with SIGKILL.

I could easily imagine myself using task isolation, but not with the SIGKILL semantics. SIGKILL causes data loss. Please at least let users choose what signal to send.
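
To illustrate the data-loss concern, here is a minimal userspace sketch. It is not taken from the patch series. It only shows standard POSIX signal behavior: a handler cannot be installed for SIGKILL, so the task gets no chance to flush or save anything, while a catchable signal lets it record the event and clean up. The helper names (install_cleanup_handler, isolation_was_lost) and the choice of SIGUSR1 are assumptions made for the example, and no task isolation prctl() is invoked here.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <string.h>

/* Set by the handler. A real application would notice this flag in its
 * main loop and then flush buffers or save state. */
static volatile sig_atomic_t isolation_lost = 0;

/* Hypothetical handler: only records that the signal arrived, which is
 * async-signal-safe. */
static void on_isolation_lost(int sig)
{
    (void)sig;
    isolation_lost = 1;
}

/* Hypothetical helper: try to install the cleanup handler for `sig`.
 * Returns 0 on success, or the errno value reported by sigaction(). */
int install_cleanup_handler(int sig)
{
    struct sigaction sa;

    assert(sig > 0);
    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = on_isolation_lost;
    sigemptyset(&sa.sa_mask);

    if (sigaction(sig, &sa, NULL) != 0)
        return errno;
    return 0;
}

/* Hypothetical accessor for the flag set by the handler. */
int isolation_was_lost(void)
{
    return isolation_lost;
}
```

Calling install_cleanup_handler(SIGKILL) fails with EINVAL, because SIGKILL can be neither caught nor ignored. Calling it with SIGUSR1 succeeds, and a later raise(SIGUSR1) sets the flag instead of terminating the process. If the kernel delivered a user-selected, catchable signal on isolation loss, an application could use the same pattern to save its data before exiting.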