linux-kernel - Re: [PATCH v15 04/13] task_isolation: add initial support

lists.openwall.net		lists / announce owl-users owl-dev john-users john-dev passwdqc-users yescrypt popa3d-users / oss-security kernel-hardening musl sabotage tlsify passwords / crypt-dev xvendor / Bugtraq Full-Disclosure linux-kernel linux-netdev linux-ext4 linux-hardening linux-cve-announce PHC
Open Source and information security mailing list archives

Hash Suite: Windows password security audit tool. GUI, reports in PDF.

[<prev] [next>] [<thread-prev] [thread-next>] [day] [month] [year] [list]

Message-ID: <aa0cf4eb-7da5-dfe1-45b2-f2a80969b706@mellanox.com>
Date:   Fri, 30 Sep 2016 12:59:33 -0400
From:   Chris Metcalf <cmetcalf@...lanox.com>
To:     Andy Lutomirski <luto@...capital.net>
CC:     Thomas Gleixner <tglx@...utronix.de>,
        Christoph Lameter <cl@...ux.com>,
        Michal Hocko <mhocko@...e.com>,
        Gilad Ben Yossef <giladb@...lanox.com>,
        Andrew Morton <akpm@...ux-foundation.org>,
        Linux API <linux-api@...r.kernel.org>,
        Viresh Kumar <viresh.kumar@...aro.org>,
        "Ingo Molnar" <mingo@...nel.org>,
        Steven Rostedt <rostedt@...dmis.org>,
        Tejun Heo <tj@...nel.org>, Will Deacon <will.deacon@....com>,
        Rik van Riel <riel@...hat.com>,
        Frederic Weisbecker <fweisbec@...il.com>,
        "Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
        "linux-mm@...ck.org" <linux-mm@...ck.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
        Catalin Marinas <catalin.marinas@....com>,
        Peter Zijlstra <peterz@...radead.org>
Subject: Re: [PATCH v15 04/13] task_isolation: add initial support

On 8/30/2016 2:43 PM, Andy Lutomirski wrote:
> On Aug 30, 2016 10:02 AM, "Chris Metcalf" <cmetcalf@...lanox.com> wrote:
>> We really want to run task isolation last, so we can guarantee that
>> all the isolation prerequisites are met (dynticks stopped, per-cpu lru
>> cache empty, etc).  But achieving that state can require enabling
>> interrupts - most obviously if we have to schedule, e.g. for vmstat
>> clearing or whatnot (see the cond_resched in refresh_cpu_vm_stats), or
>> just while waiting for that last dyntick interrupt to occur.  I'm also
>> not sure that even something as simple as draining the per-cpu lru
>> cache can be done holding interrupts disabled throughout - certainly
>> there's a !SMP code path there that just re-enables interrupts
>> unconditionally, which gives me pause.
>>
>> At any rate at that point you need to retest for signals, resched,
>> etc, all as usual, and then you need to recheck the task isolation
>> prerequisites once more.
>>
>> I may be missing something here, but it's really not obvious to me
>> that there's a way to do this without having task isolation integrated
>> into the usual return-to-userspace loop.
> What if we did it the other way around: set a percpu flag saying
> "going quiescent; disallow new deferred work", then finish all
> existing work and return to userspace.  Then, on the next entry, clear
> that flag.  With the flag set, vmstat would just flush anything that
> it accumulates immediately, nothing would be added to the LRU list,
> etc.

Thinking about this some more, I was struck by an even simpler way
to approach this.  What if we just said that on task isolation cores, no
kernel subsystem should do something that would require a future
interruption?  So vmstat would just always sync immediately on task
isolation cores, the mm subsystem wouldn't use per-cpu LRU stuff on
task isolation cores, etc.  That way we don't have to worry about the
status of those things as we are returning to userspace for a task
isolation process, since it's just always kept "pristine".

The task-isolation setting per-core is not user-customizable, and the
task-stealing scheduler doesn't even run there, so it's not like any
processes will land there and be in a position to complain about the
performance overhead of having no deferred work being created...

-- 
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com