Message-ID: <2a88da0c-b1a9-aec3-f5a6-d524e69e4731@mellanox.com>
Date: Thu, 14 Jul 2016 17:22:17 -0400
From: Chris Metcalf <cmetcalf@...lanox.com>
To: Andy Lutomirski <luto@...capital.net>
CC: Gilad Ben Yossef <giladb@...lanox.com>,
Steven Rostedt <rostedt@...dmis.org>,
Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
"Rik van Riel" <riel@...hat.com>, Tejun Heo <tj@...nel.org>,
Frederic Weisbecker <fweisbec@...il.com>,
Thomas Gleixner <tglx@...utronix.de>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Christoph Lameter <cl@...ux.com>,
Viresh Kumar <viresh.kumar@...aro.org>,
Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will.deacon@....com>,
Daniel Lezcano <daniel.lezcano@...aro.org>,
"linux-doc@...r.kernel.org" <linux-doc@...r.kernel.org>,
Linux API <linux-api@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH v13 00/12] support "task_isolation" mode
On 7/14/2016 5:03 PM, Andy Lutomirski wrote:
> On Thu, Jul 14, 2016 at 1:48 PM, Chris Metcalf <cmetcalf@...lanox.com> wrote:
>> Here is a respin of the task-isolation patch set. This primarily
>> reflects feedback from Frederic and Peter Z.
> I still think this is the wrong approach, at least at this point. The
> first step should be to instrument things if necessary and fix the
> obvious cases where the kernel gets entered asynchronously.
Note, however, that the task_isolation_debug mode is a very convenient
way of discovering what is actually going on when things do go wrong for an isolated task.
> Only once
> there's a credible reason to believe it can work well should any form
> of strictness be applied.
I'm not sure what criteria you need for this, though. Certainly we've been
shipping our version of task isolation to customers since 2008, and there
are quite a few customer applications in production that are working well.
I'd argue that's a credible reason.
> As an example, enough vmalloc/vfree activity will eventually cause
> flush_tlb_kernel_range to be called and *boom*, there goes your shiny
> production dataplane application.
Well, that's actually a refinement that I did not inflict on this patch series.
In our code base, we have a hook for kernel TLB flushes that defers such
flushes for cores that are running in userspace, because, after all, they
don't yet care about such flushes. Instead, we atomically set a flag that
is checked on entry to the kernel, and that causes the TLB flush to occur
at that point.
> On very brief inspection, __kmem_cache_shutdown will be a problem on
> some workloads as well.
That looks like it should be amenable to a version of the same fix I pushed
upstream in 5fbc461636c32efd ("mm: make lru_add_drain_all() selective").
You would basically check which cores have non-empty caches, and only
interrupt those cores. For extra credit, you empty the cache on your local cpu
when you are entering task isolation mode. Now you don't get interrupted.
To be fair, I've never seen this particular path cause an interruption. And I
think this speaks to the fact that there really can't be a black-and-white
decision about when you have removed enough possible interrupt paths.
It really does depend on what else is running on your machine in addition
to the task isolation code, and that will vary from application to application.
And, as the kernel evolves, new ways of interrupting task isolation cores
will get added and need to be dealt with. There really isn't a perfect time
you can wait for and then declare that all the asynchronous entry cases
have been dealt with and now things are safe for task isolation.
--
Chris Metcalf, Mellanox Technologies
http://www.mellanox.com