Message-ID: <CALCETrUaUOLpFqFbPcPRh=nf57XfpXU=QfMvD9YvQE8rSZikGg@mail.gmail.com>
Date: Tue, 29 Sep 2015 10:57:41 -0700
From: Andy Lutomirski <luto@...capital.net>
To: Chris Metcalf <cmetcalf@...hip.com>
Cc: Gilad Ben Yossef <giladb@...hip.com>,
Steven Rostedt <rostedt@...dmis.org>,
Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Rik van Riel <riel@...hat.com>, Tejun Heo <tj@...nel.org>,
Frederic Weisbecker <fweisbec@...il.com>,
Thomas Gleixner <tglx@...utronix.de>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Christoph Lameter <cl@...ux.com>,
Viresh Kumar <viresh.kumar@...aro.org>,
Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will.deacon@....com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"H. Peter Anvin" <hpa@...or.com>, X86 ML <x86@...nel.org>
Subject: Re: [PATCH v7 07/11] arch/x86: enable task isolation functionality
On Tue, Sep 29, 2015 at 10:42 AM, Chris Metcalf <cmetcalf@...hip.com> wrote:
> On 09/28/2015 06:43 PM, Andy Lutomirski wrote:
>>
>> Why are we treating alarms as something that should defer entry to
>> userspace? I think it would be entirely reasonable to set an alarm
>> for ten minutes, ask for isolation, and then think hard for ten
>> minutes.
>>
>> A bigger issue would be if there's an RT task that asks for isolation
>> and a bunch of other stuff (most notably KVM hosts) running with
>> unconstrained affinity at full load. If task_isolation_enter always
>> sleeps, then your KVM host will get scheduled, and it'll ask for a
>> user return notifier on the way out, and you might just loop forever.
>> Can this happen?
>
>
> task_isolation_enter() doesn't sleep - it spins. This is intentional,
> because the point is that there should be nothing else that
> could be scheduled on that cpu. We're just waiting for any
> pending kernel management timer interrupts to fire.
>
> In any case, you normally wouldn't have a KVM host running
> on an isolcpus, nohz_full cpu, unless it was the only thing
> running there, I imagine (just as would be true for any other
> host process).
The problem is that AFAICT systemd (and possibly other things) makes
it rather painful to guarantee that nothing low-priority (systemd
itself) would schedule on an arbitrary CPU. I would hope that merely
setting affinity and RT would be enough to get isolation without
playing fancy cgroup games. Maybe not.
>
>> ISTM something's suboptimal with the inner workings of all this if
>> task_isolation_enter needs to sleep to wait for an event that isn't
>> scheduled for the immediate future (e.g. already queued up as an
>> interrupt).
>
>
> Scheduling a timer for 10 minutes away is typically done by
> scheduling timers for the max timer granularity (which could
> be just a few seconds) and then waking up a couple of hundred
> times between now and now+10 minutes. Doing this breaks
> the task isolation guarantee, so we can't return to userspace
> while something like that is pending. You'd have to do it
> by polling in userspace to avoid the unexpected interrupts.
>
Really? That sucks. Hopefully we can fix it.
> I suppose if your hardware supported it, you could imagine
> a mode where userspace can request an alarm a specific
> amount of time in the future, and the task isolation code
> would then ignore an alarm that was going off at that
> specific time. But I'm not sure what hardware does support
> that (I know tile uses the "few seconds and re-arm" model),
> and it seems like a pretty corner use-case. We could
> certainly investigate adding such support later, but I don't see
> it as part of the core value proposition for task isolation.
>
Intel chips Sandy Bridge and newer certainly support this. Other chips
might support it as well. Whether the kernel is able to program the
TSC deadline timer like that is a different question.
--Andy