Message-ID: <CALCETrUaUOLpFqFbPcPRh=nf57XfpXU=QfMvD9YvQE8rSZikGg@mail.gmail.com>
Date: Tue, 29 Sep 2015 10:57:41 -0700
From: Andy Lutomirski <luto@...capital.net>
To: Chris Metcalf <cmetcalf@...hip.com>
Cc: Gilad Ben Yossef <giladb@...hip.com>,
Steven Rostedt <rostedt@...dmis.org>,
Ingo Molnar <mingo@...nel.org>,
Peter Zijlstra <peterz@...radead.org>,
Andrew Morton <akpm@...ux-foundation.org>,
Rik van Riel <riel@...hat.com>, Tejun Heo <tj@...nel.org>,
Frederic Weisbecker <fweisbec@...il.com>,
Thomas Gleixner <tglx@...utronix.de>,
"Paul E. McKenney" <paulmck@...ux.vnet.ibm.com>,
Christoph Lameter <cl@...ux.com>,
Viresh Kumar <viresh.kumar@...aro.org>,
Catalin Marinas <catalin.marinas@....com>,
Will Deacon <will.deacon@....com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"H. Peter Anvin" <hpa@...or.com>, X86 ML <x86@...nel.org>
Subject: Re: [PATCH v7 07/11] arch/x86: enable task isolation functionality
On Tue, Sep 29, 2015 at 10:42 AM, Chris Metcalf <cmetcalf@...hip.com> wrote:
> On 09/28/2015 06:43 PM, Andy Lutomirski wrote:
>>
>> Why are we treating alarms as something that should defer entry to
>> userspace? I think it would be entirely reasonable to set an alarm
>> for ten minutes, ask for isolation, and then think hard for ten
>> minutes.
>>
>> A bigger issue would be if there's an RT task that asks for isolation
>> and a bunch of other stuff (most notably KVM hosts) running with
>> unconstrained affinity at full load. If task_isolation_enter always
>> sleeps, then your KVM host will get scheduled, and it'll ask for a
>> user return notifier on the way out, and you might just loop forever.
>> Can this happen?
>
>
> task_isolation_enter() doesn't sleep - it spins. This is intentional,
> because the point is that there should be nothing else that
> could be scheduled on that cpu. We're just waiting for any
> pending kernel management timer interrupts to fire.
>
> In any case, you normally wouldn't have a KVM host running
> on an isolcpus, nohz_full cpu, unless it was the only thing
> running there, I imagine (just as would be true for any other
> host process).
The problem is that AFAICT systemd (and possibly other things) makes
it rather painful to guarantee that nothing low-priority (systemd
itself) would schedule on an arbitrary CPU. I would hope that merely
setting affinity and RT would be enough to get isolation without
playing fancy cgroup games. Maybe not.
>
>> ISTM something's suboptimal with the inner workings of all this if
>> task_isolation_enter needs to sleep to wait for an event that isn't
>> scheduled for the immediate future (e.g. already queued up as an
>> interrupt).
>
>
> Scheduling a timer for 10 minutes away is typically done by
> scheduling timers for the max timer granularity (which could
> be just a few seconds) and then waking up a couple of hundred
> times between now and now+10 minutes. Doing this breaks
> the task isolation guarantee, so we can't return to userspace
> while something like that is pending. You'd have to do it
> by polling in userspace to avoid the unexpected interrupts.
>
Really? That sucks. Hopefully we can fix it.
> I suppose if your hardware supported it, you could imagine
> a mode where userspace can request an alarm a specific
> amount of time in the future, and the task isolation code
> would then ignore an alarm that was going off at that
> specific time. But I'm not sure what hardware does support
> that (I know tile uses the "few seconds and re-arm" model),
> and it seems like a pretty corner use-case. We could
> certainly investigate adding such support later, but I don't see
> it as part of the core value proposition for task isolation.
>
Intel chips Sandy Bridge and newer certainly support this. Other chips
might support it as well. Whether the kernel is able to program the
TSC deadline timer like that is a different question.
--Andy