Message-ID: <CALCETrVMw4Nd-QZER9qzOzRte5s48WrUaM8ZZzkY_g3B6s+5Ow@mail.gmail.com>
Date: Fri, 16 Sep 2016 11:19:38 -0700
From: Andy Lutomirski <luto@...capital.net>
To: Peter Zijlstra <peterz@...radead.org>
Cc: Ingo Molnar <mingo@...hat.com>,
Mike Galbraith <umgwanakikbuti@...il.com>, kernel-team@...com,
Andrew Morton <akpm@...ux-foundation.org>,
"open list:CONTROL GROUP (CGROUP)" <cgroups@...r.kernel.org>,
Paul Turner <pjt@...gle.com>, Li Zefan <lizefan@...wei.com>,
Linux API <linux-api@...r.kernel.org>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
Tejun Heo <tj@...nel.org>,
Johannes Weiner <hannes@...xchg.org>,
Linus Torvalds <torvalds@...ux-foundation.org>
Subject: Re: [Documentation] State of CPU controller in cgroup v2
On Fri, Sep 16, 2016 at 9:50 AM, Peter Zijlstra <peterz@...radead.org> wrote:
> On Fri, Sep 16, 2016 at 09:29:06AM -0700, Andy Lutomirski wrote:
>
>> > SCHED_DEADLINE, it's a 'Global'-EDF-like scheduler that doesn't support
>> > CPU affinities (because that doesn't make sense). The only way to
>> > restrict it is to partition.
>> >
>> > 'Global' because you can partition it. If you reduce your system to
>> > single CPU partitions you'll reduce to P-EDF.
>> >
>> > (The same is true of SCHED_FIFO, which is a 'Global'-FIFO on the same
>> > partition scheme; it does, however, support sched_setaffinity, but
>> > using it gives 'interesting' schedulability results -- call it a
>> > historical accident).
>>
>> Hmm, I didn't realize that the deadline scheduler was global. But
>> ISTM requiring the use of "exclusive" to get this working is
>> unfortunate. What if a user wants two separate partitions, one using
>> CPUs 1 and 2 and the other using CPUs 3 and 4 (with 5 reserved for
>> non-RT stuff)?
>
> {1,2} {3,4} {5} seem exclusive -- did I miss something? (other than
> that 5-CPU parts are 'rare').
There's no overlap, so they're logically exclusive, but expressing it
that way avoids needing the "cpu_exclusive" parameter. It always
seemed confusing to
me that a setting on a child cgroup would strictly remove a resource
from the parent. (To be clear: I don't have any particularly strong
objection to cpu_exclusive. It just always seemed like a bit of a
hack that mostly duplicated what you could get by just setting the
cpusets appropriately throughout the hierarchy.)
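
Since DL admission control is what forces the partitioning in the first
place, here's the user-visible side for concreteness: a deadline task is
set up with sched_setattr(), and the kernel rejects the transition when
the task's affinity mask is narrower than its root domain. A minimal,
untested sketch -- glibc has no wrapper, so it's a raw syscall (this
assumes your <sys/syscall.h> knows __NR_sched_setattr), and the
10ms/100ms parameters are arbitrary:

#define _GNU_SOURCE
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>

#define SCHED_DEADLINE 6

/* No glibc wrapper; mirror the uapi struct by hand. */
struct sched_attr {
    uint32_t size;
    uint32_t sched_policy;
    uint64_t sched_flags;
    int32_t  sched_nice;      /* SCHED_NORMAL, SCHED_BATCH */
    uint32_t sched_priority;  /* SCHED_FIFO, SCHED_RR */
    uint64_t sched_runtime;   /* SCHED_DEADLINE, all in ns */
    uint64_t sched_deadline;
    uint64_t sched_period;
};

int main(void)
{
    struct sched_attr attr;

    memset(&attr, 0, sizeof(attr));
    attr.size = sizeof(attr);
    attr.sched_policy   = SCHED_DEADLINE;
    attr.sched_runtime  =  10 * 1000 * 1000;   /* 10ms of every... */
    attr.sched_deadline = 100 * 1000 * 1000;   /* ...100ms window  */
    attr.sched_period   = 100 * 1000 * 1000;

    /* EPERM here if our affinity is a strict subset of the root
     * domain -- the "doesn't support CPU affinities" rule above. */
    if (syscall(SYS_sched_setattr, 0, &attr, 0) != 0) {
        perror("sched_setattr");
        return 1;
    }

    pause();
    return 0;
}

Run it under taskset -c 1 and the setattr fails with EPERM until CPU 1
is carved out as its own partition, which is exactly the cpuset dance
above.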
>> > Note that, relatedly but differently, we have the isolcpus boot
>> > parameter, which creates single-CPU partitions for all listed CPUs
>> > and gives the rest to the root cpuset. Ideally we'd kill this option
>> > given it's a boot-time setting (for something which is trivial to do
>> > at runtime).
>> >
>> > But this cannot be done, because that would mean we'd have to start with
>> > a !0 cpuset layout:
>> >
>> >                 '/'
>> >           load_balance=0
>> >           /            \
>> >      'system'       'isolated'
>> >  cpus=~isolcpus    cpus=isolcpus
>> >                    load_balance=0
>> >
>> > And start with _everything_ in the /system group (including default IRQ
>> > affinities).
>> >
>> > Of course, that will break everything cgroup :-(
>> >
>>
>> I would actually *much* prefer this over the status quo. I'm tired of
>> my crappy, partially-working script that sits there and creates
>> exactly this configuration (minus the isolcpus part because I actually
>> want migration to work) on boot. (Actually, it could have two
automatic cgroups: /kernel and /init -- init and UMH would go in /init,
and kernel threads and such would go in /kernel. Userspace would be
>> able to request that a different cgroup be used for newly-created
>> kernel threads.)
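
To make that concrete, the guts of such a script are just a handful of
writes into the v1 cpuset hierarchy. A rough sketch: it assumes cpuset
is mounted at /sys/fs/cgroup/cpuset, the 0-3/4-5 split is made up, and
the 'system'/'isolated' names are just taken from the diagram above:

#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

/* Write a single value into a cpuset control file. */
static void put(const char *path, const char *val)
{
    FILE *f = fopen(path, "w");

    if (!f || fputs(val, f) == EOF) {
        perror(path);
        exit(1);
    }
    fclose(f);
}

int main(void)
{
    /* Root stops load balancing; the children become the partitions. */
    put("/sys/fs/cgroup/cpuset/cpuset.sched_load_balance", "0");

    mkdir("/sys/fs/cgroup/cpuset/system", 0755);
    put("/sys/fs/cgroup/cpuset/system/cpuset.mems", "0");
    put("/sys/fs/cgroup/cpuset/system/cpuset.cpus", "0-3");

    mkdir("/sys/fs/cgroup/cpuset/isolated", 0755);
    put("/sys/fs/cgroup/cpuset/isolated/cpuset.mems", "0");
    put("/sys/fs/cgroup/cpuset/isolated/cpuset.cpus", "4-5");
    /* No balancing inside 'isolated' either: each CPU stands alone,
     * which is roughly what isolcpus would have given you. */
    put("/sys/fs/cgroup/cpuset/isolated/cpuset.sched_load_balance", "0");

    return 0;
}

The missing (and fragile) half is herding every existing task into
/system: reading PIDs out of the root's 'tasks' file and writing each
one into system/tasks, racing against new tasks the whole time.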
>
> So there's a problem with sticking kernel threads (and esp. kthreadd)
> into !root groups. For example if you place it in a cpuset that doesn't
> have all cpus, then binding your shiny new kthread to a cpu will fail.
>
> You can fix that of course, and we used to do exactly that, but we kept
> running into 'fun' cases like that.
Blech. But maybe this *should* have that effect. I'm sick of random
kernel crap being scheduled on my RT CPUs and on the CPUs that I
intend to keep forcibly idle.
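
The userspace-visible flavor of that same failure, for reference: an
affinity request gets intersected with the caller's cpuset, and an
empty result comes back as EINVAL. A tiny sketch -- CPU 4 is arbitrary,
and it only fails if the process really sits in a cpuset that excludes
that CPU:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    CPU_SET(4, &set);    /* a CPU outside our cpuset, by assumption */

    /* The kernel clips the mask against the cpuset's cpus; if
     * nothing survives, this is EINVAL -- the same wall a shiny
     * new kthread hits when kthreadd lives in a restricted cpuset. */
    if (sched_setaffinity(0, sizeof(set), &set) != 0)
        perror("sched_setaffinity");

    return 0;
}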
>
> The unbound workqueue stuff is totally arbitrary borkage though; that
> can be made to work just fine. TJ didn't like it for some reason which
> I really cannot remember.
>
> Also, UMH?
User mode helper. Fortunately most users are gone now, but it still exists.
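
For anyone who hasn't met it: UMH is the call_usermodehelper()
machinery, the kernel spawning a userspace helper process on its own
initiative -- which is exactly why where those threads land in the
hierarchy matters. A toy module-shaped sketch of what a call site looks
like (not any particular in-tree user; /bin/true is a stand-in):

#include <linux/module.h>
#include <linux/kmod.h>

static int __init umh_demo_init(void)
{
    char *argv[] = { "/bin/true", NULL };
    static char *envp[] = { "HOME=/", "PATH=/sbin:/bin:/usr/bin", NULL };

    /* UMH_WAIT_PROC: block until the helper exits, so the
     * stack-allocated argv stays live for the whole call. */
    return call_usermodehelper(argv[0], argv, envp, UMH_WAIT_PROC);
}

static void __exit umh_demo_exit(void)
{
}

module_init(umh_demo_init);
module_exit(umh_demo_exit);
MODULE_LICENSE("GPL");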