Message-ID: <20151104094227.5aafdf2c@redhat.com>
Date: Wed, 4 Nov 2015 09:42:27 -0500
From: Luiz Capitulino <lcapitulino@...hat.com>
To: Fenghua Yu <fenghua.yu@...el.com>
Cc: "H Peter Anvin" <hpa@...or.com>, "Ingo Molnar" <mingo@...hat.com>,
"Thomas Gleixner" <tglx@...utronix.de>,
"Peter Zijlstra" <peterz@...radead.org>,
"linux-kernel" <linux-kernel@...r.kernel.org>,
"x86" <x86@...nel.org>,
"Vikas Shivappa" <vikas.shivappa@...ux.intel.com>,
Marcelo Tosatti <mtosatti@...hat.com>, tj@...nel.org,
riel@...hat.com
Subject: Re: [PATCH V15 00/11] x86: Intel Cache Allocation Technology
Support
On Thu, 1 Oct 2015 23:09:34 -0700
Fenghua Yu <fenghua.yu@...el.com> wrote:
> This series has some preparatory patches and Intel cache allocation
> support.
Ping? What's the status of this series?
We badly need this series for KVM-RT workloads. I did try it and it
seems to work. Apart from small fixable issues, which I'll point out
in replies to the specific patches, there are some design issues on
which I need clarification. They are listed in order of relevance:
o Cache reservations are global to all NUMA nodes
CAT is mostly intended for real-time and high performance
computing. For both of them the most common setup is to
pin your threads to specific cores on a specific NUMA node.
So, suppose I have two HPC threads pinned to specific cores
on node1. I want to reserve 80% of the L3 cache to those
threads. With current patches I'd do this:
1. Create an "all-tasks" cgroup which can only access 20% of
the cache
2. Create a "hpc" cgroup which can access 80% of the cache
3. Move my HPC threads to "hpc" and all the other threads to
"all-tasks"
This has the intended behavior on node1: the "hpc" threads
will write into 80% of the L3 cache and any "all-tasks" threads
executing there will only write into 20% of the cache.
However, this is also true for node0! So, the "all-tasks"
threads can only write into 20% of the cache in node0 even
though "hpc" threads will never execute there.
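For illustration, the 20%/80% split in the steps above can be written as two non-overlapping capacity bitmasks. This is only a sketch: the 20-bit mask length and the helper names are assumptions, not values or interfaces taken from the patches. The one hardware rule relied on is that the set bits of a CBM must be contiguous.

```python
def contiguous_cbm(num_bits, start):
    """Return a mask with num_bits consecutive bits set, beginning at bit start."""
    if num_bits < 1:
        raise ValueError("a capacity bitmask needs at least one bit set")
    return ((1 << num_bits) - 1) << start


def split_cache(cbm_len, reserved_fraction):
    """Split a cbm_len-bit mask into (reserved, rest) non-overlapping masks."""
    reserved_bits = round(cbm_len * reserved_fraction)
    rest_bits = cbm_len - reserved_bits
    rest = contiguous_cbm(rest_bits, 0)                  # low bits
    reserved = contiguous_cbm(reserved_bits, rest_bits)  # bits above them
    return reserved, rest


CBM_LEN = 20  # assumption: 20-bit mask, not a value from this thread

hpc_mask, all_tasks_mask = split_cache(CBM_LEN, 0.80)
print(f"hpc       = {hpc_mask:#07x}")
print(f"all-tasks = {all_tasks_mask:#07x}")
```

With the current patches, the same pair of masks ends up in effect on every node, which is why "all-tasks" is held to its 20% on node0 as well.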
Is this intended by design? For example, is this a hardware limitation
(given that the IA32_L3_MASK_n MSRs are global anyway) or maybe
a way to enforce cache coherence?
I was wondering if we could have masks per NUMA node, where
they are applied to processes whenever they migrate among
NUMA nodes.
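A rough model of the per-node idea, to make the question concrete. Everything here is hypothetical: the `Group` class, the `mask_on` lookup, and the 20-bit masks are illustrative assumptions and do not correspond to anything in the posted series.

```python
from dataclasses import dataclass, field

FULL_MASK = 0xFFFFF  # assumption: 20-bit CBM covering the whole L3


@dataclass
class Group:
    """Hypothetical cache group with an optional mask override per NUMA node."""
    name: str
    default_mask: int = FULL_MASK
    per_node: dict = field(default_factory=dict)  # node id -> mask

    def mask_on(self, node):
        # Mask that would be applied when a task of this group runs on `node`.
        return self.per_node.get(node, self.default_mask)


# The restriction exists only on node1, where the "hpc" threads are pinned.
hpc = Group("hpc", per_node={1: 0xFFFF0})
all_tasks = Group("all-tasks", per_node={1: 0x0000F})

for node in (0, 1):
    print(f"node{node}: all-tasks = {all_tasks.mask_on(node):#07x}")
```

In this model a task in "all-tasks" would get the full cache while it runs on node0 and drop to 20% only after migrating to node1.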
o How does this feature apply to kernel threads?
I'm just unable to move kernel threads out of the root
cgroup. This means that kernel threads can always write
into all cache no matter what the reservation scheme is.
Is this intended by design? Why? Unless I'm missing
something, reservations could and should be applied to
kernel threads as well.
o You can't change the root cgroup's CBM
I can understand this makes the implementation a lot simpler.
However, the reality is that there are far too few CBMs,
and losing one to the root group seems like a waste.
Can we change this, or are there strong reasons not to do so?
o cgroups hierarchy is limited by the number of CBMs
Today on my Haswell system, this means that I can only have 3
directories in my cgroups hierarchy. If the number of CBMs
is expected to grow in future processors, then I think having
this feature as cgroups makes sense. However, if we're still
going to be this limited in terms of directory structure, then
it seems a bit overkill to me to have this as cgroups.
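To show how quickly the limit is reached, here is a toy allocator. It assumes four hardware mask slots with one permanently held by the root group, which would be consistent with the three directories mentioned above. The `MaskSlotPool` class and the group names are made up for the example.

```python
class MaskSlotPool:
    """Toy model: a fixed number of mask slots, slot 0 reserved for root."""

    def __init__(self, num_slots):
        self.owner = {0: "root"}
        self.free = list(range(1, num_slots))

    def create_group(self, name):
        # Each new cgroup directory needs a slot of its own.
        if not self.free:
            raise OSError(f"no free mask slot for {name!r}")
        slot = self.free.pop(0)
        self.owner[slot] = name
        return slot


pool = MaskSlotPool(num_slots=4)  # assumption: 4 slots in hardware
created = []
for name in ["hpc", "all-tasks", "rt", "batch"]:
    try:
        pool.create_group(name)
        created.append(name)
    except OSError as err:
        print(err)
        break

print("created:", created)
```

The fourth directory cannot be created in this model, no matter how the hierarchy is arranged.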