Message-ID: <20150731183803.GA29321@amt.cnet>
Date: Fri, 31 Jul 2015 15:38:03 -0300
From: Marcelo Tosatti <mtosatti@...hat.com>
To: Vikas Shivappa <vikas.shivappa@...el.com>
Cc: "Auld, Will" <will.auld@...el.com>,
Vikas Shivappa <vikas.shivappa@...ux.intel.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"x86@...nel.org" <x86@...nel.org>, "hpa@...or.com" <hpa@...or.com>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"mingo@...nel.org" <mingo@...nel.org>,
"tj@...nel.org" <tj@...nel.org>,
"peterz@...radead.org" <peterz@...radead.org>,
"Fleming, Matt" <matt.fleming@...el.com>,
"Williamson, Glenn P" <glenn.p.williamson@...el.com>,
"Juvva, Kanaka D" <kanaka.d.juvva@...el.com>
Subject: Re: [summary] Re: [PATCH 3/9] x86/intel_rdt: Cache Allocation
documentation and cgroup usage guide
On Fri, Jul 31, 2015 at 09:41:58AM -0700, Vikas Shivappa wrote:
>
> To summarize the ever-growing thread:
>
> 1. the rdt_cgroup can be used to configure exclusive cache bitmaps
> for the child nodes, which covers the scenarios Marcelo
> mentions.
>
> simple examples which were mentioned:
> max bitmask length: 16, hence full mask is 0xffff
> groupx_realtime - 0xff
> group2_systemtraffic - 0xf: put a lot of tasks from the root node
> here, or whichever tasks are offending and thrashing.
> groupy_<mytraffic> - 0x0f
>
> Now groupx has its own area of cache that can be used by the
> realtime/(specific scenario) apps. Similarly, configure any groupy.
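
Since the interface allows masks to overlap (your point 4 below),
keeping groupx truly exclusive is up to whoever writes the masks. A
minimal sketch of that check, with hypothetical names - not part of
the cgroup interface:

    /* Does a proposed CBM overlap any existing group's CBM?
     * Purely illustrative. */
    static int cbm_is_exclusive(unsigned int new_cbm,
                                const unsigned int *cbms, int n)
    {
            int i;

            for (i = 0; i < n; i++)
                    if (new_cbm & cbms[i])
                            return 0;       /* overlaps group i */
            return 1;
    }
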
>
> 2. Can the masks let you specify which cache ways the cache is
> allocated in? - No, this is implementation specific as mentioned
> in the SDM. So when we configure a mask, you really don't know which
> ways or which exact lines are used on which SKUs. We have not seen
> a use case that needs apps to allocate cache in
> specific areas, and the h/w does not support this either.
Ok, can you comment on whether the proposed userspace interface
addresses all your use cases?
> 3. Letting the user specify size in bytes instead of a bitmap: we
> have already gone through this discussion in older versions. The
> user can simply check the total cache size and work out what size a
> given map corresponds to. I don't see a special need for an
> interface that takes the size in bytes and then rounds off - the
> user could instead apply the round-off values beforehand, or in other
> words that happens automatically when he specifies the bitmask.
When you move from processor A with CBM bitmask format X to processor B
with CBM bitmask format Y, and X and Y differ, you have to adjust
the mask manually.
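
Something like this is what I mean by manual adjustment - a sketch
only, assuming the mask is contiguous from bit 0 and that the fraction
of cache covered is what should carry over (the helper name is made
up):

    /* Scale a CBM from a from_len-bit format to a to_len-bit format,
     * preserving the fraction of cache it covers.  Hypothetical
     * user-space helper (assumes to_len < 32), not part of the
     * proposed interface. */
    static unsigned int cbm_convert(unsigned int cbm, int from_len, int to_len)
    {
            int nbits = __builtin_popcount(cbm) * to_len / from_len;

            if (nbits < 1)
                    nbits = 1;      /* a CBM needs at least one bit set */

            return (1u << nbits) - 1;
    }

    /* e.g. 0xff (8 of 16 ways) on machine A becomes 0x3ff
     * (10 of 20 ways) on machine B. */
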
Please reply to the userspace proposal; the problem is very explicit
there.
> ex: find the cache size from /proc/cpuinfo - say 20MB.
> max bitmask - 0xfffff (20 bits).
>
> This means the round-off (chunk) size supported is 1MB, so when you
> specify a mask, say 0x3 (2MB), the rounding is already taken care of.
> The same applies to percentages - the masks automatically round off the percentage.
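
Spelled out as code, that arithmetic is (the sizes are your example's,
the helper is hypothetical):

    /* Round a size in bytes up to a contiguous CBM, as the bitmask
     * interface does implicitly.  20MB cache / 20-bit mask ==> 1MB
     * chunks. */
    static unsigned int bytes_to_cbm(unsigned long bytes,
                                     unsigned long cache_size, int cbm_len)
    {
            unsigned long chunk = cache_size / cbm_len;     /* 1MB here */
            int nbits = (bytes + chunk - 1) / chunk;        /* round up */

            if (nbits < 1)
                    nbits = 1;
            return (1u << nbits) - 1;       /* e.g. 2MB -> 0x3 */
    }
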
>
> Please note that this is quite different from the way we can
> allocate memory in bytes, and needs to be treated differently given
> that the hardware provides the interface in a particular way.
>
> 4. Letting the kernel automatically extend the bitmap may affect a
> lot of other things
Let's talk about them. What other things?
> and will need a lot of heuristics - note that we
> have overlapping masks.
I proposed a way to avoid heuristics by exposing whether the cgroup is
"expandable" or not, and asked for your input.
We really do not want to waste cache if we can avoid it.
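
To make "expandable" concrete: the expansion step (Transition 2 in my
proposal quoted below) could be as simple as ORing each unused part
into the default closid - a sketch with made-up structures, where
expandable cgroups opt in and exclusive ones are skipped:

    /* effectiveclos[clos][part] and expandedclos[clos][part] are the
     * 0/1 tables from the proposal below; DEFAULT_CLOS is 3 in the
     * example. */
    for (part = 0; part < nr_parts; part++) {
            int used = 0;

            for (clos = 0; clos < nr_clos; clos++)
                    used |= effectiveclos[clos][part];

            /* part used by nobody: give it to the default closid */
            expandedclos[DEFAULT_CLOS][part] =
                    effectiveclos[DEFAULT_CLOS][part] | !used;
    }

With the example tables this turns closid 3's row from 0 1 1 0 into
1 1 1 0, matching the expandedclos table below.
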
> This interface lets the super-user control
> the cache allocation, and it would be very confusing for the user
> if he has allocated a cache mask and the kernel suddenly changes it
> from under him.
Agree.
>
> Thanks,
> Vikas
>
>
> On Fri, 31 Jul 2015, Marcelo Tosatti wrote:
>
> >On Thu, Jul 30, 2015 at 04:03:07PM -0700, Vikas Shivappa wrote:
> >>
> >>
> >>On Thu, 30 Jul 2015, Marcelo Tosatti wrote:
> >>
> >>>On Thu, Jul 30, 2015 at 10:47:23AM -0700, Vikas Shivappa wrote:
> >>>>
> >>>>
> >>>>Marcelo,
> >>>>
> >>>>
> >>>>On Wed, 29 Jul 2015, Marcelo Tosatti wrote:
> >>>>>
> >>>>>How about this:
> >>>>>
> >>>>>desiredclos (closid p1 p2 p3 p4)
> >>>>> 1 1 0 0 0
> >>>>> 2 0 0 0 1
> >>>>> 3 0 1 1 0
> >>>>
> >>>>#1 Currently in the rdt cgroup, the root cgroup always has all the
> >>>>bits set and can't be changed (because the cgroup hierarchy would by
> >>>>default make this have all bits set, as all the children need to have
> >>>>a subset of the root's bitmask). So if the user creates a cgroup and
> >>>>does not put any task in it, the tasks in the root cgroup could still
> >>>>be using that part of the cache. That's the reason I say we can't have
> >>>>really 'exclusive' masks.
> >>>>
> >>>>Or in other words - there is always a desired clos (0) which has all
> >>>>parts set and acts like a default pool.
> >>>>
> >>>>Also, the parts can overlap. Please apply this to all the comments
> >>>>below, as it changes the way they work.
> >>>
> >>>
> >>>>
> >>>>>
> >>>>>p means part.
> >>>>
> >>>>I am assuming p = (a contiguous cache capacity bit mask)
> >>>>
> >>>>>closid 1 is an exclusive cgroup.
> >>>>>closid 2 is a "cache hog" class.
> >>>>>closid 3 is "default closid".
> >>>>>
> >>>>>Desiredclos is what the user has specified.
> >>>>>
> >>>>>Transition 1: desiredclos --> effectiveclos
> >>>>>Clean all bits of unused closids
> >>>>>(that must be updated whenever a
> >>>>>closid1 cgroup goes from empty->nonempty
> >>>>>and vice-versa).
> >>>>>
> >>>>>effectiveclos (closid p1 p2 p3 p4)
> >>>>> 1 0 0 0 0
> >>>>> 2 0 0 0 1
> >>>>> 3 0 1 1 0
> >>>>
> >>>>>
> >>>>>Transition 2: effectiveclos --> expandedclos
> >>>>>expandedclos (closid p1 p2 p3 p4)
> >>>>> 1 0 0 0 0
> >>>>> 2 0 0 0 1
> >>>>> 3 1 1 1 0
> >>>>>Then you have different inplacecos for each
> >>>>>CPU (see pseudo-code below):
> >>>>>
> >>>>>On the following events.
> >>>>>
> >>>>>- task migration to new pCPU:
> >>>>>- task creation:
> >>>>>
> >>>>> id = smp_processor_id();
> >>>>> for (part = desiredclos.p1; ...; part++)
> >>>>>	/* if my cosid's bit is set and every other
> >>>>>	   cosid's bit is clear for this part,
> >>>>>	   synchronize desiredclos --> inplacecos */
> >>>>> if (part[mycosid] == 1 &&
> >>>>> part[any_othercosid] == 0)
> >>>>> wrmsr(part, desiredclos);
> >>>>>
> >>>>
> >>>>Currently the root cgroup would have all the bits set, which will act
> >>>>like a default cgroup where all the otherwise unused parts (assuming
> >>>>they are a set of contiguous cache capacity bits) will be used.
> >>>
> >>>Right, but we don't want to place tasks in there in case one cgroup
> >>>wants exclusive cache access.
> >>>
> >>>So whenever you want an exclusive cgroup you'd do:
> >>>
> >>>create cgroup-exclusive; reserve desired part of the cache
> >>>for it.
> >>>create cgroup-default; reserve all cache minus that of cgroup-exclusive
> >>>for it.
> >>>
> >>>place tasks that belong to cgroup-exclusive into it.
> >>>place all other tasks (including init) into cgroup-default.
> >>>
> >>>Is that right?
> >>
> >>Yes you could do that.
> >>
> >>You can create cgroups with masks which are exclusive in today's
> >>implementation; it's just that you could also create more cgroups
> >>that overlap the masks again. In other words, we don't have an
> >>exclusive flag for the cgroup mask.
> >>Is it a common use case in the server environment that you need to
> >>prevent other cgroups from using a certain mask? (Since the root
> >>user should control these allocations... he should know?)
> >
> >Yes, there are two known use-cases that have this characteristic:
> >
> >1) High performance numeric application which has been optimized
> >to a certain fraction of the cache.
> >
> >2) Low latency application in multi-application OS.
> >
> >For both cases exclusive cache access is wanted.
> >
> >