Message-ID: <20150731183803.GA29321@amt.cnet>
Date: Fri, 31 Jul 2015 15:38:03 -0300
From: Marcelo Tosatti <mtosatti@...hat.com>
To: Vikas Shivappa <vikas.shivappa@...el.com>
Cc: "Auld, Will" <will.auld@...el.com>,
Vikas Shivappa <vikas.shivappa@...ux.intel.com>,
"linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>,
"x86@...nel.org" <x86@...nel.org>, "hpa@...or.com" <hpa@...or.com>,
"tglx@...utronix.de" <tglx@...utronix.de>,
"mingo@...nel.org" <mingo@...nel.org>,
"tj@...nel.org" <tj@...nel.org>,
"peterz@...radead.org" <peterz@...radead.org>,
"Fleming, Matt" <matt.fleming@...el.com>,
"Williamson, Glenn P" <glenn.p.williamson@...el.com>,
"Juvva, Kanaka D" <kanaka.d.juvva@...el.com>
Subject: Re: [summary] Re: [PATCH 3/9] x86/intel_rdt: Cache Allocation
documentation and cgroup usage guide
On Fri, Jul 31, 2015 at 09:41:58AM -0700, Vikas Shivappa wrote:
>
> To summarize the ever-growing thread:
>
> 1. the rdt_cgroup can be used to configure exclusive cache bitmaps
> for the child nodes, which covers the scenarios Marcelo
> mentions.
>
> simple examples which were mentioned:
> max bitmask length: 16, hence full mask is 0xffff
> groupx_realtime - 0xff
> group2_systemtraffic - 0xf: put a lot of tasks from the root node
> here, or whichever tasks are offending and thrashing.
> groupy_<mytraffic> - 0x0f
>
> Now groupx has its own area of cache that can be used by the
> realtime/(specific scenario) apps. Similarly, configure any groupy.
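
Since the interface allows masks to overlap (your point 4 below),
keeping groupx truly exclusive is up to whoever writes the masks. A
minimal sketch of that check, with hypothetical names - not part of
the cgroup interface:

    /* Does a proposed CBM overlap any existing group's CBM?
     * Purely illustrative. */
    static int cbm_is_exclusive(unsigned int new_cbm,
                                const unsigned int *cbms, int n)
    {
            int i;

            for (i = 0; i < n; i++)
                    if (new_cbm & cbms[i])
                            return 0;       /* overlaps group i */
            return 1;
    }
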
>
> 2. Can the masks let you specify which cache ways the cache is
> allocated in? - No, this is implementation specific as mentioned
> in the SDM. So when we configure a mask, you really don't know which
> ways or which exact lines are used on which SKUs. We have not seen
> a use case that needs apps to allocate cache in
> specific areas, and the h/w does not support this either.
Ok, can you comment on whether the proposed userspace interface
addresses all your use cases?
> 3. Letting the user specify size in bytes instead of a bitmap: we
> have already gone through this discussion in older versions. The
> user can simply check the total cache size and work out what size a
> given map corresponds to. I don't see a special need for an
> interface that takes the size in bytes and then rounds off - the
> user could instead apply the round-off values beforehand, or in other
> words that happens automatically when he specifies the bitmask.
When you move from processor A with CBM bitmask format X to processor B
with CBM bitmask format Y, and X and Y differ, you have to adjust
the mask manually.
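
Something like this is what I mean by manual adjustment - a sketch
only, assuming the mask is contiguous from bit 0 and that the fraction
of cache covered is what should carry over (the helper name is made
up):

    /* Scale a CBM from a from_len-bit format to a to_len-bit format,
     * preserving the fraction of cache it covers.  Hypothetical
     * user-space helper (assumes to_len < 32), not part of the
     * proposed interface. */
    static unsigned int cbm_convert(unsigned int cbm, int from_len, int to_len)
    {
            int nbits = __builtin_popcount(cbm) * to_len / from_len;

            if (nbits < 1)
                    nbits = 1;      /* a CBM needs at least one bit set */

            return (1u << nbits) - 1;
    }

    /* e.g. 0xff (8 of 16 ways) on machine A becomes 0x3ff
     * (10 of 20 ways) on machine B. */
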
Please reply to the userspace proposal; the problem is very explicit
there.
> ex: find the cache size from /proc/cpuinfo - say 20MB.
> max bitmask - 0xfffff (20 bits).
>
> This means the round-off (chunk) size supported is 1MB, so when you
> specify a mask, say 0x3 (2MB), the rounding is already taken care of.
> The same applies to percentages - the masks automatically round off the percentage.
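
Spelled out as code, that arithmetic is (the sizes are your example's,
the helper is hypothetical):

    /* Round a size in bytes up to a contiguous CBM, as the bitmask
     * interface does implicitly.  20MB cache / 20-bit mask ==> 1MB
     * chunks. */
    static unsigned int bytes_to_cbm(unsigned long bytes,
                                     unsigned long cache_size, int cbm_len)
    {
            unsigned long chunk = cache_size / cbm_len;     /* 1MB here */
            int nbits = (bytes + chunk - 1) / chunk;        /* round up */

            if (nbits < 1)
                    nbits = 1;
            return (1u << nbits) - 1;       /* e.g. 2MB -> 0x3 */
    }
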
>
> Please note that this is quite different from the way we can
> allocate memory in bytes, and needs to be treated differently given
> that the hardware provides the interface in a particular way.
>
> 4. Letting the kernel automatically extend the bitmap may affect a
> lot of other things
Let's talk about them. What other things?
> and will need a lot of heuristics - note that we
> have overlapping masks.
I proposed a way to avoid heuristics by exposing whether the cgroup is
"expandable" or not, and asked for your input.
We really do not want to waste cache if we can avoid it.
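
To make "expandable" concrete: the expansion step (Transition 2 in my
proposal quoted below) could be as simple as ORing each unused part
into the default closid - a sketch with made-up structures, where
expandable cgroups opt in and exclusive ones are skipped:

    /* effectiveclos[clos][part] and expandedclos[clos][part] are the
     * 0/1 tables from the proposal below; DEFAULT_CLOS is 3 in the
     * example. */
    for (part = 0; part < nr_parts; part++) {
            int used = 0;

            for (clos = 0; clos < nr_clos; clos++)
                    used |= effectiveclos[clos][part];

            /* part used by nobody: give it to the default closid */
            expandedclos[DEFAULT_CLOS][part] =
                    effectiveclos[DEFAULT_CLOS][part] | !used;
    }

With the example tables this turns closid 3's row from 0 1 1 0 into
1 1 1 0, matching the expandedclos table below.
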
> This interface lets the super-user control
> the cache allocation, and it would be very confusing for the user
> if he has allocated a cache mask and the kernel suddenly changes it
> from under him.
Agree.
>
> Thanks,
> Vikas
>
>
> On Fri, 31 Jul 2015, Marcelo Tosatti wrote:
>
> >On Thu, Jul 30, 2015 at 04:03:07PM -0700, Vikas Shivappa wrote:
> >>
> >>
> >>On Thu, 30 Jul 2015, Marcelo Tosatti wrote:
> >>
> >>>On Thu, Jul 30, 2015 at 10:47:23AM -0700, Vikas Shivappa wrote:
> >>>>
> >>>>
> >>>>Marcelo,
> >>>>
> >>>>
> >>>>On Wed, 29 Jul 2015, Marcelo Tosatti wrote:
> >>>>>
> >>>>>How about this:
> >>>>>
> >>>>>desiredclos (closid p1 p2 p3 p4)
> >>>>> 1 1 0 0 0
> >>>>> 2 0 0 0 1
> >>>>> 3 0 1 1 0
> >>>>
> >>>>#1 Currently in the rdt cgroup, the root cgroup always has all the
> >>>>bits set and can't be changed (because the cgroup hierarchy would by
> >>>>default make this have all bits set, as all the children need to have
> >>>>a subset of the root's bitmask). So if the user creates a cgroup and
> >>>>does not put any task in it, the tasks in the root cgroup could still
> >>>>be using that part of the cache. That's the reason I say we can't have
> >>>>really 'exclusive' masks.
> >>>>
> >>>>Or in other words - there is always a desired clos (0) which has all
> >>>>parts set and acts like a default pool.
> >>>>
> >>>>Also, the parts can overlap. Please apply this to all the comments
> >>>>below, as it changes the way they work.
> >>>
> >>>
> >>>>
> >>>>>
> >>>>>p means part.
> >>>>
> >>>>I am assuming p = (a contiguous cache capacity bit mask)
> >>>>
> >>>>>closid 1 is an exclusive cgroup.
> >>>>>closid 2 is a "cache hog" class.
> >>>>>closid 3 is "default closid".
> >>>>>
> >>>>>Desiredclos is what the user has specified.
> >>>>>
> >>>>>Transition 1: desiredclos --> effectiveclos
> >>>>>Clean all bits of unused closids
> >>>>>(that must be updated whenever a
> >>>>>closid1 cgroup goes from empty->nonempty
> >>>>>and vice-versa).
> >>>>>
> >>>>>effectiveclos (closid p1 p2 p3 p4)
> >>>>> 1 0 0 0 0
> >>>>> 2 0 0 0 1
> >>>>> 3 0 1 1 0
> >>>>
> >>>>>
> >>>>>Transition 2: effectiveclos --> expandedclos
> >>>>>expandedclos (closid p1 p2 p3 p4)
> >>>>> 1 0 0 0 0
> >>>>> 2 0 0 0 1
> >>>>> 3 1 1 1 0
> >>>>>Then you have different inplacecos for each
> >>>>>CPU (see pseudo-code below):
> >>>>>
> >>>>>On the following events.
> >>>>>
> >>>>>- task migration to new pCPU:
> >>>>>- task creation:
> >>>>>
> >>>>> id = smp_processor_id();
> >>>>> for (part = desiredclos.p1; ...; part++)
> >>>>>	/* if my cosid's bit is set and every other
> >>>>>	   cosid's bit is clear for this part,
> >>>>>	   synchronize desiredclos --> inplacecos */
> >>>>> if (part[mycosid] == 1 &&
> >>>>> part[any_othercosid] == 0)
> >>>>> wrmsr(part, desiredclos);
> >>>>>
> >>>>
> >>>>Currently the root cgroup would have all the bits set, which will act
> >>>>like a default cgroup where all the otherwise unused parts (assuming
> >>>>they are a set of contiguous cache capacity bits) will be used.
> >>>
> >>>Right, but we don't want to place tasks in there in case one cgroup
> >>>wants exclusive cache access.
> >>>
> >>>So whenever you want an exclusive cgroup you'd do:
> >>>
> >>>create cgroup-exclusive; reserve desired part of the cache
> >>>for it.
> >>>create cgroup-default; reserve all cache minus that of cgroup-exclusive
> >>>for it.
> >>>
> >>>place tasks that belong to cgroup-exclusive into it.
> >>>place all other tasks (including init) into cgroup-default.
> >>>
> >>>Is that right?
> >>
> >>Yes you could do that.
> >>
> >>You can create cgroups with masks which are exclusive in today's
> >>implementation; it's just that you could also create more cgroups
> >>that overlap the masks again. In other words, we don't have an
> >>exclusive flag for the cgroup mask.
> >>Is it a common use case in the server environment that you need to
> >>prevent other cgroups from using a certain mask? (Since the root
> >>user should control these allocations... he should know?)
> >
> >Yes, there are two known use-cases that have this characteristic:
> >
> >1) High performance numeric application which has been optimized
> >to a certain fraction of the cache.
> >
> >2) Low latency application in multi-application OS.
> >
> >For both cases exclusive cache access is wanted.
> >
> >