Message-ID: <alpine.DEB.2.10.1508231153310.13335@vshiva-Udesk>
Date: Sun, 23 Aug 2015 11:56:22 -0700 (PDT)
From: Vikas Shivappa <vikas.shivappa@...el.com>
To: Vikas Shivappa <vikas.shivappa@...ux.intel.com>
cc: vikas.shivappa@...el.com, x86@...nel.org,
linux-kernel@...r.kernel.org, hpa@...or.com, tglx@...utronix.de,
mingo@...nel.org, peterz@...radead.org,
Matt Fleming <matt.fleming@...el.com>,
"Auld, Will" <will.auld@...el.com>,
"Williamson, Glenn P" <glenn.p.williamson@...el.com>,
"Juvva, Kanaka D" <kanaka.d.juvva@...el.com>, tony.luck@...el.com,
Marcelo Tosatti <mtosatti@...hat.com>
Subject: Re: [DESIGN] x86/intel_rdt: Intel Cache Allocation interface
proposal
+Tony and acknowledging him..
On Sun, 23 Aug 2015, Vikas Shivappa wrote:
> This document proposes an alternative interface for Intel cache
> allocation, as opposed to the cgroup interface in the current patchset:
> http://marc.info/?l=linux-kernel&m=143889814520578
>
> More information about cache allocation can be found in the Intel SDM,
> June 2015, volume 3, section 17.16.
>
> Design overview:
> ---------------
>
> The OS maintains a mapping between each task_struct and the class of
> service it belongs to. This is done by adding a new field 'closid' to
> task_struct. Each closid is mapped to a unique capacity bit mask (cbm)
> which indicates the cache capacity associated with the closid.
>
> During scheduling, the kernel writes this closid into IA32_PQR_ASSOC to
> tell the hardware which class of service (CLOS) the task belongs to.
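
A minimal user-space sketch of the scheduling step described above, for
illustration only. The names sketch_task, sketch_pqr_assoc and
sketch_sched_in are hypothetical, the MSR is simulated with a plain
variable, and the sketch assumes the CLOS lives in the upper 32 bits of
the association MSR (IA32_PQR_ASSOC in the SDM) with the lower bits left
at zero. Skipping the write when the value is unchanged is my assumption,
not something the proposal specifies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the 'closid' field added to task_struct. */
struct sketch_task {
	int pid;
	uint32_t closid;
};

/* Simulated per-CPU association MSR; a kernel would use wrmsr instead. */
static uint64_t sketch_pqr_assoc;

/*
 * Called when 'next' is scheduled in. Returns the MSR value that is in
 * effect afterwards. Assumes the CLOS occupies bits 63:32.
 */
uint64_t sketch_sched_in(const struct sketch_task *next)
{
	assert(next != NULL);

	uint64_t val = (uint64_t)next->closid << 32;

	/* Only touch the (simulated) MSR when the class actually changes. */
	if (val != sketch_pqr_assoc)
		sketch_pqr_assoc = val;

	return sketch_pqr_assoc;
}
```

Two tasks sharing a closid would then switch back and forth without
changing the register contents.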
>
> It makes the following changes to the current patch series:
> - Add a kernel mode API to control cache allocations from within the OS.
> - Do not use cgroup; instead expose the controls through sysfs in the
>   /sys/kernel directory for the administrator to configure the cache
>   allocations.
> - Optionally, add a control that lets a process change its own cache
>   allocation within the limits defined by the administrator.
>
> The targeted use cases are mainly server clusters, cloud and container
> based services, and HPC workloads. Users of cloud or containers get a
> VM/container to run their workloads, so it is most appropriate to set
> up static cache allocations for units like VMs/containers.
> For containers, many container based products like
> Rancher/stackengine etc. are docker based and allocate/manage
> resources through a centralized orchestration/deployment tool.
> Container usage is growing quickly given the ease of deploying and
> scaling new containers. These cache alloc interfaces try to build a
> framework that cloud and container based use cases can easily adapt to.
>
> Apps are restricted in how far they can control their own cache
> allocation, because cache is orders of magnitude scarcer than other
> resources like memory, and it would quickly run out if apps naturally
> tried to claim more of it to increase their own performance.
>
> kernel mode API:
> ---------------
>
> enum cache_resource {
> 	l3_shared,
> };
>
> struct cache_alloc_config {
> 	u32 max_cbm;
> 	u32 max_closid;
> 	unsigned long cache_size;
> 	int cdp_mode;
> };
>
> struct clos_cbm_table {
> 	unsigned long l3_cbm;
> 	unsigned int clos_refcnt;
> };
>
> void cache_alloc_get_info(enum cache_resource cr,
> 			  struct cache_alloc_config *config);
>
> This returns the cache allocation configuration information along with
> the cache size. Additional capabilities can be reported as well, for
> example the current mode: code data prioritization (supporting both
> icache and dcache) or legacy cache alloc.
>
> int cache_alloc_set_cdpmode(bool setcdp);
>
> By default cdp (code data prioritization, which supports allocating
> code and data separately instead of a common cache allocation) is not
> enabled and can be set/reset with this API. Enabling cdp resets all
> the capacity bit masks and halves the number of CLOSids.
> With cdp enabled the cbm can be extended to represent data and code
> capacity masks (by having two u32).
>
> void cache_alloc_get_cbm_table(struct clos_cbm_table *cctable,
> 			       int size);
>
> Returns the mapping of the current closids to the capacity bit masks.
>
> u32 cache_alloc_set_cbm( u32 preferred_closid, u32 cbm);
>
> This reconfigures the capacity bitmask (cbm) for a preferred closid. If
> the cbm is already present in the table, the closid that already holds
> it is returned instead. That way each unique cbm has exactly one closid.
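
The lookup-before-set behaviour described above could look roughly like
the following user-space sketch. SKETCH_MAX_CLOSID, sketch_cctable and
sketch_cache_alloc_set_cbm are hypothetical names, and treating an entry
as "in use" when its refcount is non-zero is my assumption; the proposal
does not spell out how clos_refcnt is maintained.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_MAX_CLOSID 4	/* hypothetical; hardware reports the real limit */

/* Mirrors struct clos_cbm_table, using fixed-width user-space types. */
struct sketch_clos_cbm {
	uint32_t l3_cbm;
	unsigned int clos_refcnt;
};

static struct sketch_clos_cbm sketch_cctable[SKETCH_MAX_CLOSID];

/*
 * Set 'cbm' for 'preferred_closid', unless an in-use closid already
 * carries the same cbm, in which case that closid is returned.
 */
uint32_t sketch_cache_alloc_set_cbm(uint32_t preferred_closid, uint32_t cbm)
{
	assert(preferred_closid < SKETCH_MAX_CLOSID);

	for (uint32_t i = 0; i < SKETCH_MAX_CLOSID; i++) {
		if (sketch_cctable[i].clos_refcnt > 0 &&
		    sketch_cctable[i].l3_cbm == cbm)
			return i;	/* reuse the existing mapping */
	}

	sketch_cctable[preferred_closid].l3_cbm = cbm;
	if (sketch_cctable[preferred_closid].clos_refcnt == 0)
		sketch_cctable[preferred_closid].clos_refcnt = 1;

	return preferred_closid;
}
```

A caller has to use the returned closid rather than the one it asked for,
since the two can differ.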
>
> sysfs interface
> ---------------
>
> This exposes files changeable by root in the /sys/kernel/cache_alloc
> directory.
>
> clos_cbm_table :
> Reading - shows the max_cbm and the current snapshot of the
> clos_cbm table.
> Writing - the user can write 'preferred closid' 'cbm' to change an
> existing entry in the set of CLOS configs.
> If the user writes a bitmask that already exists, the output
> indicates which closid has that cbm.
> $ echo <closid> <cbm> > /sys/kernel/cache_alloc/clos_cbm_table
>
> Alternatively, instead of clos_cbm_table, a directory would be created
> for each clos, with a file 'cbm' in each directory.
>
> add_task :
> Write only: changes the closid of any task by writing 'pid'
> 'closid', e.g.:
> $ echo <pid> <closid> > /sys/kernel/cache_alloc/add_task
>
> threshold_clos : Can have two values, 'lowest' and 'all'. Defaults to
> 'lowest'. When it is 'lowest', a process can change its own closid to a
> different closid, but the new closid has to have the lowest capacity
> bitmask among all the bitmasks. When it is 'all', the process can change
> to any closid. The interface for this is described in the prctl section
> below.
>
> cdp_enable : takes 1/0 and by default is 0. Used to set cdp mode.
>
> $ ls /sys/kernel/cache_alloc
> add_task
> threshold_clos
> cdp_enable
> clos0/
> clos1/
> ...
> closn/
>
>
> The closid of a task can be viewed in its /proc/<tid>/ stats.
> Tasks have closid 0 by default and inherit the parent's closid upon
> fork.
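
One possible reading of the threshold_clos rule, sketched in user-space C.
The proposal does not define what "lowest capacity bitmask" means; this
sketch assumes it means the fewest set bits, and allows ties. The names
sketch_threshold, sketch_cbm_weight and sketch_may_switch_closid are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum sketch_threshold { SKETCH_LOWEST, SKETCH_ALL };

/* Number of set bits in a cbm (portable population count). */
unsigned int sketch_cbm_weight(uint32_t cbm)
{
	unsigned int n = 0;

	while (cbm) {
		cbm &= cbm - 1;	/* clear the lowest set bit */
		n++;
	}
	return n;
}

/*
 * Returns 1 if a process may move itself to 'new_closid', 0 otherwise.
 * Assumption: 'lowest' means no other closid has fewer bits set.
 */
int sketch_may_switch_closid(const uint32_t *cbms, size_t nr_clos,
			     size_t new_closid, enum sketch_threshold mode)
{
	assert(cbms != NULL);

	if (new_closid >= nr_clos)
		return 0;
	if (mode == SKETCH_ALL)
		return 1;

	for (size_t i = 0; i < nr_clos; i++) {
		if (sketch_cbm_weight(cbms[i]) <
		    sketch_cbm_weight(cbms[new_closid]))
			return 0;
	}
	return 1;
}
```

If "lowest" is meant differently (for example the numerically smallest
mask), only the comparison inside the loop would change.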
>
> prctl/ syscall interface for process to change cache alloc
> ----------------------------------------------------------
>
> This lets a process change its own cache allocation. However, the amount
> of change that can be made is limited. This is because L3 cache is a
> very limited/scarce resource and can easily be exhausted by the first
> few processes requesting more cache. It also lets one centralized
> entity or system-controlled mechanism, usable only by the
> administrator, keep tighter control over cache allocation, which is
> more useful in the scenarios described above.
>
> struct cat_config {
> 	u32 max_cbm;
> 	u32 max_clos;
> 	unsigned long chunk_size;
> 	int any_clos_allowed;
> };
>
> void cat_get_current_config(struct cat_config *config,
> 			    struct clos_cbm_table *cctable);
>
> This returns the max clos and cbm length and the current mappings of
> closids to capacity masks. It also returns the chunk_size, which
> specifies the amount of cache capacity that corresponds to one bit of
> the cbm. any_clos_allowed will be true if threshold_clos is set
> to 'all'.
>
> prctl(PR_SET_CLOSID, <new_closid>, ... );
>
> Cache can be allocated in terms of bytes or percentages using this
> interface. One can obtain the chunk size from the APIs and then
> convert the required size to a mask using bitmask length = (size
> required / chunk size). The bitmask also gives the flexibility to
> have exclusive, completely overlapping, or partially overlapping cache
> areas, which can be adjusted based on the requirements of the workloads.
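
The size-to-mask conversion above can be sketched as follows. The helper
names sketch_cbm_bits_for_size and sketch_cbm_make_mask are hypothetical.
Rounding the bit count up is my assumption, since the formula in the text
does not say how to handle a remainder. The mask is built as one
contiguous run of bits, which as far as I understand the SDM is what the
hardware requires.

```c
#include <assert.h>
#include <stdint.h>

/* Number of cbm bits needed to cover size_bytes, rounded up. */
unsigned int sketch_cbm_bits_for_size(unsigned long size_bytes,
				      unsigned long chunk_size)
{
	assert(chunk_size > 0);

	return (unsigned int)(size_bytes / chunk_size +
			      (size_bytes % chunk_size != 0));
}

/*
 * Contiguous mask of 'nbits' bits starting at bit 'shift'.
 * Returns 0 if the request does not fit into 'cbm_len' bits.
 */
uint32_t sketch_cbm_make_mask(unsigned int nbits, unsigned int shift,
			      unsigned int cbm_len)
{
	if (nbits == 0 || cbm_len > 32 || nbits > cbm_len ||
	    shift > cbm_len - nbits)
		return 0;

	/* Avoid shifting a 32-bit value by 32, which is undefined. */
	uint32_t ones = (nbits == 32) ? UINT32_MAX
				      : ((UINT32_C(1) << nbits) - 1);

	return ones << shift;
}
```

With a 1 MB chunk size, a 2.5 MB request needs 3 bits; placing them at
shift 0 and shift 4 gives two exclusive masks, while shifts 0 and 1 would
give partially overlapping ones.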
>