Date:   Mon, 3 Apr 2017 16:52:24 -0700 (PDT)
From:   Shivappa Vikas <vikas.shivappa@...el.com>
To:     Vikas Shivappa <vikas.shivappa@...ux.intel.com>
cc:     vikas.shivappa@...el.com, x86@...nel.org,
        linux-kernel@...r.kernel.org, hpa@...or.com, tglx@...utronix.de,
        mingo@...nel.org, peterz@...radead.org,
        "Shankar, Ravi V" <ravi.v.shankar@...el.com>, tony.luck@...el.com,
        "Yu, Fenghua" <fenghua.yu@...el.com>, h.peter.anvin@...el.com
Subject: Re: [PATCH] x86/cqm: Cqm3 Documentation


On Mon, 3 Apr 2017, Vikas Shivappa wrote:

> Explains the design for the interface

Explains the design for the new resctrl based cqm interface. This is a
follow-up with the design documentation, sent after the requirements for
the new cqm were reviewed:
https://marc.info/?l=linux-kernel&m=148891934720489

>
> Signed-off-by: Vikas Shivappa <vikas.shivappa@...ux.intel.com>
> ---
> Documentation/x86/intel_rdt_ui.txt | 210 ++++++++++++++++++++++++++++++++++---
> 1 file changed, 197 insertions(+), 13 deletions(-)
>
> diff --git a/Documentation/x86/intel_rdt_ui.txt b/Documentation/x86/intel_rdt_ui.txt
> index d918d26..46a2efd 100644
> --- a/Documentation/x86/intel_rdt_ui.txt
> +++ b/Documentation/x86/intel_rdt_ui.txt
> @@ -1,12 +1,13 @@
> -User Interface for Resource Allocation in Intel Resource Director Technology
> +User Interface for Resource Allocation and Monitoring in Intel Resource
> +Director Technology
>
> Copyright (C) 2016 Intel Corporation
>
> Fenghua Yu <fenghua.yu@...el.com>
> Tony Luck <tony.luck@...el.com>
>
> -This feature is enabled by the CONFIG_INTEL_RDT_A Kconfig and the
> -X86 /proc/cpuinfo flag bits "rdt", "cat_l3" and "cdp_l3".
> +This feature is enabled by the CONFIG_INTEL_RDT Kconfig and the
> +X86 /proc/cpuinfo flag bits "rdt", "cqm", "cat_l3" and "cdp_l3".
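
A quick way to sanity check the flag bits is just grepping cpuinfo,
e.g.:

# grep -E 'rdt|cqm|cat_l3|cdp_l3' /proc/cpuinfo
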
>
> To use the feature mount the file system:
>
> @@ -16,14 +17,20 @@ mount options are:
>
> "cdp": Enable code/data prioritization in L3 cache allocations.
>
> +The mount succeeds if either allocation or monitoring support is
> +present. Monitoring is enabled for each resource which has support in
> +the hardware. For more details on the behaviour of the interface during
> +monitoring and allocation, see the resctrl group sections below.
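
It might be worth showing the mount commands here as well, something
like:

# mount -t resctrl resctrl /sys/fs/resctrl
# mount -t resctrl -o cdp resctrl /sys/fs/resctrl

(the second form only when CDP is supported.)
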
>
> Info directory
> --------------
>
> The 'info' directory contains information about the enabled
> resources. Each resource has its own subdirectory. The subdirectory
> -names reflect the resource names. Each subdirectory contains the
> -following files:
> +names reflect the resource names.
> +
> +Each subdirectory contains the following files with respect to
> +allocation:
>
> "num_closids":  The number of CLOSIDs which are valid for this
> 	        resource. The kernel uses the smallest number of
> @@ -35,15 +42,36 @@ following files:
> "min_cbm_bits": The minimum number of consecutive bits which must be
> 		set when writing a mask.
>
> +Each subdirectory contains the following files with respect to
> +monitoring:
> +
> +"num_rmids":			The number of RMIDs which are valid for
> +				this resource.
> +
> +"mon_enabled":			Indicates if monitoring is enabled for
> +				the resource.
>
> -Resource groups
> ----------------
> +"max_threshold_occupancy":	This is specific to LLC_occupancy
> +				monitoring. It provides an upper bound
> +				on the threshold and is measured in
> +				bytes because it is exposed to userland.
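
Per the info directory description these live under each resource's
subdirectory, so on a system where L3 has monitoring support reading
them would look like:

# cat /sys/fs/resctrl/info/L3/num_rmids
# cat /sys/fs/resctrl/info/L3/mon_enabled
# cat /sys/fs/resctrl/info/L3/max_threshold_occupancy
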
> +
> +Resource alloc and monitor groups (ALLOC_MON group)
> +---------------------------------------------------
> Resource groups are represented as directories in the resctrl file
> system. The default group is the root directory. Other groups may be
> created as desired by the system administrator using the "mkdir(1)"
> command, and removed using "rmdir(1)".
>
> -There are three files associated with each group:
> +Users can allocate and monitor resources via these resource groups
> +created in the root directory.
> +
> +Note that the creation of new ALLOC_MON groups is only allowed when RDT
> +allocation is supported. This means the user can still monitor the root
> +group when only RDT monitoring is supported.
> +
> +There are three files associated with each group with respect to
> +resource allocation:
>
> "tasks": A list of tasks that belongs to this group. Tasks can be
> 	added to a group by writing the task ID to the "tasks" file
> @@ -75,6 +103,56 @@ the CPU's group is used.
>
> 3) Otherwise the schemata for the default group is used.
>
> +The following entries are associated with each group with respect to
> +resource monitoring:
> +
> +"data": A list of all the monitored resource data available to this
> +	group. This includes the monitored data for all the tasks in the
> +	'tasks' file and the cpus in the 'cpus' file. Each resource has
> +	its own line and format - see the 'data' file description below
> +	for details. The monitored data for the ALLOC_MON group is the
> +	sum of all the data of its sub MON groups.
> +
> +"mon_tasks": A directory in which the user can create Resource monitor
> +	groups (MON groups). This lets the user create a group to
> +	monitor a subset of the tasks in the above 'tasks' file.
> +
> +Resource monitor groups (MON group)
> +-----------------------------------
> +
> +Resource monitor groups are directories inside the mon_tasks directory.
> +There is one mon_tasks directory inside every ALLOC_MON group including
> +the root group.
> +
> +MON groups help the user monitor a subset of tasks and cpus within
> +the parent ALLOC_MON group.
> +
> +Each MON group has 3 files:
> +
> +"tasks": This behaves exactly as the 'tasks' file above in the ALLOC_MON
> +	group with the added restriction that only a task present in the
> +	parent ALLOC_MON group can be added, and adding it automatically
> +	removes the task from the "tasks" file of any other MON group.
> +	When a task is removed from the parent ALLOC_MON group it is also
> +	removed from the "tasks" file of the child MON group.
> +
> +"cpus": This behaves exactly as the 'cpus' file above in the ALLOC_MON
> +	group with the added restriction that only a cpu present in the
> +	parent ALLOC_MON group can be added, and adding it automatically
> +	removes the cpu from the "cpus" file of any other MON group.
> +	When a cpu is removed from the parent ALLOC_MON group it is also
> +	removed from the "cpus" file of the child MON group.
> +
> +"data": A list of all the monitored resource data available to
> +	this group. Each resource has its own line and format - see
> +	the 'data' file description below for details.
> +
> +data files - general concepts
> +-----------------------------
> +Each line in the file describes one resource. The line starts with
> +the name of the resource, followed by monitoring data collected
> +in each of the instances/domains of that resource on the system.
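
Concretely, with L3 occupancy monitoring on a two socket system a read
of a group's 'data' file would look something like this (values in
bytes, same illustrative numbers as in the examples below):

# cat /sys/fs/resctrl/p0/data
L3:0=16234000;1=14789000
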
>
> Schemata files - general concepts
> ---------------------------------
> @@ -107,21 +185,26 @@ and 0xA are not.  On a system with a 20-bit mask each bit represents 5%
> of the capacity of the cache. You could partition the cache into four
> equal parts with masks: 0x1f, 0x3e0, 0x7c00, 0xf8000.
>
> -
> -L3 details (code and data prioritization disabled)
> ---------------------------------------------------
> +L3 'schemata' file format (code and data prioritization disabled)
> +----------------------------------------------------------------
> With CDP disabled the L3 schemata format is:
>
> 	L3:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
>
> -L3 details (CDP enabled via mount option to resctrl)
> -----------------------------------------------------
> +L3 'schemata' file format (CDP enabled via mount option to resctrl)
> +------------------------------------------------------------------
> When CDP is enabled L3 control is split into two separate resources
> so you can specify independent masks for code and data like this:
>
> 	L3data:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
> 	L3code:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
>
> +L3 'data' file format (data)
> +---------------------------
> +When monitoring is enabled for L3 occupancy the 'data' file format is:
> +
> +	L3:<cache_id0>=<llc_occupancy>;<cache_id1>=<llc_occupancy>;...
> +
> L2 details
> ----------
> L2 cache does not support code and data prioritization, so the
> @@ -129,6 +212,8 @@ schemata format is always:
>
> 	L2:<cache_id0>=<cbm>;<cache_id1>=<cbm>;...
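
By analogy with the L3 examples further down, an L2 schemata write would
look something like this (assuming just two L2 instances and a 4-bit CBM
for illustration):

# echo "L2:0=3;1=c" > /sys/fs/resctrl/p0/schemata
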
>
> +Examples for RDT allocation usage:
> +
> Example 1
> ---------
> On a two socket machine (one L3 cache per socket) with just four bits
> @@ -212,3 +297,102 @@ Finally we move core 4-7 over to the new group and make sure that the
> kernel and the tasks running there get 50% of the cache.
>
> # echo C0 > p0/cpus
> +
> +Examples for RDT Monitoring usage:
> +
> +Example 1 (Monitor ALLOC_MON group and subset of tasks in ALLOC_MON group)
> +---------
> +On a two socket machine (one L3 cache per socket) with just four bits
> +for cache bit masks
> +
> +# mount -t resctrl resctrl /sys/fs/resctrl
> +# cd /sys/fs/resctrl
> +# mkdir p0 p1
> +# echo "L3:0=3;1=c" > /sys/fs/resctrl/p0/schemata
> +# echo "L3:0=3;1=3" > /sys/fs/resctrl/p1/schemata
> +# echo 5678 > p1/tasks
> +# echo 5679 > p1/tasks
> +
> +The default resource group is unmodified, so we have access to all parts
> +of all caches (its schemata file reads "L3:0=f;1=f").
> +
> +Tasks that are under the control of group "p0" may only allocate from the
> +"lower" 50% on cache ID 0, and the "upper" 50% of cache ID 1.
> +Tasks in group "p1" use the "lower" 50% of cache on both sockets.
> +
> +Create monitor groups
> +
> +# cd /sys/fs/resctrl/p1/mon_tasks
> +# mkdir m11 m12
> +# echo 5678 > m11/tasks
> +# echo 5679 > m12/tasks
> +
> +Fetch data (shown in bytes)
> +
> +# cat m11/data
> +L3:0=16234000;1=14789000
> +# cat m12/data
> +L3:0=14234000;1=16789000
> +
> +The parent group shows the aggregated data.
> +
> +# cat /sys/fs/resctrl/p1/data
> +L3:0=31234000;1=31789000
> +
> +Example 2 (Monitor a task from its creation)
> +---------
> +On a two socket machine (one L3 cache per socket)
> +
> +# mount -t resctrl resctrl /sys/fs/resctrl
> +# cd /sys/fs/resctrl
> +# mkdir p0 p1
> +
> +An RMID is allocated to the group once it is created and hence the <cmd>
> +below is monitored from its creation.
> +
> +# echo $$ > /sys/fs/resctrl/p1/tasks
> +# <cmd>
> +
> +Fetch the data
> +
> +# cat /sys/fs/resctrl/p1/data
> +L3:0=31234000;1=31789000
> +
> +Example 3 (Monitor without CAT support or before creating CAT groups)
> +---------
> +
> +Assume a system like HSW that has only CQM and no CAT support. In this
> +case resctrl will still mount but new ALLOC_MON directories cannot be
> +created. The user can still create MON groups within the root group and
> +thereby monitor all tasks including kernel threads.
> +
> +This can also be used to profile a job's cache footprint before deciding
> +how to place it in an allocation group.
> +
> +# mount -t resctrl resctrl /sys/fs/resctrl
> +# cd /sys/fs/resctrl
> +
> +# mkdir mon_tasks/m01
> +# echo $$ > mon_tasks/m01/tasks
> +# <cmd>
> +
> +# cat mon_tasks/m01/data
> +L3:0=31234000;1=31789000
> +
> +Example 4 (Monitor real time tasks)
> +-----------------------------------
> +
> +A single socket system which has real time tasks running on cores 4-7
> +and non real time tasks on other cpus. We want to monitor the cache
> +occupancy of the real time threads on these cores.
> +
> +# mount -t resctrl resctrl /sys/fs/resctrl
> +# cd /sys/fs/resctrl
> +# mkdir p1
> +
> +Move the cpus 4-7 over to p1
> +# echo f0 > p1/cpus
> +
> +View the llc occupancy snapshot
> +
> +# cat /sys/fs/resctrl/p1/data
> +L3:0=11234000
> -- 
> 1.9.1
>
>
