Date:	Mon, 16 Nov 2015 23:01:43 -0200
From:	Marcelo Tosatti <mtosatti@...hat.com>
To:	Peter Zijlstra <peterz@...radead.org>
Cc:	Luiz Capitulino <lcapitulino@...hat.com>,
	Thomas Gleixner <tglx@...utronix.de>,
	Vikas Shivappa <vikas.shivappa@...el.com>,
	Tejun Heo <tj@...nel.org>, Yu Fenghua <fenghua.yu@...el.com>,
	linux-kernel@...r.kernel.org
Subject: Re: [PATCH RFC] ioctl based CAT interface

On Mon, Nov 16, 2015 at 02:39:03PM -0200, Marcelo Tosatti wrote:
> On Mon, Nov 16, 2015 at 10:07:56AM +0100, Peter Zijlstra wrote:
> > On Fri, Nov 13, 2015 at 03:33:04PM -0200, Marcelo Tosatti wrote:
> > > On Fri, Nov 13, 2015 at 05:51:00PM +0100, Peter Zijlstra wrote:
> > > > On Fri, Nov 13, 2015 at 02:39:33PM -0200, Marcelo Tosatti wrote:
> > > > > + * 	* one tcrid entry can be in different locations 
> > > > > + * 	  in different sockets.
> > > > 
> > > > NAK on that without cpuset integration.
> > > > 
> > > > I do not want freely migratable tasks having radically different
> > > > performance profiles depending on which CPU they land.
> > > 
> > > Ok, so, configuration:
> > > 
> > > 
> > > Socket-1				Socket-2
> > > 
> > > pinned thread-A with			100% L3 free
> > > 80% of L3 
> > > reserved
> > > 
> > > 
> > > So it is a problem if a thread running on socket-2 is scheduled to 
> > > socket-1 because performance is radically different, fine.
> > > 
> > > Then one way to avoid that is to not allow freely migratable tasks
> > > to move to Socket-1. Fine.
> > > 
> > > Then you want to use cpusets for that.
> > > 
> > > Can you fill in the blanks what is missing here?
> > 
> > I'm still not seeing what the problem with CAT-cgroup is.
> > 
> > /cgroups/cpuset/
> >   socket-1/cpus = $socket-1
> >            tasks = $thread-A
> > 
> >   socket-2/cpus = $socket-2
> >            tasks = $thread-B
> > 
> > /cgroups/cat/
> >   group-A/bitmap = 0x3F / 0xFF
> >   group-A/tasks = $thread-A
> > 
> >   group-B/bitmap = 0xFF / 0xFF
> >   group-B/tasks = $thread-B
> > 
> > 
> > That gets you thread-A on socket-1 with 6/8 of the L3 and thread-B on
> > socket-2 with 8/8 of the L3.
> 
> Going that route, might as well expose the region shared with HW
> to userspace and let userspace handle the problem of contiguous free regions,
> which means the cgroups bitmask maps one-to-one to HW bitmap.
> 
> All is necessary then is to modify the Intel patches to
> 
> 1) Support bitmaps per socket.

Consider the following scenario, one server with two sockets:

       socket-1                      socket-2
[ [***]             ]       [             [***] ]
   L3 cache bitmap             L3 cache bitmap

[***] marks the region shared with HW, as reported by CPUID (see the
Intel documentation).

socket-1.shared_region_with_hw = [bit 2, bit 5]
socket-2.shared_region_with_hw = [bit 16, bit 18]

Given that your application is critical, you do not want it to share any
reservation with HW. I was informed that there is no guarantee that these
regions end up in the same location on different sockets. Let's say you
need 15 bits of reservation, and the total is 20 bits. One possibility would be:

socket-1.reservation = [bit 5, bit 15]
socket-2.reservation = [bit 1, bit 15]
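
The search for a contiguous region that avoids the HW-shared bits can be
sketched in userspace C. This is only an illustration: cbm_range() and
cbm_find_free() are hypothetical helpers, not part of the Intel patchset,
and the sketch assumes the bit positions used above are 1-based and
inclusive, and that a CBM fits in 32 bits.

```c
#include <stdint.h>

/*
 * Contiguous mask covering 1-based bit positions first..last inclusive.
 * Assumes 1 <= first <= last <= 32. (Hypothetical helper.)
 */
uint32_t cbm_range(unsigned first, unsigned last)
{
	unsigned len = last - first + 1;
	uint32_t ones = (len >= 32) ? UINT32_MAX : ((UINT32_C(1) << len) - 1);

	return ones << (first - 1);
}

/*
 * Lowest-placed contiguous region of `need` bits, inside a CBM of
 * `total` bits, that does not overlap `shared`.
 * Returns 0 if no such region fits. (Hypothetical helper.)
 */
uint32_t cbm_find_free(unsigned total, uint32_t shared, unsigned need)
{
	unsigned first;

	if (need == 0 || need > total || total > 32)
		return 0;

	for (first = 1; first + need - 1 <= total; first++) {
		uint32_t candidate = cbm_range(first, first + need - 1);

		if (!(candidate & shared))
			return candidate;
	}
	return 0;
}
```

Running the search once per socket, with that socket's shared region, can
give a different placement on each socket, which is why a single mask per
group is not enough.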

The current Intel CAT patchset has this restriction:

static int cbm_validate_rdt_cgroup(struct intel_rdt *ir, unsigned long cbmvalue)
{
        struct cgroup_subsys_state *css;
        struct intel_rdt *par, *c;
        unsigned long cbm_tmp = 0;
        int err = 0;

        if (!cbm_validate(cbmvalue)) {
                err = -EINVAL;
                goto out_err;
        }

        par = parent_rdt(ir);
        clos_cbm_table_read(par->closid, &cbm_tmp);
        if (!bitmap_subset(&cbmvalue, &cbm_tmp, MAX_CBM_LENGTH)) {
                err = -EINVAL;
                goto out_err;
        }


Can you (or the author of the patch) explain why this
restriction is here?
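
For reference, the rule enforced by the bitmap_subset() call above is that
every bit set in the child's mask must also be set in the parent's. A
minimal userspace sketch of that rule follows, together with one way it
could be applied independently per socket. Both function names are
hypothetical, and the sketch assumes a mask fits in one unsigned long.

```c
#include <stdbool.h>
#include <stddef.h>

/* True when every bit set in `child` is also set in `parent`. */
bool cbm_is_subset(unsigned long child, unsigned long parent)
{
	return (child & ~parent) == 0;
}

/*
 * Apply the subset rule separately on each socket: child[i] and
 * parent[i] are the masks for socket i. (Hypothetical layout.)
 */
bool cbm_validate_per_socket(const unsigned long *child,
			     const unsigned long *parent,
			     size_t nr_sockets)
{
	size_t i;

	for (i = 0; i < nr_sockets; i++)
		if (!cbm_is_subset(child[i], parent[i]))
			return false;
	return true;
}
```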

If the restriction has to be maintained, then one
hierarchy per socket will be necessary to support different
bitmaps per socket.

If the restriction can be removed, then non-hierarchical support
could look like:

/cgroups/cat/group-A/tasks = $thread-A
/cgroups/cat/group-A/socket-1/bitmap = 0x3F / 0xFF
/cgroups/cat/group-A/socket-2/bitmap = 0x... / 0xFF

Or one l3_cbm file containing one mask per socket,
separated by commas, similar to 

/sys/devices/system/node/node0/cpumap
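
Parsing such a file could be sketched as follows. This is an illustration
only: l3_cbm_parse() is a hypothetical name, the exact file format is an
assumption (hexadecimal masks, one per socket, comma separated, with an
optional trailing newline), and the code is userspace C rather than
kernel code.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Parse "mask0,mask1,..." (hexadecimal, optional 0x prefix) into
 * masks[], one entry per socket.
 * Returns the number of masks parsed, or -1 on malformed input or
 * when the string holds more than max_sockets masks.
 */
int l3_cbm_parse(const char *s, unsigned long *masks, int max_sockets)
{
	int n = 0;

	while (n < max_sockets) {
		char *end;

		errno = 0;
		masks[n] = strtoul(s, &end, 16);
		if (end == s || errno == ERANGE)
			return -1;	/* no digits, or value out of range */
		n++;

		if (*end == '\0' || *end == '\n')
			return n;	/* end of string or end of line */
		if (*end != ',')
			return -1;	/* unexpected separator */
		s = end + 1;
	}
	return -1;			/* more masks than sockets */
}
```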

> 2) Remove hierarchical support.

There is nothing hierarchical in CAT, it's flat.

Each set of tasks is associated with a number of bits
in each socket's L3 CBM mask.

> 3) Lazy enforcement (which can be done later as an improvement).
> 