Message-ID: <20151119010535.GA30233@amt.cnet>
Date: Wed, 18 Nov 2015 23:05:36 -0200
From: Marcelo Tosatti <mtosatti@...hat.com>
To: Thomas Gleixner <tglx@...utronix.de>
Cc: LKML <linux-kernel@...r.kernel.org>,
Peter Zijlstra <peterz@...radead.org>, x86@...nel.org,
Luiz Capitulino <lcapitulino@...hat.com>,
Vikas Shivappa <vikas.shivappa@...el.com>,
Tejun Heo <tj@...nel.org>, Yu Fenghua <fenghua.yu@...el.com>
Subject: Re: [RFD] CAT user space interface revisited
On Wed, Nov 18, 2015 at 10:01:53PM -0200, Marcelo Tosatti wrote:
> On Wed, Nov 18, 2015 at 07:25:03PM +0100, Thomas Gleixner wrote:
> > Folks!
> >
> > After rereading the mail flood on CAT and staring into the SDM for a
> > while, I think we all should sit back and look at it from scratch
> > again w/o our preconceptions - I certainly had to put my own away.
> >
> > Let's look at the properties of CAT again:
> >
> > - It's a per socket facility
> >
> > - CAT slots can be associated to external hardware. This
> > association is per socket as well, so different sockets can have
> > different behaviour. I missed that detail when staring the first
> > time, thanks for the pointer!
> >
> > - The association itself is per CPU. The COS selection happens on a
> > CPU while the set of masks which are selected via COS are shared
> > by all CPUs on a socket.
> >
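(For reference, the per-CPU vs. per-socket split above maps onto the MSRs
roughly as below. This is only a sketch against the SDM register numbers;
the cat_* helpers are invented for illustration, not an existing kernel API.)

#include <linux/types.h>
#include <asm/msr.h>

/* SDM register numbers; local names to keep the sketch self-contained. */
#define PQR_ASSOC_MSR		0x0c8f	/* per logical CPU: selects the COS */
#define L3_CBM_BASE_MSR		0x0c90	/* per socket: one mask MSR per COS */

/* Select which COS a CPU uses. The COS id sits in bits 63:32 of
 * PQR_ASSOC; the low word (the monitoring RMID) is simply left 0 here. */
static void cat_set_cpu_closid(unsigned int cpu, u32 closid)
{
	wrmsr_on_cpu(cpu, PQR_ASSOC_MSR, 0, closid);
}

/* Program the capacity bitmask behind a COS. The mask registers are
 * shared by all CPUs on the socket, so writing them on any one CPU of
 * that socket is enough. */
static void cat_set_socket_cbm(unsigned int any_cpu_on_socket, u32 closid,
			       u32 cbm)
{
	wrmsr_on_cpu(any_cpu_on_socket, L3_CBM_BASE_MSR + closid, cbm, 0);
}
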
> > There are restrictions which CAT imposes in terms of configurability:
> >
> > - The bits which select a cache partition need to be consecutive
> >
> > - The number of possible cache association masks is limited
> >
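(Side note: the "consecutive bits" restriction is cheap to validate before
a mask is ever written; a purely illustrative check:)

#include <stdbool.h>
#include <stdint.h>

/* True if cbm is non-zero and its set bits form a single contiguous
 * run, which is what the hardware requires of a capacity bitmask. */
static bool cbm_is_contiguous(uint32_t cbm)
{
	if (cbm == 0)
		return false;
	cbm >>= __builtin_ctz(cbm);	/* drop trailing zero bits     */
	return (cbm & (cbm + 1)) == 0;	/* leftover bits contiguous?   */
}
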
> > Let's look at the configurations (CDP omitted and size restricted)
> >
> > Default: 1 1 1 1 1 1 1 1
> > 1 1 1 1 1 1 1 1
> > 1 1 1 1 1 1 1 1
> > 1 1 1 1 1 1 1 1
> >
> > Shared: 1 1 1 1 1 1 1 1
> > 0 0 1 1 1 1 1 1
> > 0 0 0 0 1 1 1 1
> > 0 0 0 0 0 0 1 1
> >
> > Isolated: 1 1 1 1 0 0 0 0
> > 0 0 0 0 1 1 0 0
> > 0 0 0 0 0 0 1 0
> > 0 0 0 0 0 0 0 1
> >
> > Or any combination thereof. Surely some combinations will not make any
> > sense, but we really should not make any restrictions on the stupidity
> > of a sysadmin. The worst outcome might be L3 disabled for everything,
> > so what?
> >
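(Reading each row above with the leftmost column as bit 7, the three
layouts translate into these capacity bitmasks, one per COS:)

#include <stdint.h>

/* Hex encoding of the example layouts, MSB = leftmost column. */
static const uint32_t cbm_default[4]  = { 0xff, 0xff, 0xff, 0xff };
static const uint32_t cbm_shared[4]   = { 0xff, 0x3f, 0x0f, 0x03 };
static const uint32_t cbm_isolated[4] = { 0xf0, 0x0c, 0x02, 0x01 };
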
> > Now that gets even more convoluted if CDP comes into play and we
> > really need to look at CDP right now. We might end up with something
> > which looks like this:
> >
> > 1 1 1 1 0 0 0 0 Code
> > 1 1 1 1 0 0 0 0 Data
> > 0 0 0 0 0 0 1 0 Code
> > 0 0 0 0 1 1 0 0 Data
> > 0 0 0 0 0 0 0 1 Code
> > 0 0 0 0 1 1 0 0 Data
> > or
> > 0 0 0 0 0 0 0 1 Code
> > 0 0 0 0 1 1 0 0 Data
> > 0 0 0 0 0 0 0 1 Code
> > 0 0 0 0 0 1 1 0 Data
> >
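(With CDP each COS carries a code mask and a data mask instead of a single
mask, which is also why the usable COS count is halved. The first CDP
example above, expressed as pairs with bit 7 being the leftmost column:)

#include <stdint.h>

struct cdp_cos_masks {
	uint32_t code_cbm;
	uint32_t data_cbm;
};

static const struct cdp_cos_masks cdp_example[] = {
	{ .code_cbm = 0xf0, .data_cbm = 0xf0 },
	{ .code_cbm = 0x02, .data_cbm = 0x0c },
	{ .code_cbm = 0x01, .data_cbm = 0x0c },
};
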
> > Let's look at partitioning itself. We have two options:
> >
> > 1) Per task partitioning
> >
> > 2) Per CPU partitioning
> >
> > So far we only talked about #1, but I think that #2 has value as
> > well. Let me give you a simple example.
> >
> > Assume that you have isolated a CPU and run your important task on
> > it. You give that task a slice of cache. Now that task needs kernel
> > services which run in kernel threads on that CPU. We really don't want
> > to (and cannot) hunt down random kernel threads (think cpu bound
> > worker threads, softirq threads ....) and give them another slice of
> > cache. What we really want is:
> >
> > 1 1 1 1 0 0 0 0 <- Default cache
> > 0 0 0 0 1 1 1 0 <- Cache for important task
> > 0 0 0 0 0 0 0 1 <- Cache for CPU of important task
> >
> > It would even be sufficient for particular use cases to just associate
> > a piece of cache with a given CPU and not bother with tasks at all.
> >
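(The per-CPU association would boil down to something like this at
context-switch time. Purely a sketch of the intended semantics:
cat_sched_in() and cpu_default_closid are names invented here, and the
task's own closid would still have to be stored somewhere.)

#include <linux/types.h>
#include <linux/percpu.h>
#include <asm/msr.h>

#define PQR_ASSOC_MSR	0x0c8f

static DEFINE_PER_CPU(u32, cpu_default_closid);

/* On switch-in: if the incoming task has an explicit COS, use it;
 * otherwise fall back to the COS assigned to this CPU, so kernel
 * threads, workers and softirq threads automatically land in the
 * per-CPU slice. */
static void cat_sched_in(u32 task_closid)
{
	u32 closid = task_closid;

	if (!closid)
		closid = this_cpu_read(cpu_default_closid);

	wrmsrl(PQR_ASSOC_MSR, (u64)closid << 32);
}
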
> > We really need to make this as configurable as possible from userspace
> > without imposing random restrictions to it. I played around with it on
> > my new intel toy and the restriction to 16 COS ids (that's 8 with CDP
> > enabled) makes it really useless if we force the ids to have the same
> > meaning on all sockets and restrict it to per task partitioning.
> >
> > Even if next generation systems have more COS ids available,
> > there are not going to be enough to have a system wide consistent
> > view unless we have COS ids > nr_cpus.
> >
> > Aside from that I don't think that a system wide consistent view is
> > useful at all.
> >
> > - If a task migrates between sockets, it's going to suffer anyway.
> > Real sensitive applications will simply pin tasks on a socket to
> > avoid that in the first place. If we make the whole thing
> > configurable enough then the sysadmin can set it up to support
> > even the nonsensical case of identical cache partitions on all
> > sockets and let tasks use the corresponding partitions when
> > migrating.
> >
> > - The number of cache slices is going to be limited no matter what,
> > so one still has to come up with a sensible partitioning scheme.
> >
> > - Even if we have enough COS ids the system wide view will not make
> > the configuration problem any simpler as it remains per socket.
> >
> > It's hard. Policies are hard by definition, but this one is harder
> > than most other policies due to the inherent limitations.
> >
> > So now to the interface part. Unfortunately we need to expose this
> > very close to the hardware implementation as there are really no
> > abstractions which allow us to express the various bitmap
> > combinations. Any abstraction I tried to come up with renders that
> > thing completely useless.
>
> No you don't.
Actually, there is one case where that low-level control is useful: you
might want the important application to share an L3 portion with the HW
(the region that the HW DMAs into), and have only the application and
the HW use that region.
So it's a good point that controlling the exact position of the
reservation is important.
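
(To make that concrete, purely as an illustration with made-up mask
values, bit 7 = leftmost: the application's reservation has to sit
exactly on the portion the HW DMAs into, so its position in the bitmap
is not negotiable.)

#include <stdint.h>

static const uint32_t cbm_default        = 0xfc;  /* everything else         */
static const uint32_t cbm_important_task = 0x03;  /* the pinned application  */
static const uint32_t cbm_hw_dma_target  = 0x03;  /* region the HW DMAs into */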