Date:   Thu, 27 Jun 2019 10:55:39 -0700
From:   Reinette Chatre <reinette.chatre@...el.com>
To:     David Laight <David.Laight@...LAB.COM>,
        "tglx@...utronix.de" <tglx@...utronix.de>,
        "fenghua.yu@...el.com" <fenghua.yu@...el.com>,
        "bp@...en8.de" <bp@...en8.de>,
        "tony.luck@...el.com" <tony.luck@...el.com>
Cc:     "mingo@...hat.com" <mingo@...hat.com>,
        "hpa@...or.com" <hpa@...or.com>, "x86@...nel.org" <x86@...nel.org>,
        "linux-kernel@...r.kernel.org" <linux-kernel@...r.kernel.org>
Subject: Re: [PATCH 00/10] x86/CPU and x86/resctrl: Support pseudo-lock
 regions spanning L2 and L3 cache

Hi David,

On 6/27/2019 2:12 AM, David Laight wrote:
> From: Reinette Chatre
>> Sent: 26 June 2019 18:49
>>
>> Cache pseudo-locking involves preloading a region of physical memory into a
>> reserved portion of cache that no task or CPU can subsequently fill into and
>> from that point on will only serve cache hits. At this time it is only
>> possible to create cache pseudo-locked regions in either L2 or L3 cache,
>> supporting systems that support either L2 Cache Allocation Technology (CAT)
>> or L3 CAT because CAT is the mechanism used to manage reservations of cache
>> portions.
> 
> While this is a 'nice' hardware feature for some kinds of embedded systems
> I don't see how it can be sensibly used inside a Linux kernel.

Cache pseudo-locking is an existing (though obviously not well known)
feature that has been in the Linux kernel since v4.19.

> There are an awful lot of places where things can go horribly wrong.

The worst thing that can go wrong is that the memory is evicted from the
pseudo-locked region; when it is accessed again it will have to share
cache with all other memory sharing the same class of service it is
accessed under. The consequences are higher latency when accessing this
high priority memory and reduced cache availability due to the orphaned
ways used for the pseudo-locked region.

This worst case could happen when the task runs on a CPU that is not
associated with the cache on which its memory is pseudo-locked, so the
application is expected to be affined only to CPUs associated with the
correct cache. This is a familiar requirement for high priority
applications.

Other ways in which memory could be evicted are addressed below as part
of your detailed concerns.

> I can imagine:
> - Multiple requests to lock regions that end up trying to use the same
>   set-associative cache lines leaving none for normal operation.

I think that you may be comparing this to cache coloring? Cache
pseudo-locking builds on CAT, which is a way-based cache allocation
mechanism. It is impossible to use all cache ways for pseudo-locking
since the default resource group cannot be used for pseudo-locking and
resource groups will always have cache available to them (specifically:
an all-zero capacity bitmask (CBM) is illegal on the Intel hardware to
which this feature is specific).

> - Excessive cache line bouncing because fewer lines are available.

This is not specific to cache pseudo-locking. With cache allocation
technology (CAT), on which cache pseudo-locking is built, the system
administrator can partition the cache into portions and assign
tasks/CPUs to these different portions to manage interference between
the different tasks/CPUs.

You are right that fewer cache lines would be available to different
tasks/CPUs. By reducing the number of cache lines available to specific
classes of service and managing overlap between these different classes
of service the system administrator is able to manage interference
between different classes of tasks or even CPUs.

> - The effect of cache invalidate requests for the locked addresses.

This is correct and is documented in Documentation/x86/resctrl_ui.rst:

<snip>
Cache pseudo-locking increases the probability that data will remain
in the cache via carefully configuring the CAT feature and controlling
application behavior. There is no guarantee that data is placed in
cache. Instructions like INVD, WBINVD, CLFLUSH, etc. can still evict
“locked” data from cache. Power management C-states may shrink or
power off cache. Deeper C-states will automatically be restricted on
pseudo-locked region creation.
<snip>

An application requesting pseudo-locked memory should not CLFLUSH that
memory.

> - I suspect the Linux kernel can do full cache invalidates at certain times.

This is correct. Fortunately the Linux kernel is averse to calling
WBINVD during runtime and not many instances remain. A previous attempt
at handling these found only two direct invocations of WBINVD, neither
of which was likely to be used on a system with cache pseudo-locked
regions. During that discussion it was proposed that, instead of needing
to handle these, we should just get rid of WBINVD, but such a
system-wide change was too daunting for me at that time. For reference,
please see:
http://lkml.kernel.org/r/alpine.DEB.2.21.1808031343020.1745@nanos.tec.linutronix.de

> 
> You've not given a use case.
> 

I think you may be asking for a use case of the original cache
pseudo-locking feature rather than for the additional support contained
in this series? The primary usages right now for cache pseudo-locking
are industrial PLCs/automation and high-frequency trading/financial
enterprise systems, but anything with relatively small repeating data
structures should see benefit.

Reinette
